Sook S Ha, Inyoung Kim, Yue Wang, Jianhua Xuan.
Abstract
Conventionally, pathway-based analysis assumes that genes in a pathway contribute equally to a biological function and therefore assigns uniform weights to genes. However, this assumption has been shown to be incorrect, and applying uniform weights in pathway analysis may not be appropriate for tasks such as molecular classification of diseases, because genes in a functional group may have different predictive power. Hence, we propose assigning different weights to genes in pathway-based analysis and devise four weighting schemes. We applied them in two existing pathway analysis methods using both real and simulated gene expression data for pathways. Among all schemes, the random weighting scheme, which generates random weights and selects the optimal weights minimizing an objective function, performs best in terms of P value or error rate reduction. Weighting changes pathway scoring and brings up some new significant pathways, leading to the detection of disease-related genes that are missed under uniform weight.
Year: 2011 PMID: 21687588 PMCID: PMC3114410 DOI: 10.1155/2011/463645
Source DB: PubMed Journal: Comp Funct Genomics ISSN: 1531-6912
Figure 1. Average P-value of all Type II diabetes pathways versus number of iterations for RWM in the global test.
A brief summary of the four proposed weighting schemes. The algorithms are expressed for a pathway of m genes and n samples. The absT and Qdiff algorithms calculate the weight for the jth gene in a pathway; RWV selects an optimal random weight vector w minimizing the P-value or OOB error rate; and RWM selects an optimal weight matrix w minimizing the P-value or OOB error rate.
| Name | Algorithm | Notes |
|---|---|---|
| absT | | Based on two-sample |
| Qdiff | | Based on the global test statistic |
| RWV | | |
| RWM | | |
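The formulas for the schemes did not survive in the table above, so the following is only a minimal Python sketch of two of the ideas the caption describes: a per-gene weight derived from a two-sample statistic (in the spirit of absT) and a random search that keeps the weight vector with the smallest objective value (in the spirit of RWV). Everything specific here is an assumption rather than the authors' method: the Welch form of the t statistic, the rescaling of weights, the uniform sampling range, and the `centroid_error` objective, which is a hypothetical stand-in for the P-value or OOB error rate used in the source.

```python
import numpy as np

def abs_t_weights(X, y):
    """Assumed absT-style weights: absolute Welch t statistic per gene.

    X is an (n_samples, m_genes) matrix and y holds 0/1 class labels.
    Rescaling so the weights sum to m is an illustrative choice.
    """
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = np.abs(a.mean(axis=0) - b.mean(axis=0)) / se
    return t * X.shape[1] / t.sum()

def centroid_error(Xw, y):
    """Hypothetical stand-in objective: nearest-centroid training error."""
    c0, c1 = Xw[y == 0].mean(axis=0), Xw[y == 1].mean(axis=0)
    d0 = np.linalg.norm(Xw - c0, axis=1)
    d1 = np.linalg.norm(Xw - c1, axis=1)
    return float(np.mean((d1 < d0).astype(int) != y))

def random_weight_vector(X, y, objective, n_iter=100, seed=0):
    """RWV-style search: keep the random weight vector with the lowest objective."""
    rng = np.random.default_rng(seed)
    best_w = np.ones(X.shape[1])            # start from uniform weights
    best_val = objective(X * best_w, y)
    for _ in range(n_iter):
        w = rng.uniform(0.0, 1.0, size=X.shape[1])
        val = objective(X * w, y)
        if val < best_val:
            best_w, best_val = w, val
    return best_w, best_val

# Toy data (made up for illustration): 20 samples, 5 genes, gene 0 shifted in class 1.
rng = np.random.default_rng(1)
y = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(20, 5))
X[y == 1, 0] += 2.0
print(abs_t_weights(X, y))
print(random_weight_vector(X, y, centroid_error))
```

Because the search starts from uniform weights and only accepts improvements, the returned objective value can never be worse than the uniform-weight baseline; swapping in a different objective only requires passing another function with the same `(weighted_X, y)` signature.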
Top 20 type II diabetes pathways selected in the global test under each weighting scheme (PID stands for pathway identification number).
| | Weighting schemes | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Ranks | PID | P value | PID | P value | PID | P value | PID | P value | PID | P value |
| 1 | | .0098 | 40 | .0002 | | .0004 | | .0003 | 195 | .0000 |
| 2 | 264 | .0114 | | .0003 | 57 | .0008 | | .0006 | 172 | .0000 |
| 3 | | .0218 | 13 | .0003 | 41 | .0015 | 139 | .0009 | 127 | .0001 |
| 4 | | .0331 | 37 | .0004 | | .0017 | | .0011 | 2 | .0001 |
| 5 | | .0431 | 57 | .0005 | 264 | .0018 | 264 | .0015 | 76 | .0001 |
| 6 | 168 | .0474 | 44 | .0006 | | .0018 | | .0015 | 235 | .0001 |
| 7 | 73 | .0503 | | .0006 | 20 | .0018 | 157 | .0022 | 199 | .0001 |
| 8 | 139 | .0509 | 278 | .0006 | 157 | .0018 | 20 | .0024 | 198 | .0001 |
| 9 | 204 | .0555 | 66 | .0007 | 193 | .0018 | | .0028 | 261 | .0001 |
| 10 | | .0618 | 56 | .0007 | 26 | .0019 | 203 | .0035 | 277 | .0001 |
| 11 | 162 | .0625 | 51 | .0007 | | .0027 | 26 | .0048 | | .0001 |
| 12 | 203 | .0746 | 109 | .0007 | 232 | .0035 | 17 | .0049 | 80 | .0001 |
| 13 | 229 | .0784 | 139 | .0008 | 37 | .0036 | 193 | .0064 | 263 | .0001 |
| 14 | 201 | .0817 | 104 | .0009 | 58 | .0040 | 73 | .0066 | 19 | .0001 |
| 15 | 120 | .0823 | | .0010 | 59 | .0040 | 208 | .0069 | 165 | .0001 |
| 16 | 76 | .0831 | 217 | .0010 | 60 | .0040 | 8 | .0073 | 42 | .0001 |
| 17 | 128 | .0847 | 110 | .0010 | 61 | .0040 | 79 | .0077 | 144 | .0001 |
| 18 | 274 | .0937 | 43 | .0010 | 62 | .0040 | 16 | .0078 | 162 | .0001 |
| 19 | 247 | .0968 | 36 | .0011 | 63 | .0040 | 252 | .0090 | 258 | .0001 |
| 20 | 22 | .1015 | | .0011 | 139 | .0049 | 173 | .0093 | 1 | .0001 |
| Total | | 1.2244 | | .0142 | | .0540 | | .0875 | | .0018 |
| Average | | .0612 | | .0007 | | .0027 | | .0044 | | .0001 |
Figure 2. P-value distributions for the top 20 Type II diabetes pathways selected by each weighting scheme in the global test.
Top 20 canine pathways selected in the global test under each weighting scheme (PID stands for pathway identification number).
| | Weighting schemes | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Ranks | PID | P value | PID | P value | PID | P value | PID | P value | PID | P value |
| 1 | | .00003 | | .00000 | 368 | .00003 | | .00000 | | .00000 |
| 2 | | .00004 | | .00000 | | .00004 | | .00000 | | .00000 |
| 3 | 394 | .00004 | 326 | .00001 | | .00006 | 368 | .00001 | 223 | .00001 |
| 4 | | .00007 | 360 | .00001 | 394 | .00007 | | .00001 | 160 | .00001 |
| 5 | 304 | .00007 | 73 | .00001 | 73 | .00007 | | .00001 | 295 | .00001 |
| 6 | 183 | .00008 | | .00002 | 304 | .00007 | 202 | .00001 | 304 | .00001 |
| 7 | | .00009 | | .00002 | 247 | .00007 | | .00001 | 283 | .00001 |
| 8 | 440 | .00009 | 156 | .00002 | | .00008 | 175 | .00001 | | .00001 |
| 9 | 159 | .00011 | 133 | .00002 | 440 | .00008 | 210 | .00002 | 387 | .00001 |
| 10 | | .00015 | | .00002 | | .00010 | 247 | .00002 | 64 | .00001 |
| 11 | | .00017 | 94 | .00002 | 183 | .00010 | 394 | .00002 | 421 | .00001 |
| 12 | 45 | .00017 | 375 | .00002 | 157 | .00010 | 239 | .00002 | 135 | .00002 |
| 13 | | .00017 | | .00002 | 64 | .00011 | 45 | .00002 | 129 | .00002 |
| 14 | 368 | .00018 | 261 | .00002 | 159 | .00015 | 100 | .00002 | 374 | .00002 |
| 15 | 192 | .00018 | 192 | .00002 | 45 | .00018 | 310 | .00002 | 165 | .00002 |
| 16 | 261 | .00019 | 154 | .00002 | 261 | .00020 | 326 | .00002 | 183 | .00002 |
| 17 | 87 | .00025 | 157 | .00003 | | .00020 | 304 | .00003 | 265 | .00002 |
| 18 | 422 | .00028 | 420 | .00003 | | .00020 | 281 | .00003 | 20 | .00002 |
| 19 | 223 | .00032 | 320 | .00003 | | .00020 | 360 | .00003 | 397 | .00002 |
| 20 | 354 | .00039 | 422 | .00003 | 192 | .00025 | 336 | .00003 | | .00002 |
| Total | | .00307 | | .00037 | | .00236 | | .00034 | | .00027 |
| Average | | .00015 | | .00002 | | .00012 | | .00002 | | .00001 |
Figure 3. P-value distributions for the canine pathways selected by each weighting scheme in the global test.
Five new significant type II diabetes pathways selected in the global test under the absT scheme, with their P values and ranks compared to those under uniform weight.
| Pathway name | P value (uniform) | P value (absT) | Rank (uniform) | Rank (absT) |
|---|---|---|---|---|
| (PID = 13) Apoptosis | .3658 | .0003 | 111 | 2 |
| (PID = 66) Cell cycle | .4871 | .0007 | 155 | 9 |
| (PID = 51) c3_U133_probes | .3822 | .0007 | 116 | 9 |
| (PID = 109) Integrin-mediated cell adhesion | .3876 | .0007 | 119 | 9 |
| (PID = 43) c22_U133_probes | .3599 | .0010 | 107 | 15 |
Seven new significant canine pathways selected in the global test under the absT scheme, with their P values and ranks compared to those under uniform weight.
| Pathway name | P value (uniform) | P value (absT) | Rank (uniform) | Rank (absT) |
|---|---|---|---|---|
| (PID = 133) Activation of Csk by cAMP-dependent protein kinase inhibits signaling through the T cell receptor | .1006 | .00002 | 258 | 6 |
| (PID = 156) Steps in the glycosylation of mammalian N-linked oligosaccharides | .1308 | .00002 | 278 | 6 |
| (PID = 375) PTEN-dependent cell cycle arrest and apoptosis | .4451 | .00002 | 371 | 6 |
| (PID = 154) TPO signaling pathway | .5504 | .00002 | 389 | 6 |
| (PID = 420) Trefoil factors initiate mucosal healing | .2226 | .00003 | 330 | 17 |
| (PID = 320) CDK regulation of DNA replication | .5563 | .00003 | 390 | 17 |
Prediction rates of top 20 type II diabetes pathways selected in the global test under each weighting scheme.
| Prediction methods | | | | | |
|---|---|---|---|---|---|
| LDA | 0.57 | 0.58 | 0.53 | 0.55 | 0.81 |
| SVML | 0.61 | 0.59 | 0.58 | 0.61 | 0.79 |
| SVMP | 0.51 | 0.53 | 0.51 | 0.53 | 0.74 |
| KNN | 0.55 | 0.68 | 0.53 | 0.59 | 0.76 |
Prediction rates of top 20 canine pathways selected in the global test under each weighting scheme.
| Prediction methods | | | | | |
|---|---|---|---|---|---|
| LDA | 0.84 | 0.82 | 0.86 | 0.86 | 0.86 |
| SVML | 0.86 | 0.86 | 0.86 | 0.86 | 0.86 |
| SVMP | 0.71 | 0.70 | 0.72 | 0.72 | 0.70 |
| KNN | 0.84 | 0.83 | 0.83 | 0.84 | 0.87 |
Top 33 type II diabetes pathways selected in the random forests under uniform weight and RWM scheme.
| Index | Uniform weight | | | | RWM | | | |
|---|---|---|---|---|---|---|---|---|
| | Rank | PID | No. of genes | OOB (%) | Rank | PID | No. of genes | OOB (%) |
| 1 | 1 | 79 | 33 | 0.26 | 1 | | 22 | 0.11 |
| 2 | 2 | | 18 | 0.29 | 2 | 113 | 26 | 0.14 |
| 3 | 2 | | 22 | 0.29 | 2 | 192 | 1 | 0.14 |
| 4 | 4 | 36 | 116 | 0.31 | 4 | 106 | 5 | 0.17 |
| 5 | 4 | 124 | 4 | 0.31 | 5 | 117 | 26 | 0.17 |
| 6 | 4 | 230 | 121 | 0.31 | 4 | | 7 | 0.17 |
| 7 | 7 | | 6 | 0.34 | 4 | 163 | 8 | 0.17 |
| 8 | 7 | 16 | 49 | 0.34 | 4 | 164 | 26 | 0.17 |
| 9 | 7 | 32 | 157 | 0.34 | 4 | 176 | 3 | 0.17 |
| 10 | 7 | 46 | 36 | 0.34 | 4 | 197 | 11 | 0.17 |
| 11 | 7 | 51 | 185 | 0.34 | 4 | 235 | 3 | 0.17 |
| 12 | 7 | 109 | 91 | 0.34 | 4 | 244 | 46 | 0.17 |
| 13 | 7 | 141 | 4 | 0.34 | 4 | 245 | 11 | 0.17 |
| 14 | 7 | 229 | 133 | 0.34 | 4 | 250 | 12 | 0.17 |
| 15 | 7 | 267 | 4 | 0.34 | 4 | 251 | 1 | 0.17 |
| 16 | 16 | | 2 | 0.37 | 4 | 254 | 25 | 0.17 |
| 17 | 16 | 6 | 3 | 0.37 | 4 | 274 | 16 | 0.17 |
| 18 | 16 | | 15 | 0.37 | 4 | 275 | 9 | 0.17 |
| 19 | 16 | 13 | 92 | 0.37 | 19 | | 2 | 0.20 |
| 20 | 16 | 37 | 235 | 0.37 | 19 | | 18 | 0.20 |
| 21 | 16 | 40 | 240 | 0.37 | 19 | | 6 | 0.20 |
| 22 | 16 | 49 | 188 | 0.37 | 19 | | 15 | 0.20 |
| 23 | 16 | 59 | 194 | 0.37 | 19 | 24 | 14 | 0.20 |
| 24 | 16 | 76 | 3 | 0.37 | 19 | 42 | 2 | 0.20 |
| 25 | 16 | | 7 | 0.37 | 19 | 52 | 122 | 0.20 |
| 26 | 16 | 162 | 2 | 0.37 | 19 | 69 | 20 | 0.20 |
| 27 | 16 | 173 | 11 | 0.37 | 19 | 74 | 1 | 0.20 |
| 28 | 16 | 194 | 13 | 0.37 | 19 | 78 | 6 | 0.20 |
| 29 | 16 | 201 | 19 | 0.37 | 19 | 80 | 5 | 0.20 |
| 30 | 16 | 207 | 3 | 0.37 | 19 | 85 | 8 | 0.20 |
| 31 | 16 | 209 | 21 | 0.37 | 19 | 86 | 39 | 0.20 |
| 32 | 16 | 227 | 13 | 0.37 | 19 | 98 | 71 | 0.20 |
| 33 | 16 | 228 | 43 | 0.37 | 19 | 99 | 13 | 0.20 |
| Total | | | 2050 | 11.29 | | | 578 | 5.86 |
| Average | | | 64 | 0.35 | | | 18 | 0.18 |
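The Rank columns in the table above mostly follow what is often called competition ranking: pathways with equal OOB error share the lowest rank in their tie group, and the next distinct value skips ahead (1, 2, 2, 4, ...). As a small illustration, and assuming this is indeed how the ranks were derived (the source does not say), the hypothetical helper below reproduces the first seven uniform-weight ranks from their OOB values.

```python
def competition_ranks(values):
    """Rank values in ascending order; tied values share the lowest rank."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        # Remember only the first (lowest) position at which each value appears.
        first_position.setdefault(value, position)
    return [first_position[value] for value in values]

# OOB errors of the first seven uniform-weight rows in the table above.
oob = [0.26, 0.29, 0.29, 0.31, 0.31, 0.31, 0.34]
print(competition_ranks(oob))  # [1, 2, 2, 4, 4, 4, 7]
```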
Top 33 canine pathways selected in the random forests under uniform weight and RWM scheme.
| Index | Uniform weight | | | | RWM | | | |
|---|---|---|---|---|---|---|---|---|
| | Rank | PID | No. of genes | OOB (%) | Rank | PID | No. of genes | OOB (%) |
| 1 | 1 | | 4 | 0.03 | 1 | | 9 | 0.00 |
| 2 | 1 | | 6 | 0.03 | 2 | | 8 | 0.03 |
| 3 | 1 | | 15 | 0.03 | 2 | 45 | 40 | 0.03 |
| 4 | 1 | | 14 | 0.03 | 2 | 182 | 5 | 0.03 |
| 5 | 5 | | 8 | 0.07 | 2 | 220 | 7 | 0.03 |
| 6 | 5 | | 5 | 0.07 | 2 | | 4 | 0.03 |
| 7 | 5 | | 18 | 0.07 | 2 | | 13 | 0.03 |
| 8 | 5 | | 8 | 0.07 | 2 | | 6 | 0.03 |
| 9 | 5 | | 9 | 0.07 | 2 | | 19 | 0.03 |
| 10 | 5 | 287 | 15 | 0.07 | 2 | | 15 | 0.03 |
| 11 | 5 | | 14 | 0.07 | 2 | | 14 | 0.03 |
| 12 | 5 | 339 | 19 | 0.07 | 2 | 440 | 59 | 0.03 |
| 13 | 5 | 349 | 68 | 0.07 | 13 | 24 | 7 | 0.07 |
| 14 | 5 | | 19 | 0.07 | 13 | 40 | 4 | 0.07 |
| 15 | 15 | | 9 | 0.10 | 13 | 59 | 14 | 0.07 |
| 16 | 15 | 89 | 22 | 0.10 | 13 | | 5 | 0.07 |
| 17 | 15 | | 10 | 0.10 | 13 | | 18 | 0.07 |
| 18 | 15 | 129 | 7 | 0.10 | 13 | | 10 | 0.07 |
| 19 | 15 | 147 | 11 | 0.10 | 13 | | 8 | 0.07 |
| 20 | 15 | | 4 | 0.10 | 13 | 154 | 16 | 0.07 |
| 21 | 15 | 171 | 12 | 0.10 | 13 | | 4 | 0.07 |
| 22 | 15 | 173 | 46 | 0.10 | 13 | 162 | 18 | 0.07 |
| 23 | 15 | 175 | 34 | 0.10 | 13 | | 27 | 0.07 |
| 24 | 15 | | 27 | 0.10 | 13 | 204 | 10 | 0.07 |
| 25 | 15 | | 4 | 0.10 | 13 | | 4 | 0.07 |
| 26 | 15 | 230 | 3 | 0.10 | 13 | 229 | 6 | 0.07 |
| 27 | 15 | | 17 | 0.10 | 13 | 234 | 8 | 0.07 |
| 28 | 15 | 281 | 32 | 0.10 | 13 | | 9 | 0.07 |
| 29 | 15 | | 13 | 0.10 | 13 | 264 | 15 | 0.07 |
| 30 | 15 | | 11 | 0.10 | 13 | 269 | 7 | 0.07 |
| 31 | 15 | 380 | 25 | 0.10 | 13 | | 17 | 0.07 |
| 32 | 15 | 391 | 7 | 0.10 | 13 | | 11 | 0.07 |
| 33 | 15 | 436 | 6 | 0.10 | 13 | | 14 | 0.07 |
| Total | | | 522 | 2.79 | | | 431 | 1.83 |
| Average | | | 16 | 0.08 | | | 13 | 0.06 |
Random forests results for simulated datasets under uniform weight and RWM scheme: (a) simulation case 1 uses covariance structure and mean of Pathway ID 164 from type II diabetes dataset, (b) simulation case 2 uses covariance structure and mean of Pathway ID 441 from canine dataset.
| No. of samples | (a) Simulation Case 1 | | | (b) Simulation Case 2 | | |
|---|---|---|---|---|---|---|
| | No. of genes | OOB (%), uniform | OOB (%), RWM | No. of genes | OOB (%), uniform | OOB (%), RWM |
| 30 | 26 | 0.27 | 0.13 | 21 | 0.50 | 0.33 |
| 50 | 26 | 0.48 | 0.36 | 21 | 0.30 | 0.20 |
| 100 | 26 | 0.30 | 0.22 | 21 | 0.24 | 0.24 |
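The caption above says each simulated dataset reuses the mean and covariance structure of one real pathway. One common way to do this is to draw samples from a multivariate normal distribution with the empirical mean and covariance of the reference pathway. The sketch below assumes that approach; the source does not state the sampling distribution or how class labels were assigned, and the `reference` matrix here is random placeholder data, not the diabetes or canine expression values.

```python
import numpy as np

def simulate_pathway(reference, n_samples, seed=0):
    """Draw simulated expression profiles matching a reference pathway.

    reference: (n_ref_samples, m_genes) expression matrix of the real pathway.
    Returns an (n_samples, m_genes) matrix sampled from a multivariate normal
    with the reference's empirical mean and covariance (an assumption).
    """
    rng = np.random.default_rng(seed)
    mean = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)   # genes are columns
    return rng.multivariate_normal(mean, cov, size=n_samples)

# Placeholder reference with 26 genes, matching the gene count of case 1.
reference = np.random.default_rng(42).normal(size=(40, 26))
for n in (30, 50, 100):                     # sample sizes used in the table
    simulated = simulate_pathway(reference, n)
    print(n, simulated.shape)
```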
Type II diabetes pathways whose ranks are significantly changed under RWM in the random forests.
| Pathway ID and name | Rank (uniform) | Rank (RWM) | OOB (%) (uniform) | OOB (%) (RWM) |
|---|---|---|---|---|
| PID 113 Limonene and pinene degradation | 184 | 2 | 0.54 | 0.14 |
| PID 106 Inositol metabolism | 259 | 4 | 0.69 | 0.17 |
| PID 164 MAP00480_Glutathione_metabolism(user defined) | 242 | 4 | 0.63 | 0.17 |
| PID 176 MAP00550_Peptidoglycan_biosynthesis(user defined) | 259 | 4 | 0.66 | 0.17 |
| PID 235 Peptidoglycan biosynthesis | 259 | 4 | 0.66 | 0.17 |
Canine pathways whose ranks are significantly changed under RWM in the random forests.
| Pathway ID and name | Rank (uniform) | Rank (RWM) | OOB (%) (uniform) | OOB (%) (RWM) |
|---|---|---|---|---|
| PID 24 Alanine and aspartate metabolism | 319 | 13 | 0.34 | 0.07 |
| PID 59 Glycerolipid metabolism | 242 | 13 | 0.28 | 0.07 |
| PID 204 Role of PI3K subunit p85 in regulation of actin organization and cell migration | 281 | 13 | 0.31 | 0.07 |
| PID 229 Induction of apoptosis through DR3 and DR4/5 death receptors | 157 | 4 | 0.21 | 0.07 |
| PID 269 Ghrelin: regulation of food intake and energy homeostasis | 188 | 4 | 0.28 | 0.07 |
Prediction rates of the top 33 pathways selected in the random forests under uniform weight and RWM scheme, for the type II diabetes and canine datasets.
| Prediction methods | Type II diabetes dataset | | Canine dataset | |
|---|---|---|---|---|
| | Uniform | RWM | Uniform | RWM |
| LDA | 0.54 | 0.63 | 0.87 | 0.78 |
| SVML | 0.53 | 0.64 | 0.88 | 0.84 |
| SVMP | 0.46 | 0.54 | 0.76 | 0.72 |
| KNN | 0.54 | 0.59 | 0.80 | 0.73 |
Eleven genes associated with type II diabetes in the top 20 pathways selected under absT scheme.
| Gene symbols | Gene names |
|---|---|
| CD36 | CD36 antigen (collagen type I receptor, thrombospondin receptor) |
| CAS | Caspase 9, apoptosis-related cysteine peptidase |
| GPX3 | Glutathione peroxidase 3 (plasma) |
| GSTT1 | Glutathione S-transferase theta 1 |
| SOD1 | Superoxide dismutase 1, soluble (amyotrophic lateral sclerosis 1 (adult)) |
| TPMT | Thiopurine S-methyltransferase |
| GSTM1 | Glutathione S-transferase M1 |
| CYP2E1 | Cytochrome P450, family 2, subfamily E, polypeptide 1 |
| LPL | Lipoprotein lipase |
| TNF | Tumor necrosis factor (TNF superfamily, member 2) |
| GYS | Glycogen synthase |
(a) Simulation Case 1.
| No. of samples | No. of genes | No. of tested | Statistic | | Expected | sd-of- | | P value | |
|---|---|---|---|---|---|---|---|---|---|
| 30 | 26 | 26 | 13.51 | 27.68 | 10 | 7.64 | 3.96 | 0.2246 | 0.0014 |
| 50 | 26 | 26 | 13.42 | 27.49 | 10 | 6.59 | 3.61 | 0.2155 | 0.0007 |
| 100 | 26 | 26 | 12.33 | 27.44 | 10 | 6.60 | 3.25 | 0.2573 | 0.0002 |
(b) Simulation Case 2.
| No. of samples | No. of genes | No. of tested | Statistic | | Expected | sd-of- | | P value | |
|---|---|---|---|---|---|---|---|---|---|
| 30 | 21 | 21 | 34.60 | 46.51 | 10 | 9.46 | 5.57 | 0.0289 | 0.0002 |
| 50 | 21 | 21 | 66.41 | 56.91 | 10 | 7.94 | 5.87 | 0.0004 | 0.0001 |
| 100 | 21 | 21 | 97.11 | 70.69 | 10 | 8.03 | 5.42 | 0.0000 | 0.0000 |