Weiliang Shi, Grace Wahba, Rafael A Irizarry, Hector Corrada Bravo, Stephen J Wright.
Abstract
BACKGROUND: In systems biology, the task of reverse engineering gene pathways from data has been limited not just by the curse of dimensionality (the interaction space is huge) but also by systematic error in the data. The gene expression barcode reduces spurious association driven by batch effects and probe effects. The binary nature of the resulting expression calls lends itself perfectly to modern regularization approaches that thrive in high-dimensional settings.
Year: 2012 PMID: 22587526 PMCID: PMC3505477 DOI: 10.1186/1471-2105-13-98
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Diagram of the subproblems in the first stage of pLPS, assuming 5 partitions. The side length of a square is the partition size; the horizontal axis contains the labels of the first effect and the vertical axis the labels of the second effect. Squares filled with red dots are “type-one” subproblems, while triangles filled with green dots are “type-two” subproblems.
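The block layout in Figure 1 can be made concrete with a short enumeration. This is a minimal sketch under our own reading of the figure, not the pLPS implementation. It assumes that the variables are split into contiguous partitions of nearly equal size, that each off-diagonal square (type-one) covers all two-way interactions between a variable in partition i and one in partition j (i < j), and that each diagonal triangle (type-two) covers the interactions within a single partition. The function names `split_partitions`, `first_stage_blocks` and `block_pairs` are hypothetical.

```python
from itertools import combinations

def split_partitions(p, k):
    """Split variable indices 0..p-1 into k contiguous partitions of near-equal size."""
    size, extra = divmod(p, k)
    parts, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        parts.append(list(range(start, end)))
        start = end
    return parts

def first_stage_blocks(k):
    """Hypothetical block list: type-one squares (i < j) and type-two triangles (i == j)."""
    type_one = list(combinations(range(k), 2))   # off-diagonal squares
    type_two = [(i, i) for i in range(k)]        # diagonal triangles
    return type_one, type_two

def block_pairs(parts, i, j):
    """Two-way interaction pairs (a, b) with a < b covered by block (i, j)."""
    if i == j:
        return list(combinations(parts[i], 2))             # within one partition
    return [(a, b) for a in parts[i] for b in parts[j]]    # across two partitions

# With 5 partitions, as in Figure 1: 10 squares and 5 triangles.
parts = split_partitions(20, 5)
type_one, type_two = first_stage_blocks(5)
covered = [pair for blk in type_one + type_two for pair in block_pairs(parts, *blk)]
print(len(type_one), len(type_two), len(covered), len(set(covered)))  # 10 5 190 190
```

Under this reading, every pair of variables falls in exactly one subproblem, so the blocks jointly cover the whole two-way interaction space while each individual block stays small.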
Simulation Example 1
| Method | Pattern 1 | Pattern 2 | Pattern 3 | Not in true model |
|---|---|---|---|---|
| pLPS | 94 (100) | 99 (99,99) | 96 (97,97) | 153 |
| Logic | 100 (100) | 70 (88,91) | 65 (84,90) | 190 |
| RF | NA (100) | NA (96,97) | NA (94,96) | (517) |
| SPLR | 100 (100) | 97 (100,97) | 91 (100,98) | 712 |
n = 700 and p = 400, with no correlations. Tabulated numbers show the number of tests (out of 100) in which the pattern was detected by each algorithm. The number outside the parentheses is the number of times the given pattern was selected; the numbers inside the parentheses show how many times the variables in the pattern were detected in the model, either as a main effect or in some interaction. The final column shows the total number of times (in 100 tests) that the algorithms selected patterns (variables for RF) that are not in the true model.
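The counting rules in this caption can be stated precisely as code. This sketch assumes that each replicate's fitted model is given as a collection of patterns, each pattern a `frozenset` of variable indices (a main effect is a one-element set); the function name `tally_detections` and this input format are hypothetical. Under these assumptions it returns, for each true pattern, the count outside the parentheses (the exact pattern was selected) and the per-variable counts inside them (the variable appears anywhere in the model), plus the final-column count of selected patterns that are not in the true model.

```python
from collections import Counter

def tally_detections(true_patterns, runs):
    """Count pattern and variable detections across simulation replicates.

    true_patterns: list of frozensets, the patterns of the true model.
    runs: one iterable of selected patterns (frozensets) per replicate.
    """
    true_set = set(true_patterns)
    pattern_hits = Counter()
    variable_hits = Counter()
    not_in_true_model = 0
    for selected in runs:
        chosen = set(selected)
        # Variables present anywhere in the fitted model, as a main effect or in an interaction.
        present = set().union(*chosen)
        for pat in true_patterns:
            if pat in chosen:
                pattern_hits[pat] += 1
        variable_hits.update(present)
        # Selected patterns that are not in the true model (final column).
        not_in_true_model += len(chosen - true_set)
    summary = {
        pat: (pattern_hits[pat], tuple(variable_hits[v] for v in sorted(pat)))
        for pat in true_patterns
    }
    return summary, not_in_true_model

# Toy example: true model has a main effect {1} and an interaction {2, 3}.
f = frozenset
truth = [f({1}), f({2, 3})]
runs = [[f({1}), f({2, 3})], [f({1}), f({2}), f({5})], [f({3, 4})]]
summary, extra = tally_detections(truth, runs)
print(summary[f({2, 3})], extra)  # (1, (2, 2)) 3
```

In the toy example the interaction {2, 3} is selected once, but each of its variables shows up in two of the three fits, which is exactly the gap between the numbers outside and inside the parentheses.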
Simulation Example 2
| Method | Pattern 1 | Pattern 2 | Pattern 3 | Pattern 4 | Not in true model |
|---|---|---|---|---|---|
| pLPS | 50 (50) | 50 (50) | 48 (48,50) | 50 (50,50) | 278 |
| RF | NA (50) | NA (50) | NA (28,37) | NA (50,50) | (335) |
| SPLR | 50 (50) | 50 (50) | 50 (50,50) | 50 (50,50) | 800 |
n = 1000 and p = 8000, with correlations among neighboring variables. Tabulated numbers show the number of tests (out of 50) in which the pattern was detected by each algorithm. The number outside the parentheses is the number of times the given pattern was selected; the numbers inside the parentheses show how many times the variables in the pattern were detected in the model, either as a main effect or in some interaction. The final column shows the total number of times (in 50 tests) that the algorithms selected patterns (variables for RF) that are not in the true model.
Simulation Example 3
| Method | Pattern 1 | Pattern 2 | Pattern 3 | Pattern 4 | Not in true model |
|---|---|---|---|---|---|
| pLPS3 | 47 (50) | 50 (50) | 47 (50,50) | 47 (50,49,48) | 204 |
| Logic | 50 (50) | 50 (50) | 34 (43,44) | 30 (50,44,41) | 151 |
| RF | NA (50) | NA (50) | NA (36,40) | NA (49,47,49) | (279) |
| SPLR | 50 (50) | 50 (50) | 45 (49,50) | 50 (50,50,50) | 554 |
n = 1000 and p = 500, with correlations among neighboring variables. Tabulated numbers show the number of tests (out of 50) in which the pattern was detected by each algorithm. The number outside the parentheses is the number of times the given pattern was selected; the numbers inside the parentheses show how many times the variables in the pattern were detected in the model, either as a main effect or in some interaction. The final column shows the total number of times (in 50 tests) that the algorithms selected patterns (variables for RF) that are not in the true model.
Simulation Example 4
| Method | Pattern 1 | Pattern 2 | Pattern 3 | Pattern 4 | Not in true model |
|---|---|---|---|---|---|
| pLPS | 96 (100) | 98 (100,100) | 98 (98,100) | 99 (100,100) | 320 |
| RF | NA (99) | NA (96,100) | NA (87,72) | NA (94,89) | (1268) |
| SPLR | 100 (100) | 97 (100,100) | 82 (100,100) | 97 (100,100) | 1017 |
n = 700 and p = 400, with positive correlations among neighbors in the first 200 variables and negative correlations among neighbors in the next 200 variables. Tabulated numbers show the number of tests (out of 100) in which the pattern was detected by each algorithm. The number outside the parentheses is the number of times the given pattern was selected; the numbers inside the parentheses show how many times the variables in the pattern were detected in the model, either as a main effect or in some interaction. The final column shows the total number of times (in 100 tests) that the algorithms selected patterns (variables for RF) that are not in the true model.
∗The average of X1, X3, X10, X201, X210, X220 and X230.
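The captions do not say how the correlated binary covariates were generated. One common construction, assumed here purely for illustration, is to threshold a latent Gaussian AR(1) sequence at zero: a positive autoregressive coefficient yields positively correlated neighbors and a negative one yields negatively correlated neighbors. The sketch builds two blocks of 200 such variables in the spirit of Example 4; the function names `simulate_binary_neighbors` and `mean_neighbor_corr` and the values rho = ±0.5 are hypothetical, not the paper's settings.

```python
import numpy as np

def simulate_binary_neighbors(n, p_block, rho_pos=0.5, rho_neg=-0.5, seed=0):
    """Binary covariates: p_block positively, then p_block negatively, correlated neighbors."""
    rng = np.random.default_rng(seed)

    def latent_ar1(rho):
        # Stationary Gaussian AR(1) with unit variance; lag-one correlation is rho.
        z = np.empty((n, p_block))
        z[:, 0] = rng.standard_normal(n)
        innovation_sd = np.sqrt(1.0 - rho ** 2)
        for j in range(1, p_block):
            z[:, j] = rho * z[:, j - 1] + innovation_sd * rng.standard_normal(n)
        return z

    latent = np.hstack([latent_ar1(rho_pos), latent_ar1(rho_neg)])
    # Thresholding at zero gives 0/1 variables with P(X = 1) = 1/2.
    return (latent > 0).astype(int)

def mean_neighbor_corr(X, cols):
    """Average correlation between adjacent columns within the index range `cols`."""
    return float(np.mean([np.corrcoef(X[:, j], X[:, j + 1])[0, 1] for j in cols[:-1]]))

X = simulate_binary_neighbors(n=700, p_block=200)
# Theory for thresholded AR(1): (2 / pi) * arcsin(rho), i.e. about +1/3 and -1/3 here.
print(X.shape, mean_neighbor_corr(X, range(0, 200)), mean_neighbor_corr(X, range(200, 400)))
```

Thresholding shrinks the latent correlation (0.5 becomes about 0.33 for the binary variables) but preserves its sign, which is what distinguishes the two blocks.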
Non-breast cancer data: Summary of results from five-fold cross validation
| pLPS | 9.2 | 6.6 | |||
| pLPS3 | 6.4 | 3.0 | 0.945 | ||
| Logic | 14.0 | 5.0 | 24.2 | 0.956 | |
| SPLR | 17.2 | 20.6 | 5.6 | 43.4 | 0.962 |
“Total” sums the number of selected genes, the number of non-zero coefficients in the model, and the highest order of interactions. AUC indicates the area under the ROC curve.
Breast cancer survival data: Summary of results from five-fold cross validation
| pLPS | 10.0 | 6.8 | 18.8 | ||
| pLPS3 | 10.2 | 6.6 | 3.0 | 19.8 | 0.780 |
| Logic | 3.8 | 0.721 | |||
| SPLR | 19.4 | 20.6 | 5.0 | 45.0 | 0.793 |
“Total” sums the number of selected genes, the number of non-zero coefficients, and the highest order of interactions. AUC indicates the area under the ROC curve.
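The two summary columns can be reproduced from their definitions. In this sketch, `model_total` follows the captions' definition of "Total", and `auc` computes the area under the ROC curve through its standard equivalence with the Mann-Whitney statistic (the fraction of case/control pairs that a score ranks correctly, ties counted as one half). The captions do not state how the authors computed AUC, so this is an illustration rather than their procedure; both function names are hypothetical.

```python
def model_total(n_genes, n_nonzero, max_order):
    """'Total' = selected genes + non-zero coefficients + highest interaction order."""
    return n_genes + n_nonzero + max_order

def auc(scores, labels):
    """ROC AUC via the Mann-Whitney pairwise form; labels are 1 (case) or 0 (control)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            # A correctly ordered pair counts 1, a tie counts 1/2.
            wins += 1.0 if sp > sn else (0.5 if sp == sn else 0.0)
    return wins / (len(pos) * len(neg))

# SPLR rows of the two tables: 17.2 + 20.6 + 5.6 = 43.4 and 19.4 + 20.6 + 5.0 = 45.0.
print(round(model_total(17.2, 20.6, 5.6), 1), round(model_total(19.4, 20.6, 5.0), 1))  # 43.4 45.0
print(auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is quadratic in the sample size, which is fine for illustration; a rank-based formula gives the same value faster on large data.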
Summary of common genes
| | ||||||||
|---|---|---|---|---|---|---|---|---|
| pLPS | 9.2 | 2.6 | 1.6 | 2.0 | 10.0 | 4.0 | 1.0 | 5.2 |
| pLPS3 | | 8.4 | 1.6 | 2.2 | | 10.2 | 1.0 | 4.8 |
| Logic | | | 14.0 | 1.6 | | | 4.4 | 1.6 |
| SPLR | 17.2 | 19.4 | ||||||
Each off-diagonal element shows the number of genes selected by both the method in its row and the method in its column. Each diagonal element shows the number of genes selected by the method in that row (or column). Numbers are averages over the five folds of the cross validation.
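The construction of this table can be written out directly. For each fold, intersect the gene sets selected by each pair of methods, then average the intersection sizes over the folds; the diagonal reduces to the size of a single method's set. The input format and the function name `common_gene_matrix` are hypothetical, and the gene sets below are toy values, not the paper's selections.

```python
def common_gene_matrix(folds, methods):
    """Fold-averaged overlap counts between the gene sets chosen by each method.

    folds: list with one dict per CV fold, mapping method name -> set of selected genes.
    M[a][b] is the mean size of (genes of a) & (genes of b); M[a][a] is the mean set size.
    """
    k = len(folds)
    M = {a: {b: 0.0 for b in methods} for a in methods}
    for fold in folds:
        for a in methods:
            for b in methods:
                M[a][b] += len(fold[a] & fold[b]) / k
    return M

# Toy example with two folds and two methods (hypothetical gene names).
folds = [
    {"A": {"g1", "g2", "g3"}, "B": {"g2", "g3", "g4"}},
    {"A": {"g1"}, "B": {"g1", "g5"}},
]
M = common_gene_matrix(folds, ["A", "B"])
print(M["A"]["A"], M["B"]["B"], M["A"]["B"])  # 2.0 2.5 1.5
```

Because set intersection is symmetric, the matrix is symmetric, which is why the table only needs to show one triangle.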