Isabel Nepomuceno-Chamorro, Francisco Azuaje, Yvan Devaux, Petr V Nazarov, Arnaud Muller, Jesús S Aguilar-Ruiz, Daniel R Wagner.
Abstract
MOTIVATION: The application of information encoded in molecular networks for prognostic purposes is a crucial objective of systems biomedicine. This approach has not been widely investigated in the cardiovascular research area. Within this area, the prediction of clinical outcomes after suffering a heart attack would represent a significant step forward. We developed a new quantitative prediction-based method for this prognostic problem based on the discovery of clinically relevant transcriptional association networks. This method integrates regression trees and clinical class-specific networks, and can be applied to other clinical domains.
Year: 2010 PMID: 21098433 PMCID: PMC3018815 DOI: 10.1093/bioinformatics/btq645
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Schematic view of the proposed method. The first step involves building clinically relevant gene association networks from gene expression data of patients with the same clinical category. These networks are built from the linear models generated by the model tree induction algorithm M5P (Witten and Frank, 2005), an extension of the regression tree algorithm. The second step involves predicting the clinical category of a new patient through the inferred networks. The prediction is based on the relative error between the true and predicted gene expression values of those genes involved in the inferred networks.
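The second step can be illustrated with a minimal sketch. It assumes, hypothetically, that each network edge is stored as a one-predictor linear model and that the predicted category is the one whose network yields the lowest mean relative error; the record above does not specify the exact decision rule, and the names `LinearModel`, `mean_relative_error` and `predict_category`, as well as all numbers, are made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of one network edge as a linear model:
# target_gene ~ intercept + slope * predictor_gene
LinearModel = Tuple[str, str, float, float]  # (target, predictor, intercept, slope)


def mean_relative_error(models: List[LinearModel], expr: Dict[str, float]) -> float:
    """Average |true - predicted| / |true| over all models of one class network."""
    errors = []
    for target, predictor, intercept, slope in models:
        true_value = expr[target]
        predicted = intercept + slope * expr[predictor]
        # Guard against division by zero for genes with zero expression.
        denom = abs(true_value) if true_value != 0 else 1e-12
        errors.append(abs(true_value - predicted) / denom)
    return sum(errors) / len(errors)


def predict_category(networks: Dict[str, List[LinearModel]], expr: Dict[str, float]) -> str:
    """Assumed rule: pick the class whose network reproduces the patient best."""
    return min(networks, key=lambda label: mean_relative_error(networks[label], expr))


# Toy models and expression values (illustrative only, not from the paper).
networks = {
    "good": [("g1", "g2", 0.0, 2.0)],
    "bad": [("g1", "g2", 1.0, 0.5)],
}
patient = {"g1": 4.0, "g2": 2.0}
print(predict_category(networks, patient))
```

In the actual method the linear models come from M5P model trees; the single-predictor models here are only a stand-in to show how relative errors can drive the class decision.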
Results of the benchmark dataset: comparative analysis
| Method / classifier | Our approach | PC-based, IB1 | PC-based, C4.5 | PC-based, NB | Partial PC-based, IB1 | Partial PC-based, C4.5 | Partial PC-based, NB | Matisse, IB1 | Matisse, C4.5 | Matisse, NB |
|---|---|---|---|---|---|---|---|---|---|---|
| Number of genes | 34 | 29 | 29 | 29 | 79 | 79 | 79 | 20 | 20 | 20 |
| Representative Acc. | | 87.87% | 78.78% | 87.87% | | 63.63% | 84.84% | 90.9% | 87.87% | 87.87% |
| Weighted Avg. TPR | 0.90 | 0.87 | 0.78 | 0.87 | 0.93 | 0.63 | 0.84 | 0.90 | 0.87 | 0.87 |
| Weighted Avg. FPR | 0.08 | 0.16 | 0.24 | 0.13 | 0.06 | 0.39 | 0.15 | 0.14 | 0.13 | 0.13 |
| Specificity | 0.84 | 0.76 | 0.69 | 0.84 | 0.92 | 0.53 | 0.84 | 0.76 | 0.84 | 0.84 |
| Sensitivity | 0.95 | 0.95 | 0.85 | 0.90 | 0.95 | 0.70 | 0.85 | 1 | 0.90 | 0.90 |
| AUC | 0.89 | 0.86 | 0.70 | 0.91 | 0.937 | 0.64 | 0.92 | 0.88 | 0.94 | 0.93 |
We compared our approach with other published techniques on two tasks: network inference and classification using the inferred networks. For network inference, we applied a PC-based method (?), a Partial PC-based method (De la Fuente ) and the Matisse tool (Ulitsky and Shamir, 2007). After building class-specific networks, the genes involved in these networks were used as inputs to several classifiers: nearest neighbors (IB1), decision trees (C4.5 algorithm) and Naive Bayes (NB) classifiers. Several measures are shown: representative accuracy, TPR, FPR, specificity, sensitivity and AUC. The representative accuracy is the proportion of correctly classified patients. The TPR and FPR are the weighted average true positive and false positive rates. The specificity is the proportion of control patients recognized as controls. The sensitivity is the proportion of disease patients recognized as diseased. Finally, the AUC is the area under the receiver operating characteristic curve. Avg., average.
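The measures defined above can be made concrete with a short sketch for a two-class problem (disease as the positive class, control as the negative class). The class-size weighting used for the average TPR and FPR is an assumption based on the usual convention for weighted averages; the function name `binary_metrics` and the counts are hypothetical, and AUC is left out because it requires ranked scores rather than counts.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Metrics for a disease (positive) versus control (negative) classifier."""
    n_pos, n_neg = tp + fn, tn + fp
    total = n_pos + n_neg
    sensitivity = tp / n_pos  # disease patients recognized as disease
    specificity = tn / n_neg  # control patients recognized as control
    accuracy = (tp + tn) / total
    # Per-class false positive rates: controls called disease, disease called control.
    fpr_disease = fp / n_neg
    fpr_control = fn / n_pos
    # Assumed convention: average the per-class rates weighted by class size.
    weighted_tpr = (n_pos * sensitivity + n_neg * specificity) / total
    weighted_fpr = (n_pos * fpr_disease + n_neg * fpr_control) / total
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "weighted_tpr": weighted_tpr,
        "weighted_fpr": weighted_fpr,
    }


# Toy confusion counts (illustrative only, not taken from the table).
metrics = binary_metrics(tp=9, fn=1, tn=8, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under this weighting the weighted average TPR coincides with the accuracy, since both reduce to the number of correctly classified patients divided by the total.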
Genes in common between the networks from heart dataset
| Gene name | Full name | Location | Type |
|---|---|---|---|
| BANF1 | Barrier-to-autointegration factor | N | O |
| RPS4Y1 | 40S ribosomal protein S4, Y isoform 1 | C | O |
| OBFC2B | SOSS complex subunit B1 | U | O |
| APOF | Apolipoprotein F | ES | T |
| CCNO | Cyclin-O | N | TR |
| LOC125595 | gene model | U | U |
| HIST1H2AE | Histone H2A type 1-B/E | N | O |
| AK091188-2203 | – | U | U |
The overlap between the good-prognosis and bad-prognosis networks is small and can be observed at the node level only, i.e. there are no edges in common. ES, extracellular space; C, cytoplasm; N, nucleus; U, unknown; O, other; T, transporter; TR, transcription regulator.
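A node-level and edge-level overlap of this kind can be computed with plain set operations. The sketch below assumes undirected edges (an assumption, since the record does not state edge direction) and uses placeholder gene labels rather than the genes listed in the table; `network_overlap` is a hypothetical helper name.

```python
from typing import Iterable, Set, Tuple, FrozenSet


def network_overlap(
    edges_a: Iterable[Tuple[str, str]],
    edges_b: Iterable[Tuple[str, str]],
) -> Tuple[Set[str], Set[FrozenSet[str]]]:
    """Return (shared nodes, shared edges) of two networks given as edge lists."""
    # Assumption: edges are undirected, so (u, v) and (v, u) are the same edge.
    set_a = {frozenset(edge) for edge in edges_a}
    set_b = {frozenset(edge) for edge in edges_b}
    nodes_a = set().union(*set_a) if set_a else set()
    nodes_b = set().union(*set_b) if set_b else set()
    return nodes_a & nodes_b, set_a & set_b


# Placeholder networks (illustrative only).
good_edges = [("A", "B"), ("B", "C")]
bad_edges = [("A", "D"), ("C", "E")]
shared_nodes, shared_edges = network_overlap(good_edges, bad_edges)
print(sorted(shared_nodes), len(shared_edges))
```

In this toy case two nodes are shared while no edge is, which mirrors the situation described in the caption: the same gene can appear in both networks with entirely different neighbors.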
Heart dataset: comparison with other classifiers
| Method / classifier | SATuRNo | PC-based, IB1 | PC-based, C4.5 | PC-based, NB |
|---|---|---|---|---|
| Representative Acc. | | 65.62% | 50% | 53.12% |
| Specificity | 0.78 | 0.68 | 0.56 | 0.50 |
| Sensitivity | 0.67 | 0.62 | 0.43 | 0.56 |
| AUC | 0.72 | 0.65 | 0.39 | 0.59 |
Comparison of the performance of SATuRNo with that of different classification models whose inputs were the genes detected by the PC-based method (CoExpress).
Fig. 2. Clinically relevant gene association networks obtained from the heart dataset. Both networks were built from a microarray with 15 307 genes and the forest of trees was pruned using the threshold value θ = 15. Furthermore, the representative accuracy of these prognostic transcriptional networks was 72% (LOOCV).
Topological network parameters from heart dataset
| Method | Bad prognosis: nodes | Bad prognosis: edges | Bad prognosis: diameter | Good prognosis: nodes | Good prognosis: edges | Good prognosis: diameter |
|---|---|---|---|---|---|---|
| PC-based method | 4297 | 16 407 | 21 | 3322 | 6228 | 29 |
| Our approach | 48 | 59 | 8 | 19 | 17 | 4 |
The networks obtained by the PC-based method, with a correlation threshold of 0.95, have far more nodes and edges than the networks obtained by our approach.
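To show how a correlation threshold produces a network and how the topological parameters in the table are obtained, the following sketch links two genes whenever the absolute value of their correlation reaches the threshold, then reports nodes, edges and diameter. Reading "PC" as Pearson correlation, using the absolute value, and defining the diameter as the longest shortest path between connected nodes are all assumptions; the function names and expression values are hypothetical.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_network(expr, threshold=0.95):
    """Adjacency sets linking genes whose |r| >= threshold (assumed rule)."""
    adj = {}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            adj.setdefault(g1, set()).add(g2)
            adj.setdefault(g2, set()).add(g1)
    return adj


def diameter(adj):
    """Longest shortest path, in edges, between any two connected nodes."""
    longest = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from each node
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        longest = max(longest, max(dist.values()))
    return longest


# Toy expression profiles across four samples (illustrative only).
expr = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [4, 3, 2, 1],
    "d": [1, 3, 2, 4],
}
adj = correlation_network(expr, threshold=0.95)
n_nodes = len(adj)
n_edges = sum(len(neighbors) for neighbors in adj.values()) // 2
print(n_nodes, n_edges, diameter(adj))
```

Because every sufficiently correlated pair becomes an edge, a purely pairwise threshold tends to yield dense networks on genome-scale data, which is consistent with the large node and edge counts reported for the PC-based method.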