Stephan Gade, Christine Porzelius, Maria Fälth, Jan C Brase, Daniela Wuttig, Ruprecht Kuner, Harald Binder, Holger Sültmann, Tim Beissbarth.
Abstract
BACKGROUND: One of the main goals in cancer studies that include high-throughput microRNA (miRNA) and mRNA data is to find and assess prognostic signatures capable of predicting clinical outcome. Expression changes of both mRNA and miRNA in cancer have been reported to reflect clinical characteristics such as staging and prognosis. Furthermore, miRNA abundance can directly affect target transcripts and translation in tumor cells. Prediction models are trained to identify either mRNA or miRNA signatures for patient stratification. With the increasing number of microarray studies collecting mRNA and miRNA from the same patient cohort, there is a need for statistical methods that integrate or fuse both kinds of data into one prediction model in order to find a combined signature that improves the prediction.
Year: 2011 PMID: 22188670 PMCID: PMC3471479 DOI: 10.1186/1471-2105-12-488
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Comparison of Prediction Error Curves. The figure shows the prediction error curves of CoxBoost models trained on the mRNA and miRNA data. The prediction model was trained with and without the bipartite graph describing the relations between the features. Incorporating the graph reduced the prediction error. The plot shows the .632 estimate of the prediction error, averaged over the 500 bootstrap samples. The prediction error of the Kaplan-Meier estimator is shown as a reference.
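The .632 estimate mentioned in the caption is conventionally a weighted combination of the apparent error (model fitted and evaluated on the full data) and the out-of-bag bootstrap error, with weights 0.368 and 0.632. The following is a minimal Python sketch of that combination, assuming all error curves are evaluated on a shared time grid. The function names and the trapezoidal integration are illustrative assumptions, not taken from the paper, whose implementation may differ in detail.

```python
def estimate_632(apparent, oob_runs):
    """Combine an apparent error curve with out-of-bag curves (.632 rule).

    apparent: error at each time point, model fitted and evaluated on all data.
    oob_runs: one out-of-bag error curve per bootstrap sample, same time grid.
    """
    n_runs = len(oob_runs)
    # Average the out-of-bag error over bootstrap samples at each time point.
    mean_oob = [
        sum(run[t] for run in oob_runs) / n_runs for t in range(len(apparent))
    ]
    # Weighted combination: 0.368 * apparent + 0.632 * out-of-bag.
    return [0.368 * a + 0.632 * o for a, o in zip(apparent, mean_oob)]


def integrate_curve(times, errors):
    """Trapezoidal integral of an error curve (an IPEC-style summary score)."""
    return sum(
        (t1 - t0) * (e0 + e1) / 2
        for t0, t1, e0, e1 in zip(times, times[1:], errors, errors[1:])
    )
```

Because the combination is linear, averaging the out-of-bag curves first and then applying the weights gives the same curve as averaging per-sample .632 curves.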
Figure 2. Workflow Diagram. Workflow of the integration of miRNA and mRNA expression data.
Comparison of Boosting Results.
| M | IPEC (median) | IQR | p-value |
|---|---|---|---|
| 98 | 5.90 | 0.88 | |
| 100 | 5.82 | 0.87 | |
| 99 | 5.79 | 0.86 | |
| 99 | 5.46 | 1.20 | - |
The table shows the number of boosting steps M for every CoxBoost model and the IPEC (median and IQR) over 500 bootstrap runs. The number of boosting steps was determined using the whole data set. Lower IPEC scores indicate better prediction accuracy. The p-value is the result of a one-sided, unpaired Wilcoxon test comparing each single-data-set prediction model, and the combined prediction model without graph, with the combined model incorporating the bipartite graph.
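The summary statistics and test described in this caption can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the quartile convention is one of several in use, and the p-value uses a plain normal approximation without tie or continuity correction, which may not match the software used in the paper.

```python
from statistics import NormalDist, median, quantiles


def summarize(values):
    """Return (median, IQR) of a list of IPEC scores."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q3 - q1


def rank_sum_greater(x, y):
    """One-sided unpaired Wilcoxon rank-sum test of H1: x tends to exceed y.

    Returns (U statistic for x, approximate p-value).
    """
    pooled = sorted(list(x) + list(y))
    # Assign each distinct value its midrank so ties share an average rank.
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n1, n2 = len(x), len(y)
    u = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    # Normal approximation; no tie or continuity correction (assumption).
    return u, 1 - NormalDist().cdf((u - mean) / sd)
```

Here `x` would hold the 500 IPEC values of a comparison model and `y` those of the model with the bipartite graph, so a small p-value supports a higher error for the comparison model.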
Comparison with Other Methods.
| IPEC (median) | IQR | p-value |
|---|---|---|
| 6.10 | 1.12 | |
| 5.66 | 0.78 | |
| 5.46 | 1.20 | - |
The table compares the prediction error of Lasso and RSF with that of CoxBoost with the bipartite graph. As before, the median and IQR were calculated from 500 IPECs. The p-value is based on a one-sided Wilcoxon test comparing the 500 IPECs of Lasso and RSF with the IPECs of CoxBoost.
Selected Features.
| Feature (no graph) | Counts (no graph) | Feature (with graph) | Counts (with graph) |
|---|---|---|---|
| ESM1 | 161 | hsa-miR-513a-3p | 329 |
| hsa-miR-412 | 151 | hsa-miR-513a-5p | 316 |
| INHBA | 130 | hsa-miR-128 | 249 |
| COMP | 126 | hsa-miR-1226* | 233 |
| ZFHX4 | 114 | hsa-miR-1231 | 209 |
| SLC6A14 | 103 | hsa-miR-1224-5p | 206 |
| hsa-miR-484 | 92 | hsa-miR-220a | 199 |
| PI15 | 83 | hsa-miR-1233 | 198 |
| hsa-miR-556-3p | 79 | hsa-miR-208a | 169 |
| hsa-miR-409-3p | 74 | hsa-miR-199b-3p | 168 |
The table lists the top ten features from CoxBoost with and without graph information. mRNA names are given by their gene symbols (capital letters), while miRNA names are given by their miRBase IDs (starting with hsa-miR). The Counts columns indicate in how many of the 500 bootstrap samples the feature was chosen. Consequently, the maximal count is 500.
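The counting scheme described in this caption can be sketched in Python as below. The function names are hypothetical, and the input is assumed to be one collection of selected feature names per bootstrap sample.

```python
from collections import Counter


def selection_counts(selected_per_run):
    """Count in how many bootstrap runs each feature was selected."""
    counts = Counter()
    for features in selected_per_run:
        # Use a set so a feature counts at most once per bootstrap run.
        counts.update(set(features))
    return counts


def top_features(selected_per_run, k=10):
    """Return the k most frequently selected features with their counts."""
    return selection_counts(selected_per_run).most_common(k)
```

With 500 bootstrap runs as input, no count can exceed 500, which matches the maximum stated in the caption.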