Hugo Gómez-Rueda, Emmanuel Martínez-Ledesma, Antonio Martínez-Torteya, Rebeca Palacios-Corona, Victor Trevino
Abstract
BACKGROUND: In cancer, large-scale technologies such as next-generation sequencing and microarrays have produced a wide range of genomic features, including DNA copy number alterations (CNA), mRNA expression (EXPR), microRNA expression (MIRNA), and DNA somatic mutations (MUT), among others. Analyses of individual data types have yielded many prognostic biomarkers in cancer. However, it is unclear which data type is most powerful and whether the best data type depends on the cancer type. Our purpose is therefore to characterize the prognostic power of models obtained from different genomic data types, cancer types, and algorithms. To this end, we compared the prognostic power, measured by the concordance index and the prognostic index, of models obtained from EXPR, MIRNA, CNA, and MUT data and their integration for ovarian serous cystadenocarcinoma (OV), glioblastoma multiforme (GBM), lung adenocarcinoma (LUAD), and breast cancer (BRCA) datasets from The Cancer Genome Atlas (TCGA) repository. We used three algorithms for prognostic model selection: constrained particle swarm optimization (CPSO), network feature selection (NFS), and the least absolute shrinkage and selection operator (LASSO).
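The concordance index (c-index) measures how well a model's risk scores order patients by survival time. The record does not state which estimator or software the authors used, so the following is a minimal sketch assuming Harrell's definition, with a simplified treatment of tied survival times (such pairs are skipped). The function name `harrell_c_index` and the toy data are hypothetical. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs whose risk ordering matches survival.

    A pair is comparable when the subject with the shorter time had an event.
    Higher risk should mean shorter survival; tied risk scores count as 0.5.
    Pairs with tied times are skipped (a simplification).
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        a, b = (i, j) if times[i] < times[j] else (j, i)  # a fails first
        if not events[a]:
            continue  # earlier subject censored: true ordering unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


times = [5, 10, 15, 20]
events = [1, 1, 0, 1]  # 1 = death observed, 0 = censored
print(harrell_c_index(times, events, [0.9, 0.6, 0.3, 0.1]))  # 1.0
print(harrell_c_index(times, events, [0.1, 0.3, 0.6, 0.9]))  # 0.0
```

The pair (15, 20) is not comparable because the subject at time 15 is censored, so only five of the six pairs contribute.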
Keywords: Cancer; Genomics; Survival; TCGA
Year: 2015 PMID: 26516350 PMCID: PMC4625638 DOI: 10.1186/s13040-015-0065-1
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Fig. 1 Overview of the methodology. TCGA: The Cancer Genome Atlas. BRCA: Breast Cancer. LUAD: Lung Adenocarcinoma. OV: Ovarian Serous Cystadenocarcinoma. GBM: Glioblastoma Multiforme. EXPR: Gene Expression. MIRNA: microRNAs. CNA: Copy Number Alteration. MUT: Somatic Mutations. ′ (prime): Filtered Data. EN: Elastic-Net LASSO (Least Absolute Shrinkage and Selection Operator). NFS: Network Feature Selector. CPSO: Constrained Particle Swarm Optimization. c-index: Concordance Index
Number of features used by the feature selection algorithms
| Data type | OV (before filtering) | LUAD (before filtering) | BRCA (before filtering) | GBM (before filtering) | OV (after filtering) | LUAD (after filtering) | BRCA (after filtering) | GBM (after filtering) |
|---|---|---|---|---|---|---|---|---|
| EXPR | 12,042 | 20,502 | 17,787 | 12,042 | 1,203 | 4,632 | 3,836 | 1,204 |
| MIRNA | 705 | 1,046 | 1,046 | 534 | 108 | 578 | 587 | 534ᵃ |
| CNA | 24,174 | 24,174 | 23,862 | 24,117 | 2,417 | 2,417 | 2,417 | 2,417 |
| MUT | 12,042 | 20,502 | 11,929 | 20,502 | 1,371 | 2,500 | 1,175 | 6,241 |
ᵃ Not filtered because too few features remained after filtering.
Concordance index and log-rank test of all models
| Cancer type | Algorithm | EXPR | MIRNA | CNA | MUT | MERGE |
|---|---|---|---|---|---|---|
| OV | CPSO | 66ᵇ | 61ᵇ | 64ᶜ | 10ᶜ | 65ᶜ |
| | NFS | 60ᵃ | 53 | 56ᵇ | 11ᶜ | 63ᶜ |
| | LASSO | 68ᶜ | 62ᶜ | 64ᶜ | - | 68ᶜ |
| | Average | 65 | 59 | 61 | 10 | 65 |
| LUAD | CPSO | 74ᵇ | 70 | 74ᵇ | 52ᶜ | 75ᵇ |
| | NFS | 71ᵇ | 73ᵇ | 65ᵃ | 29ᵇ | 64 |
| | LASSO | 72ᶜ | 75ᶜ | 66ᶜ | 52ᶜ | 78ᶜ |
| | Average | 72 | 72 | 68 | 44 | 72 |
| BRCA | CPSO | 85ᶜ | 82ᶜ | 92 | 38ᶜ | 83ᶜ |
| | NFS | 79 | 76 | 70 | 28ᶜ | 84 |
| | LASSO | 81ᶜ | 80ᵇ | 83ᶜ | 53ᶜ | 86ᶜ |
| | Average | 82 | 80 | 82 | 40 | 84 |
| GBM | CPSO | 63ᶜ | 59ᶜ | 57ᵇ | 16ᶜ | 59 |
| | NFS | 60ᶜ | 61ᶜ | 58ᵇ | 3ᵇ | 63ᶜ |
| | LASSO | 60ᶜ | 61ᶜ | 53ᶜ | 5 | 61ᶜ |
| | Average | 61 | 61 | 56 | 8 | 61 |
| Overall | CPSO | 72 | 68 | 72 | 29 | 71 |
| | NFS | 67 | 66 | 62 | 18 | 69 |
| | LASSO | 70 | 70 | 66 | 37 | 73 |
| | Average | 70 | 68 | 67 | 27 | 71 |
ᵃ, ᵇ, ᶜ Models whose Kaplan-Meier curves differed significantly at the 0.05, 0.01, and 0.001 levels, respectively, by the log-rank test. For this test, patients were split at the median of the prognostic index (the linear predictor of the Cox model). “-” indicates that no model was generated.
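The significance marks above rest on three steps: compute each patient's prognostic index as the Cox linear predictor, split patients at the median of that index, and compare the two groups with a log-rank test. The sketch below illustrates those steps in plain Python; it is not the authors' code. The function names, the toy data, and the convention that patients exactly at the median fall into the low-risk group are assumptions. The p-value uses the fact that the upper tail of a chi-square distribution with 1 degree of freedom equals erfc(sqrt(x/2)).

```python
import math


def prognostic_index(features, coefficients):
    """Cox linear predictor: sum of coefficient * feature value per subject."""
    return [sum(b * x for b, x in zip(coefficients, row)) for row in features]


def median_risk_groups(scores):
    """1 = high risk (score above the median), 0 = low risk (at or below it)."""
    s = sorted(scores)
    n = len(s)
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    return [1 if v > median else 0 for v in scores]


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square, p-value) with 1 df."""
    o_minus_e = 0.0  # observed minus expected events in group 1
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [g for tt, g in zip(times, groups) if tt >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for tt, e in zip(times, events) if tt == t and e)
        d1 = sum(1 for tt, e, g in zip(times, events, groups)
                 if tt == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of d1 at this event time
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance if variance > 0 else 0.0
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical toy data: one feature with Cox coefficient 1.0.
times = [1, 2, 3, 4, 10, 11, 12, 13]
events = [1] * 8
features = [[4], [3], [2], [1], [-1], [-2], [-3], [-4]]
groups = median_risk_groups(prognostic_index(features, [1.0]))
chi2, p = logrank_test(times, events, groups)
print(groups, round(chi2, 2), p)
```

With real data, the coefficients would come from the fitted Cox model of each selected feature set; the toy example simply makes the high-index patients die first so that the split is clearly significant.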
Fig. 2 Performance of the models generated from different genomic data types, grouped by cancer type. BRCA: Breast Cancer. LUAD: Lung Adenocarcinoma. OV: Ovarian Serous Cystadenocarcinoma. GBM: Glioblastoma Multiforme. EXPR: Gene Expression. MIRNA: microRNAs. CNA: Copy Number Alteration. MUT: Somatic Mutations
Fig. 3 Performance of the models generated from different genomic data types, grouped by algorithm. EXPR: Gene Expression. MIRNA: microRNAs. CNA: Copy Number Alteration. MUT: Somatic Mutations. LASSO: Least Absolute Shrinkage and Selection Operator. NFS: Network Feature Selector. CPSO: Constrained Particle Swarm Optimization
Feature source distribution for MERGE models
| Algorithm | Dataset | Size | EXPR | MIRNA | CNA | MUT |
|---|---|---|---|---|---|---|
| CPSO | BRCA | 10 | 6 | 0 | 3 | 1 |
| | LUAD | 9 | 6 | 0 | 3 | 0 |
| | GBM | 10 | 2 | 2 | 1 | 5 |
| | OV | 10 | 6 | 0 | 4 | 0 |
| | Total | 39 | 51 % | 5 % | 28 % | 15 % |
| NFS | BRCA | 4 | 0 | 0 | 4 | 0 |
| | LUAD | 4 | 3 | 0 | 1 | 0 |
| | GBM | 9 | 4 | 0 | 5 | 0 |
| | OV | 9 | 4 | 0 | 4 | 1 |
| | Total | 26 | 42 % | 0 % | 54 % | 4 % |
| LASSO | BRCA | 11 | 4 | 0 | 2 | 5 |
| | LUAD | 9 | 3 | 3 | 1 | 2 |
| | GBM | 13 | 10 | 1 | 1 | 1 |
| | OV | 10 | 10 | 0 | 0 | 0 |
| | Total | 43 | 63 % | 9 % | 9 % | 19 % |
| Overall | | 108 | 54 % | 6 % | 27 % | 14 % |
Percentages are rounded to the nearest integer.
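The percentage rows of the table follow from the per-model counts: each data type's share is its summed feature count divided by the summed model sizes. The short check below recomputes them from the counts transcribed from the table. The dictionary layout and the helper name `source_percentages` are illustrative choices, not part of the original analysis. Note that the algorithm totals 39 + 26 + 43 sum to 108 features, which is the denominator that reproduces the overall percentages.

```python
# Feature counts per MERGE model, transcribed from the table above:
# (EXPR, MIRNA, CNA, MUT) for BRCA, LUAD, GBM, OV, in that order.
COUNTS = {
    "CPSO": [(6, 0, 3, 1), (6, 0, 3, 0), (2, 2, 1, 5), (6, 0, 4, 0)],
    "NFS": [(0, 0, 4, 0), (3, 0, 1, 0), (4, 0, 5, 0), (4, 0, 4, 1)],
    "LASSO": [(4, 0, 2, 5), (3, 3, 1, 2), (10, 1, 1, 1), (10, 0, 0, 0)],
}
SOURCES = ("EXPR", "MIRNA", "CNA", "MUT")


def source_percentages(models):
    """Return (total size, {source: rounded percentage of all features})."""
    totals = [sum(column) for column in zip(*models)]
    size = sum(totals)
    return size, {s: round(100 * t / size) for s, t in zip(SOURCES, totals)}


for algorithm, models in COUNTS.items():
    print(algorithm, *source_percentages(models))

all_models = [m for models in COUNTS.values() for m in models]
print("Overall", *source_percentages(all_models))
```

For example, CPSO uses 20 EXPR features out of 39, and 100 × 20 / 39 ≈ 51.3, which rounds to the 51 % shown in the table.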
Fig. 4 Agreement in the prognostic prediction by cancer type and data type. The figure shows Cohen's kappa agreement between the risk groups obtained by splitting each model's prognostic index at its median. Each heatmap compares the models generated for the different data types and algorithms. Cells enclosed in squares correspond to comparisons among the models of the three algorithms (CPSO, NFS, LASSO) for the same data type. The event proportion in each cancer type is shown in parentheses. Within the heatmaps, blue denotes low kappa values, white intermediate values, and red high values. For reference, the scatter plots next to the color key show example pairs of prognostic indexes with kappa values of 0, 0.25, 0.5, and 1. MUT data did not generate risk groups in OV and GBM and were omitted. EXPR: Gene Expression. MIRNA: microRNAs. CNA: Copy Number Alteration. MUT: Somatic Mutations. LASSO: Least Absolute Shrinkage and Selection Operator. NFS: Network Feature Selector. CPSO: Constrained Particle Swarm Optimization. BRCA: Breast Cancer. LUAD: Lung Adenocarcinoma. OV: Ovarian Serous Cystadenocarcinoma. GBM: Glioblastoma Multiforme
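Figure 4 summarizes agreement as Cohen's kappa between the median-split risk groups of two models. The sketch below shows that computation for one pair of models, assuming patients exactly at the median are labeled low risk; the function names and the six prognostic-index values are hypothetical. Kappa corrects the raw agreement rate for the agreement expected by chance from each model's label frequencies: 1 means identical risk groups and 0 means agreement no better than chance.

```python
def median_risk_groups(scores):
    """1 = high risk (above the median prognostic index), 0 = low risk."""
    s = sorted(scores)
    n = len(s)
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    return [1 if v > median else 0 for v in scores]


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label vectors of equal length."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                   for c in set(labels_a) | set(labels_b))
    if expected == 1:
        return float("nan")  # both models use a single label: kappa undefined
    return (observed - expected) / (1 - expected)


# Hypothetical prognostic indexes from two models on the same six patients.
pi_model_1 = [0.1, 0.4, 0.2, 0.9, 0.7, 0.3]
pi_model_2 = [0.2, 0.5, 0.1, 0.8, 0.3, 0.6]
kappa = cohens_kappa(median_risk_groups(pi_model_1),
                     median_risk_groups(pi_model_2))
print(round(kappa, 3))  # 0.333
```

Here the two models agree on four of six patients (observed agreement 0.667), while a 50/50 median split in both models gives a chance agreement of 0.5, so kappa = (0.667 − 0.5) / (1 − 0.5) ≈ 0.33.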