Abstract
It has become increasingly common for large-scale public data repositories and clinical settings to have multiple types of data, including high-dimensional genomics, epigenomics, and proteomics data as well as survival data, measured simultaneously for the same group of biological samples. This provides unprecedented opportunities to understand cancer mechanisms from a more comprehensive perspective and to develop new cancer therapies. Nevertheless, translating this wealth of data into biologically and clinically meaningful information remains very challenging. In this paper, I review recent developments in statistics for integrative analyses of cancer data. Topics include meta-analysis of a homogeneous data type across multiple studies, integration of multiple heterogeneous genomic data types, survival analysis with high- or ultrahigh-dimensional genomic profiles, and cross-data-type prediction where both predictors and responses are high- or ultrahigh-dimensional vectors. I compare existing statistical methods and comment on potential future research problems.
Keywords: cancer genomics; high-dimensional data; integrative analysis; survival analysis; ultrahigh-dimensional data
Year: 2015 PMID: 26041968 PMCID: PMC4435444 DOI: 10.4137/CIN.S17303
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Summary of the main reviewed methods.
| NAME | INTEGRATION TYPE | CORE STATISTICAL METHOD | REFERENCE |
|---|---|---|---|
| ComBat | Single data type, multiple studies | Empirical Bayes | |
| SVA | Single data type, multiple studies | Surrogate variable analysis | |
| svaseq | Single data type, multiple studies | Surrogate variable analysis | |
| RUV | Single data type, multiple studies | Generalized linear model | |
| Consistent DE | Single data type, multiple studies | Bayesian hierarchical model | |
| EBarrays | Single data type, multiple studies | Bayesian hierarchical model | |
| XDE | Single data type, multiple studies | Bayesian hierarchical model | |
| Cormotif | Single data type, multiple studies | Bayesian hierarchical model | |
| 2-Norm group bridge | Single data type, multiple studies | Penalized method | |
| iCluster | Multiple data types, single study | Matrix factorization | |
| Joint Bayesian factor | Multiple data types, single study | Matrix factorization | |
| JIVE | Multiple data types, single study | Matrix factorization | |
| md-module | Multiple data types, single study | Matrix factorization | |
| MDI | Multiple data types, single study | Bayesian hierarchical model | |
| Prob_GBM | Multiple data types, single study | Bayesian hierarchical model | |
| Consensus clustering | Multiple data types, single study | Bayesian hierarchical model | |
| SNF | Multiple data types, single study | Network fusion | |
| Multi-attribute graph | Multiple data types, single study | Network fusion | |
| Penalized survival | Single data type with survival | Penalized method | |
| Network penalized survival | Single data type with survival | Penalized method | |
| SIS survival | Single data type with survival | Sure independence screening | |
| PSIS | Single data type with survival | Sure independence screening | |
| FAST | Single data type with survival | Sure independence screening | |
| Bagging survival trees | Single data type with survival | Bootstrap | |
| Survival ensembles | Single data type with survival | Inverse probability weighting | |
| RIST | Single data type with survival | Imputation | |
| T_SVD | Multiple data types, multiple studies | Neural network | |
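
As an illustration of the screening idea behind the sure independence screening rows in the table, the following is a minimal sketch rather than any reviewed method's actual implementation. It assumes a continuous, uncensored response and ranks predictors by absolute marginal correlation; the survival variants listed above typically rank by a marginal Cox model utility instead. The function name `sis_screen`, the simulated data, and the choice of `d` are all hypothetical.

```python
import numpy as np


def sis_screen(X, y, d):
    """Return indices of the d predictors with the largest absolute
    marginal correlation with y (hypothetical helper, linear-response case)."""
    Xc = X - X.mean(axis=0)          # center each predictor column
    yc = y - y.mean()                # center the response
    x_norm = np.linalg.norm(Xc, axis=0)
    x_norm[x_norm == 0] = np.inf     # constant columns get correlation 0
    corr = (Xc.T @ yc) / (x_norm * np.linalg.norm(yc))
    order = np.argsort(-np.abs(corr))  # strongest marginal signal first
    return order[:d]


# Simulated ultrahigh-dimensional setting (assumption): p >> n,
# with only predictors 5 and 42 truly related to the response.
rng = np.random.default_rng(0)
n, p = 100, 2000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 5] - 2.5 * X[:, 42] + rng.standard_normal(n)

selected = sis_screen(X, y, d=20)
```

In practice the retained set would then be passed to a lower-dimensional method, such as a penalized regression, which is the usual motivation for screening before modeling.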