Leo Lahti, Martin Schäfer, Hans-Ulrich Klein, Silvio Bicciato, Martin Dugas.
Abstract
A variety of genome-wide profiling techniques are available to investigate complementary aspects of genome structure and function. Integrative analysis of heterogeneous data sources can reveal higher level interactions that cannot be detected based on individual observations. A standard integration task in cancer studies is to identify altered genomic regions that induce changes in the expression of the associated genes based on joint analysis of genome-wide gene expression and copy number profiling measurements. In this review, we highlight common approaches to genomic data integration and provide a transparent benchmarking procedure to quantitatively compare method performances in cancer gene prioritization. Algorithms, data sets and benchmarking results are available at http://intcomp.r-forge.r-project.org.
Year: 2012 PMID: 22441573 PMCID: PMC3548603 DOI: 10.1093/bib/bbs005
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Summary of the comparison algorithms
| Implementation | CN preprocessing | Methodology | Significance scoring | Reference |
|---|---|---|---|---|
| CNAmet (R) | Called | Custom statistic; Two step | PPT; aberrant regions | [ [ |
| DR-Correlate/t-test (BC) | Raw/segmented | Two step | PPT; | [ |
| DR-Correlate (BC) | Raw/segmented | COR | PPT; | [ |
| edira (R) | Raw/segmented | Custom statistic; COR | NT; | [ |
| intCNGEan (R) | cghCall object | Custom statistic; Two step | PNT; | [ |
| Ortiz-Estevez (R) | Raw/segmented | Two step | PNT; | [ |
| PMA (CRAN) | Raw/segmented | LV; COR | PLV; | [ |
| PREDA/SODEGIR (BC) | Raw/segmented | Custom statistic; Two step | PPT; aberrant regions/ | [ [ |
| pint/simcca | Raw/segmented | LV; COR | PLV; | [ |
| SIM (BC) | Raw/segmented | REG | PT; | [ |
The implementations are available through Bioconductor (BC), CRAN, or as R source code (R). The CN preprocessing required by each algorithm is listed. COR, correlation analysis; REG, regression analysis; LV, latent variable analysis; PT, parametric test; NT, nonparametric test; PNT, permutation test based on the statistic of a nonparametric test; PPT, permutation test based on the statistic of a parametric test; PLV, permutation test based on a latent variable score.
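To make the abbreviations above concrete, the following minimal sketch pairs a correlation statistic (COR) with a permutation-based significance score for a single gene. It is not the implementation of any method listed in the table: the function names, the one-sided test, the number of permutations and the toy values are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_pvalue(copy_number, expression, n_perm=2000, seed=0):
    """One-sided permutation p-value for a positive CN/expression correlation.

    Sample labels of the expression vector are shuffled to build a null
    distribution of the correlation statistic (hypothetical helper).
    """
    rng = random.Random(seed)
    observed = pearson(copy_number, expression)
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson(copy_number, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical measurements for one gene across ten samples.
cn = [-0.8, -0.5, -0.1, 0.0, 0.1, 0.3, 0.6, 0.9, 1.2, 1.5]
expr = [5.1, 5.4, 5.9, 6.0, 6.3, 6.2, 6.9, 7.4, 7.6, 8.1]

r, p = permutation_pvalue(cn, expr)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

In a genome-wide setting the same score would be computed per gene and the resulting p-values corrected for multiple testing; that step is omitted here.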
Figure 1: AUC values in ROC analysis quantify the cancer gene prioritization performance of the methods for the five benchmarking data sets. High values indicate a high ratio of true positives to false positives among the top findings; the dashed line indicates the expected AUC value for a random gene list (AUC = 0.5). The methods have been ordered by their median rank across all data sets. For the ROC curves, see Supplementary Figure S1.
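The AUC described in the caption can be computed directly from a ranked gene list and a set of known cancer genes. The sketch below uses the rank-based (Mann-Whitney) formulation and assumes a strict ranking without ties. The gene identifiers and the gold-standard set are hypothetical, and this is not the benchmarking code from the paper.

```python
def auc_from_ranking(ranked_genes, known_cancer_genes):
    """AUC of a ranked gene list (best first) against known positives.

    Equals the fraction of (positive, negative) gene pairs in which the
    positive gene is ranked above the negative one.
    """
    positives = set(known_cancer_genes)
    n_pos = sum(1 for g in ranked_genes if g in positives)
    n_neg = len(ranked_genes) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative gene")

    # Walk the list from best to worst; each negative gene contributes one
    # correctly ordered pair for every positive gene seen before it.
    pos_seen = 0
    concordant_pairs = 0
    for gene in ranked_genes:
        if gene in positives:
            pos_seen += 1
        else:
            concordant_pairs += pos_seen
    return concordant_pairs / (n_pos * n_neg)


ranking = ["G1", "G2", "G3", "G4", "G5", "G6"]  # hypothetical, best first
cancer_genes = {"G1", "G2", "G5"}               # hypothetical gold standard

auc = auc_from_ranking(ranking, cancer_genes)
print(round(auc, 3))  # 7 of 9 pairs are correctly ordered -> 0.778
```

A ranking that places all known cancer genes first gives an AUC of 1.0, while a random ordering gives 0.5 in expectation, which corresponds to the dashed line in the figure.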