DR-Integrator: a new analytic tool for integrating DNA copy number and gene expression data.

Keyan Salari, Robert Tibshirani, Jonathan R Pollack.

Abstract

SUMMARY: DNA copy number alterations (CNA) frequently underlie gene expression changes by increasing or decreasing gene dosage. However, only a subset of genes with altered dosage exhibit concordant changes in gene expression. This subset is likely to be enriched for oncogenes and tumor suppressor genes, and can be identified by integrating these two layers of genome-scale data. We introduce DNA/RNA-Integrator (DR-Integrator), a statistical software tool to perform integrative analyses on paired DNA copy number and gene expression data. DR-Integrator identifies genes with significant correlations between DNA copy number and gene expression, and implements a supervised analysis that captures genes with significant alterations in both DNA copy number and gene expression between two sample classes. AVAILABILITY: DR-Integrator is freely available for non-commercial use from the Pollack Lab at http://pollacklab.stanford.edu/ and can be downloaded as a plug-in application to Microsoft Excel and as a package for the R statistical computing environment. The R package is available under the name 'DRI' at http://cran.r-project.org/. An example analysis using DR-Integrator is included as supplemental material. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year:  2009        PMID: 20031972      PMCID: PMC2815664          DOI: 10.1093/bioinformatics/btp702

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

DNA microarray technology has been leveraged to make genome-scale measurements across multiple layers of cellular molecules, e.g. gene expression (Schena et al., 1995), DNA copy number (Pinkel et al., 1998; Pollack et al., 1999), protein expression (Haab et al., 2001) and microRNA expression (Calin et al., 2004), among others. While each data type alone provides a unique snapshot of a cell's state, an integrative analysis of two or more complementary data types can reveal much more than the sum of its parts. DNA copy number alterations (CNAs) represent one data layer extensively measured among many tumor types using array-based comparative genomic hybridization (array CGH). CNAs lead to the amplification and deletion of oncogenes and tumor-suppressor genes (TSGs), respectively, and thereby play a critical role in tumorigenesis. While delineating CNAs across many samples facilitates the identification of oncogenes (in regions of recurrent amplification) and TSGs (in regions of recurrent deletion), cumulatively such genetic changes often span a substantial proportion of the genome, thereby obfuscating the distinction between ‘driver’ cancer genes selected for by a genetic event and nearby ‘passenger’ genes incidentally co-amplified or deleted. Similarly, when comparing cancer cells to normal cells, thousands of genes are often differentially expressed, rendering discrimination of the most salient, primary changes from correlated, downstream changes difficult. One useful approach to aid cancer gene discovery is to integrate DNA copy number and gene expression profiles (Adler et al., 2006; Garraway et al., 2005; Hyman et al., 2002; Pollack et al., 2002). Tumors often harbor CNAs altering the gene dosage of hundreds or thousands of genes. However, due to tissue-specific expression or feedback regulation, among other mechanisms, expression levels of many of these genes may remain unaltered. 
Because the effects of CNAs are mediated by changes in gene expression, the subset of genes exhibiting concordant changes in both DNA copy number and gene expression (e.g. amplified and over-expressed genes) is likely to be enriched for candidate oncogenes and TSGs. While several software tools and statistical methods have been developed to analyze DNA copy number data (Beroukhim et al., 2007; Olshen et al., 2004; Tibshirani and Wang, 2008) or gene expression data (Reich et al., 2006; Subramanian et al., 2005; Tusher et al., 2001) separately, few methods have been developed for their integration (Berger et al., 2006; Carrasco et al., 2006; Hautaniemi et al., 2004). In particular, to our knowledge there is no widely available software tool that supports multiple integrative analyses through a user-friendly interface. Here, we describe our development of DR-Integrator, a broadly useful package of tools to integrate array CGH and gene expression microarray data for the nomination of candidate cancer genes.

2 FEATURES

The DR-Integrator software package contains two analysis tools: DR-Correlate and DR-SAM.

2.1 Correlation analysis

DR-Correlate aims to identify genes whose expression changes are explained by underlying CNAs. To that end, it tests every gene for a statistically significant correlation between its DNA copy number and gene expression levels. Three statistics for measuring correlation are implemented: (i) Pearson's correlation; (ii) Spearman's rank correlation; and (iii) an ‘extremes’ t-test. For Pearson's and Spearman's correlations, the respective correlation coefficient is computed for each gene. For the extremes t-test, a modified Student's t-test (Tusher et al., 2001) is computed for each gene, comparing the gene expression levels of the samples in the lowest and the highest quantiles of DNA copy number. In other words, for each gene the samples are rank-ordered by DNA copy number, and the samples falling below the lower percentile cutoff and above the upper percentile cutoff form two groups whose gene expression is compared with the modified t-test. The percentile cutoff defining the two quantile groups is user-adjustable.
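As an illustration only, the three statistics can be sketched for a single gene in Python. This is not the DRI implementation: the function names, the `fraction` argument (standing in for the user-adjustable percentile cutoff) and the `s0` constant (standing in for the stabilizing term of the modified t-test) are assumptions introduced here, and the pooled-variance form of the t-statistic is one plausible choice.

```python
import math
from statistics import mean, stdev


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def extremes_t(dna, rna, fraction=0.25, s0=0.0):
    """Compare expression of the samples with the lowest and highest copy number.

    fraction and s0 are hypothetical parameters (assumptions of this sketch).
    """
    order = sorted(range(len(dna)), key=lambda i: dna[i])
    # At least 2 samples per group, and never overlapping groups.
    k = min(max(2, int(len(dna) * fraction)), len(dna) // 2)
    low = [rna[i] for i in order[:k]]
    high = [rna[i] for i in order[-k:]]
    pooled_var = ((k - 1) * stdev(low) ** 2 + (k - 1) * stdev(high) ** 2) / (2 * k - 2)
    se = math.sqrt(pooled_var * (2 / k))
    return (mean(high) - mean(low)) / (se + s0)


# Toy copy number and expression values for one gene across eight samples.
dna = [-1.0, -0.5, 0.0, 0.2, 0.4, 1.0, 1.5, 2.0]
rna = [-0.8, -0.6, 0.1, 0.0, 0.3, 0.9, 1.2, 1.8]
print(pearson(dna, rna), spearman(dna, rna), extremes_t(dna, rna))
```

In this sketch a gene whose expression tracks its copy number yields a positive value for all three statistics; the extremes t-test uses only the samples at the two ends of the copy number ranking and ignores the rest.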

2.2 Two-class supervised learning analysis

DNA/RNA-Significance Analysis of Microarrays (DR-SAM) performs a supervised analysis to identify genes with statistically significant differences in both DNA copy number and gene expression between two sample classes (e.g. tumor subtype-A versus tumor subtype-B). The goal of this analysis is to identify genetic differences (CNAs) that mediate gene expression differences between two groups of interest. DR-SAM implements a modified Student's t-test to generate for each gene two t-scores assessing differences in DNA copy number (tDNA) and differences in gene expression (tRNA). A final score (S) is computed by first summing the copy number t-score and gene expression t-score, and then multiplying the sum by a weight w (0 ≤ w ≤ 1) given by the ratio of the two t-scores. The weight is applied to favor genes with strong differences in both DNA copy number and gene expression between the two classes. That is, a gene with statistically equal differences in copy number and in gene expression (i.e. tDNA = tRNA) will have a weight of 1, while genes with unbalanced contributions from copy number and expression will have a weight less than 1, resulting in a lower score:

S = w × (tDNA + tRNA)
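The scoring rule can be sketched as follows. The text specifies only that the weight is a ratio of the two t-scores, lies between 0 and 1, and equals 1 when tDNA = tRNA; the particular form used here (smaller absolute t-score divided by the larger) is an assumption that satisfies those constraints and may differ from what DR-SAM actually computes. The function name `dr_sam_score` is likewise hypothetical.

```python
def dr_sam_score(t_dna, t_rna):
    """Weighted sum of the two t-scores: S = w * (t_dna + t_rna).

    ASSUMPTION: w = min(|t_dna|, |t_rna|) / max(|t_dna|, |t_rna|).
    The source only states that w is a ratio of the t-scores in [0, 1].
    """
    larger = max(abs(t_dna), abs(t_rna))
    if larger == 0:
        return 0.0  # no difference in either data type
    w = min(abs(t_dna), abs(t_rna)) / larger
    return w * (t_dna + t_rna)


# Two genes with the same summed t-score but different balance.
balanced = dr_sam_score(5.0, 5.0)    # w = 1
unbalanced = dr_sam_score(8.0, 2.0)  # w = 0.25
print(balanced, unbalanced)
```

Under this assumed weight, a gene with tDNA = tRNA = 5 keeps its full summed score, whereas a gene with tDNA = 8 and tRNA = 2 is down-weighted to a quarter of the same sum, which is the behavior the weighting is meant to produce.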

2.3 False discovery rate estimation

To account for multiple hypothesis testing, both DR-Correlate and DR-SAM calculate a measure of statistical significance called the q-value, which is based on the false discovery rate (FDR). This is achieved by randomly permuting the sample labels a large number of times (user-defined; default: 1000 times) to disrupt the correlations between the paired DNA copy number and gene expression measurements. For each random permutation of the data, a test score is computed for every gene. To calculate a gene-specific q-value, each observed score is compared to the distribution of random scores and the FDR is estimated as previously described (Storey and Tibshirani, 2003).
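A minimal sketch of the permutation procedure is given below, under several stated assumptions: only the expression sample labels are shuffled (which breaks the DNA/RNA pairing), scores are compared by absolute value, and the proportion of true null genes is conservatively taken to be 1 rather than estimated as in Storey and Tibshirani (2003). The function names and arguments are hypothetical and do not correspond to the DRI package interface.

```python
import bisect
import math
import random


def pearson(x, y):
    """Pearson correlation, used here as an example per-gene score."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def null_scores_by_permutation(dna, rna, score, n_perm=1000, seed=0):
    """Score every gene after shuffling the expression sample labels.

    dna, rna: lists of per-gene vectors sharing the same sample order.
    Returns one list of per-gene null scores for each permutation.
    """
    rng = random.Random(seed)
    n_samples = len(dna[0])
    nulls = []
    for _ in range(n_perm):
        perm = list(range(n_samples))
        rng.shuffle(perm)  # same shuffle applied to all genes
        nulls.append([score(d, [r[j] for j in perm]) for d, r in zip(dna, rna)])
    return nulls


def permutation_qvalues(observed, null_scores):
    """Estimate a q-value per gene from observed and permuted scores.

    ASSUMPTION: the null proportion pi0 is fixed at 1 (conservative).
    """
    n_perm = len(null_scores)
    null_sorted = sorted(abs(s) for perm in null_scores for s in perm)
    obs_sorted = sorted(abs(s) for s in observed)
    order = sorted(range(len(observed)), key=lambda i: abs(observed[i]))
    qvalues = [1.0] * len(observed)
    running_min = 1.0
    for i in order:  # from least to most extreme observed score
        t = abs(observed[i])
        called = len(obs_sorted) - bisect.bisect_left(obs_sorted, t)
        expected_false = (len(null_sorted) - bisect.bisect_left(null_sorted, t)) / n_perm
        # q-value: smallest FDR over all thresholds that still call this gene
        running_min = min(running_min, expected_false / called)
        qvalues[i] = running_min
    return qvalues
```

For each threshold, the estimated FDR is the average number of permuted scores at least as extreme as the threshold divided by the number of observed scores at least as extreme; taking the running minimum from the least to the most extreme gene makes the resulting q-values monotone in the score.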

2.4 Additional features

DR-Integrator performs several preprocessing steps including smoothing of copy number data, calling significant copy number alterations with the Fused Lasso method (Tibshirani and Wang, 2008), and merging DNA/RNA datasets from different platforms to allow for integrative analyses. DR-Integrator also allows the user to specify the FDR cutoff for an analysis and generate DNA/RNA ‘heatmaps’ for genes achieving statistical significance. Automatic imputation of missing expression data, using the nearest neighbor algorithm, is also performed. Finally, we note that DR-Integrator is not limited to the analysis of DNA copy number and gene expression data, but can be used to integrate any paired data types where a 1-to-1 mapping between measured elements can be made. An example analysis is shown on a dataset of DNA copy number and gene expression profiles of 50 breast cancer cell lines (Supplementary Figure S1).

3 IMPLEMENTATION

DR-Integrator has been developed in R and Microsoft Visual Basic v6.5, and runs as a plug-in to Microsoft Excel under the Windows operating system (2000/XP/Vista). With the use of Windows emulators, DR-Integrator can also be run on Mac OS X, Linux and Unix-based operating systems. The statistical methods can also be applied natively in the R interpreter on any of the above platforms.

REFERENCES (10 of 18 shown)

1.  Genome-wide analysis of DNA copy-number changes using cDNA microarrays.

Authors:  J R Pollack; C M Perou; A A Alizadeh; M B Eisen; A Pergamenschikov; C F Williams; S S Jeffrey; D Botstein; P O Brown
Journal:  Nat Genet       Date:  1999-09       Impact factor: 38.330

2.  Statistical significance for genomewide studies.

Authors:  John D Storey; Robert Tibshirani
Journal:  Proc Natl Acad Sci U S A       Date:  2003-07-25       Impact factor: 11.205

3.  Circular binary segmentation for the analysis of array-based DNA copy number data.

Authors:  Adam B Olshen; E S Venkatraman; Robert Lucito; Michael Wigler
Journal:  Biostatistics       Date:  2004-10       Impact factor: 5.899

4.  Spatial smoothing and hot spot detection for CGH data using the fused lasso.

Authors:  Robert Tibshirani; Pei Wang
Journal:  Biostatistics       Date:  2007-05-18       Impact factor: 5.899

5.  Impact of DNA amplification on gene expression patterns in breast cancer.

Authors:  Elizabeth Hyman; Päivikki Kauraniemi; Sampsa Hautaniemi; Maija Wolf; Spyro Mousses; Ester Rozenblum; Markus Ringnér; Guido Sauter; Outi Monni; Abdel Elkahloun; Olli-P Kallioniemi; Anne Kallioniemi
Journal:  Cancer Res       Date:  2002-11-01       Impact factor: 12.701

6.  Quantitative monitoring of gene expression patterns with a complementary DNA microarray.

Authors:  M Schena; D Shalon; R W Davis; P O Brown
Journal:  Science       Date:  1995-10-20       Impact factor: 47.728

7.  Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors.

Authors:  Jonathan R Pollack; Therese Sørlie; Charles M Perou; Christian A Rees; Stefanie S Jeffrey; Per E Lonning; Robert Tibshirani; David Botstein; Anne-Lise Børresen-Dale; Patrick O Brown
Journal:  Proc Natl Acad Sci U S A       Date:  2002-09-24       Impact factor: 11.205

8.  MicroRNA profiling reveals distinct signatures in B cell chronic lymphocytic leukemias.

Authors:  George Adrian Calin; Chang-Gong Liu; Cinzia Sevignani; Manuela Ferracin; Nadia Felli; Calin Dan Dumitru; Masayoshi Shimizu; Amelia Cimmino; Simona Zupo; Mariella Dono; Marie L Dell'Aquila; Hansjuerg Alder; Laura Rassenti; Thomas J Kipps; Florencia Bullrich; Massimo Negrini; Carlo M Croce
Journal:  Proc Natl Acad Sci U S A       Date:  2004-07-29       Impact factor: 11.205

9.  High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays.

Authors:  D Pinkel; R Segraves; D Sudar; S Clark; I Poole; D Kowbel; C Collins; W L Kuo; C Chen; Y Zhai; S H Dairkee; B M Ljung; J W Gray; D G Albertson
Journal:  Nat Genet       Date:  1998-10       Impact factor: 38.330

10.  Protein microarrays for highly parallel detection and quantitation of specific proteins and antibodies in complex solutions.

Authors:  B B Haab; M J Dunham; P O Brown
Journal:  Genome Biol       Date:  2001-01-22       Impact factor: 13.583
