Miriam Ragle Aure, Israel Steinfeld, Lars Oliver Baumbusch, Knut Liestøl, Doron Lipson, Sandra Nyberg, Bjørn Naume, Kristine Kleivi Sahlberg, Vessela N Kristensen, Anne-Lise Børresen-Dale, Ole Christian Lingjærde, Zohar Yakhini.
Abstract
Genomic copy number alterations are common in cancer. Finding the genes causally implicated in oncogenesis is challenging because the gain or loss of a chromosomal region may affect a few key driver genes and many passengers. Integrative analyses have opened new vistas for addressing this issue. One approach is to identify genes with frequent copy number alterations and corresponding changes in expression. Several methods also analyse effects of transcriptional changes on known pathways. Here, we propose a method that analyses in-cis correlated genes for evidence of in-trans association to biological processes, with no bias towards processes of a particular type or function. The method aims to identify cis-regulated genes for which the expression correlation to other genes provides further evidence of a network-perturbing role in cancer. The proposed unsupervised approach involves a sequence of statistical tests to systematically narrow down the list of relevant genes, based on integrative analysis of copy number and gene expression data. A novel adjustment method handles confounding effects of co-occurring copy number aberrations, potentially a large source of false positives in such studies. When the method was applied to whole-genome copy number and expression data from 100 primary breast carcinomas, 6373 genes were identified as commonly aberrant, 578 were highly in-cis correlated, and 56 were in addition associated in-trans to biological processes. Among these in-trans process associated and cis-correlated (iPAC) genes, 28% have previously been reported as breast cancer associated, and 64% as cancer associated. By combining statistical evidence from three separate subanalyses that focus respectively on copy number, gene expression and the combination of the two, the proposed method identifies several known and novel cancer driver candidates. Validation in an independent data set supports the conclusion that the method identifies genes implicated in cancer.
Year: 2013 PMID: 23382830 PMCID: PMC3559658 DOI: 10.1371/journal.pone.0053014
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Workflow of the proposed method to identify iPAC genes.
(1) Starting with all genes, the commonly aberrant genes are selected as those that have more than 10% gains or losses; (2) Next, those genes which in addition have an in-cis Pearson correlation above 0.6 are selected and referred to as in-cis genes; (3) Finally, statistical enrichment analysis is performed to assess in-trans functionality, leading to identification of the 56 iPAC genes.
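The first two selection steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the input layout (a hypothetical dictionary with per-sample `calls`, `copy_number` and `expression` lists), the assumption that gain/loss calls are already available as +1/−1/0, and the reading of "more than 10% gains or losses" as "gain frequency or loss frequency above 10%" are all assumptions made here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def aberration_frequency(calls):
    """Fraction of samples with a gain (+1) and with a loss (-1)."""
    n = len(calls)
    return sum(c > 0 for c in calls) / n, sum(c < 0 for c in calls) / n

def select_in_cis(genes, min_freq=0.10, min_r=0.6):
    """Step 1: keep commonly aberrant genes; step 2: keep those with in-cis r > min_r.

    `genes` maps a gene name to a dict with per-sample 'calls',
    'copy_number' and 'expression' lists (hypothetical layout).
    """
    selected = []
    for name, g in genes.items():
        gain, loss = aberration_frequency(g["calls"])
        if max(gain, loss) <= min_freq:
            continue  # not commonly aberrant
        if pearson(g["copy_number"], g["expression"]) > min_r:
            selected.append(name)
    return selected
```

The thresholds 0.10 and 0.6 are the ones quoted in the caption; the third step (enrichment analysis) is not covered by this sketch.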
Figure 2. Copy number aberrations and in-cis correlations.
The frequency of samples with gains (red) and losses (green) is shown at the top. Each gray point shows the level of in-cis correlation between copy number and expression for a particular gene. The chromosomal positions of the genes selected in our workflow are shown at the bottom. This includes commonly aberrant genes (n = 6373; upper band), in-cis genes (n = 578; middle band), and the iPAC genes (n = 56; lower band). Colors indicate whether the gene is most frequently amplified (red) or deleted (green).
Figure 3. Association between expression and copy number.
Linear regression of log-expression as a function of log-copy number for four selected iPAC genes.
Figure 4. Effect of using copy number-adjusted residual expression.
(A) Comparison of in-trans correlations calculated with and without adjustment for in-cis correlation, i.e. with and without copy number-adjusted residual expression. In each panel, the x-axis represents the in-trans correlation without adjustment for in-cis correlation, and the y-axis represents the in-trans correlation with adjustment for in-cis correlation. The diagonal lines extend from (−1,−1) to (1, 1). Each point represents one pair of genes among all the 578×25,688 gene pairs (G, g), where G is an in-cis gene and g denotes any gene; (I) All pairs for which G and g are either on different chromosomes or on the same chromosome but on different arms; (II) All pairs for which G and g are within a distance of 30 Mb from each other; (III) All pairs for which G and g are within a distance of 5 Mb from each other; (IV) All pairs for which G and g are within a distance of 1 Mb from each other. (B) The copy number-adjusted residual expression as a function of the non-adjusted expression, in log space. Shown here are the expression levels for six genes with in-cis correlations ranging from 0 to 0.9. Each dot represents one breast cancer patient. The effect of the adjustment increases with increasing in-cis correlation. The dotted line is the diagonal, and the solid line is the regression line.
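One way to picture the adjustment is as the residual of a per-gene linear regression of log-expression on log-copy number. The sketch below assumes ordinary least squares with an intercept; the function names are hypothetical and the authors' exact adjustment procedure may differ.

```python
def ols_fit(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return 0.0, my  # constant copy number: nothing to adjust for
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

def residual_expression(log_cn, log_expr):
    """Expression with the linear copy number effect removed (assumed OLS residual)."""
    slope, intercept = ols_fit(log_cn, log_expr)
    return [e - (intercept + slope * c) for c, e in zip(log_cn, log_expr)]
```

By construction the residuals are uncorrelated with the gene's own copy number, so a gene with a high in-cis correlation changes more under the adjustment than a gene with an in-cis correlation near zero, in line with panel (B).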
Figure 5. Effect of residual expression.
Correlation plots showing how high in-trans correlations are distributed across the genome with and without the use of copy number-adjusted residual expression. Red dots signify positive in-trans Pearson correlations above 0.6, and green dots signify negative in-trans Pearson correlations below −0.6. The x-axis shows the genomic positions of all 25,688 genes and the y-axis shows the genomic positions of the 578 in-cis genes. (A) High in-trans correlations between the expression of in-cis genes and the expression of all genes. (B) High in-trans correlations between the expression of in-cis genes and the residual expression of all genes. (C) High in-trans correlations between the copy number of in-cis genes and the expression of all genes.
Figure 6. Enrichment of the Cell Cycle Process GO term in ATAD2 correlated genes.
All genes were ranked according to the level of correlation between their copy number-adjusted residual expression profile and the expression levels of ATAD2 (the pivot for this analysis). The heatmap represents the expression levels of all 25,688 genes after ranking them by this correlation and after sorting the samples according to ATAD2 expression levels. The top panel shows the expression (blue) and copy number (red) levels of ATAD2 across the 100 samples. The graph shows the significance level, as –log(hypergeometric p-value), of cell cycle process genes in the ranked list of genes. Optimal enrichment is attained at the top 189 genes, with 14 times more cell cycle process genes than would be expected by chance (mHG ).
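The scan over cutoffs in a ranked list can be sketched as follows. The code computes, for every prefix of the ranked list, the hypergeometric tail probability of seeing at least the observed number of term members, and returns the minimum together with the cutoff at which it occurs. It is an illustration only: the returned minimum is the raw statistic, not a corrected mHG p-value (the correction for scanning all cutoffs is not implemented), the function names are hypothetical, and exact integer arithmetic via `math.comb` is only practical for small examples.

```python
from math import comb

def hypergeom_tail(N, B, n, b):
    """P(X >= b) when n items are drawn from N, of which B are term members."""
    upper = min(n, B)
    hits = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1))
    return hits / comb(N, n)

def mhg_scan(labels):
    """Scan all prefixes of a ranked 0/1 label list.

    Returns (minimum tail probability, cutoff n, term members in the top n).
    """
    N, B = len(labels), sum(labels)
    best = (1.0, 0, 0)
    b = 0
    for n, lab in enumerate(labels, start=1):
        b += lab
        p = hypergeom_tail(N, B, n, b)
        if p < best[0]:
            best = (p, n, b)
    return best

def fold_enrichment(N, B, n, b):
    """Observed fraction of term members in the top n relative to the background."""
    return (b / n) / (B / N)
```

In the caption's terms, `labels` would mark which genes in the ATAD2-ranked list belong to the cell cycle process term, the reported cutoff corresponds to the top 189 genes, and `fold_enrichment` corresponds to the "14 times more than expected" figure.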
Description and properties of the 56 iPAC genes.
| Gene | Full gene name | Cytoband | Highest associated GO term (trait) | Score | Annot |
|---|---|---|---|---|---|
|  |  | 1q25.1 | nucleic acid metabolic proc. | 94.31 |  |
|  |  | 8q24.13 | cell cycle | 91.53 | BC |
|  |  | 3q25.33 | cell cycle | 90.42 | BC |
|  |  | 3q26.33 | cell cycle | 87.78 | C |
|  |  | 8q24.3 | cell cycle | 86.78 | BC |
|  |  | 3q26.31 | cell cycle | 82.84 | C |
|  |  | 1q24.1 | nucleic acid metabolic proc. | 82.62 |  |
|  |  | 8q24.12 | cell cycle | 80.67 | C |
|  |  | 1q21.2 | nucleic acid metabolic proc. | 80.29 |  |
|  |  | 8q23.1 | nucleic acid metabolic proc. | 80.27 | C |
|  |  | 8q22.1 | nucleic acid metabolic proc. | 78.77 |  |
|  |  | 8q22.3 | nucleic acid metabolic proc. | 78.14 | BC |
|  |  | 17q25.1 | cell cycle | 75.28 |  |
|  |  | 8q24.11 | cell cycle | 74.77 | BC |
|  |  | 17q24.2 | cell cycle proc. | 74.09 | BC |
|  |  | 8q24.13 | cell cycle proc. | 69.94 |  |
|  |  | 8q22.2 | cell division | 64.40 | C |
|  |  | 8q24.13 | cellular macromol. metabolic proc. | 61.06 | BC |
|  |  | 3q26.1 | cellular macromol. metabolic proc. | 58.81 | C |
|  |  | 8q24.11 | cellular nitrogen compound metab. proc. | 55.46 | BC |
|  |  | 8q22.2 | cellular macromolecule biosynth. proc. | 46.14 | C |
|  |  | 1q23.1 | organelle organization | 38.66 | C |
|  |  | 1q42.13 | chromosome organization | 34.69 | C |
|  |  | 1q42.12 | chromosome organization | 34.10 | BC |
|  |  | 17q23.2 | positive regulation of ligase activity | 28.47 | C |
|  |  | 1q21.2 | response to DNA damage stimulus | 27.44 |  |
|  |  | 1q21.2 | chromatin modification | 23.51 | C |
|  |  | 1q44 | chromatin modification | 20.52 | C |
|  |  | 8q24.3 | DNA conformation change | 17.27 | C |
|  |  | 8q24.3 | mitotic sister chromatid segregation | 16.14 | C |
|  |  | 1q21.2 | mRNA transport | 15.77 | C |
|  |  | 17q23.2 | mitotic cell cycle checkpoint | 14.29 | BC |
|  |  | 17q23.2 | mitotic cell cycle checkpoint | 14.12 | BC |
|  |  | 1q21.3 | establishment of organelle localization | 13.74 |  |
|  |  | 22q12.3 | cellular protein metabolic proc. | 13.41 |  |
|  |  | 20q13.32 | spindle checkpoint | 11.92 |  |
|  |  | 8q24.3 | mitotic metaphase plate congression | 11.91 | C |
|  |  | 16q23.2 | spindle checkpoint | 11.28 | BC |
|  |  | 1q23.1 | DNA-dependent DNA replication init. | 11.07 |  |
|  |  | 22q13.1 | neural tube development | 10.25 | BC |
|  |  | 8q24.13 | establishment of mitotic spindle loc. | 10.08 |  |
|  |  | 1q42.3 | transcription | 10.03 |  |
|  |  | 20q13.33 | mitotic cell cycle spindle checkpoint | 9.67 | C |
|  |  | 16q24.3 | carbohydrate catabolic proc. | 9.05 | BC |
|  |  | 8q22.3 | histone mRNA metabolic proc. | 8.19 | C |
|  |  | 11q13.2 | water-soluble vitamin biosynthetic proc. | −8.37 |  |
|  |  | 8q21.13 | regeneration | −8.57 | BC |
|  |  | 1q42.3 | organ regeneration | −8.80 | C |
|  |  | 8q13.2 | programmed cell death | −9.06 |  |
|  |  | 1q21.3 | activation of plasma proteins | −11.73 |  |
|  |  | 8q21.11 | regulation of Rho protein signal transd. | −13.02 | BC |
|  |  | 20q13.13 | negative regulation of gene expression | −13.36 | C |
|  |  | 8q24.3 | membrane invagination | −15.15 |  |
|  |  | 8q12.1 | positive regulation of cell death | −15.88 |  |
|  |  | 20q13.32 | cellular protein metabolic process | −15.88 |  |
|  |  | 1q23.2 | response to external stimulus | −22.09 |  |
Scores in the table are the negative logarithms of the enrichment scores, the sign indicating whether the association of the trait to the genes is positively or negatively correlated with the iPAC gene. The annotation column indicates genes previously linked with breast cancer (BC) and among those that are not, genes linked to cancer in general (C), based on annotation of the genes obtained with IPA (Ingenuity® Systems, www.ingenuity.com).
Figure 7. Associations between iPAC genes and traits (biological processes).
A hierarchically clustered heatmap representation of traits associated with at least four iPAC genes. A red entry indicates a significant association between an iPAC gene and the corresponding trait (see Figure S4 for all the significant associations). The hierarchical clustering was calculated and visualized with the Expander suite [66], using average Euclidean distance.
Figure 8. Distribution of in-cis correlation levels between copy number and expression in the MicMa and UNC cohorts.
Green bins in the histogram show distribution of in-cis correlation levels of all genes in the data set, while red bins show the distribution for only the identified iPAC genes. The left-hand y-axes in each histogram show the count in each bin among all genes, and the right-hand axes show the count for iPAC genes in each bin. (A) Distribution of the in-cis correlation levels in the MicMa cohort. (B) Distribution of the in-cis correlation levels in the UNC cohort. The iPAC genes were inferred from the MicMa cohort.
Figure 9. Association consistency of iPAC genes in the validation cohort.
Blue dots represent associations between an iPAC gene and a GO term. The blue dots are plotted according to the level of association, as signed –log(p-value), in the MicMa cohort (x-axis) and in the UNC cohort (y-axis), where signed –log(p-value) refers to –log(mHG p-value) for positive associations and log(mHG p-value) for negative associations. A monotone relation is observed, supporting the iPAC behavior of the MicMa-inferred iPAC genes in the validation cohort. For each blue dot, a bar with a red dot at its center represents 1 standard deviation (SD) of the associations obtained by associating 100 random genes from the UNC cohort with the relevant GO term.
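The signed score used on both axes can be written as a small helper. The caption does not state the base of the logarithm; base 10 is an assumption made here, and the function name is hypothetical.

```python
from math import log10

def signed_neg_log_p(p_value, positive):
    """-log10(p) for positive associations, log10(p) for negative ones (base 10 assumed)."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    score = -log10(p_value)
    return score if positive else -score
```

With this convention, a strong positive association lands far to the right (or top) of the plot and a strong negative association far to the left (or bottom), so a monotone relation between the two cohorts appears as points running from the lower left to the upper right.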