Wenbin Zhou1, Zhangxiang Zhao1,2, Ruiping Wang1, Yue Han1, Chengyu Wang1,2, Fan Yang1,2, Ya Han1,2, Haihai Liang3, Lishuang Qi1, Chenguang Wang1, Zheng Guo1,4,5, Yunyan Gu1,2.
Abstract
Results from numerous studies suggest an important role for somatic copy number alterations (SCNAs) in cancer progression. Our work aimed to identify the drivers (oncogenes or tumor suppressor genes) residing in recurrently aberrant genomic regions. Because these regions contain large numbers of protein-coding and non-coding genes, pinpointing the drivers remains a challenge in decoding the SCNAs involved in carcinogenesis. Here, we propose a new approach to comprehensively identify drivers, using 8740 cancer samples involving 18 cancer types from The Cancer Genome Atlas (TCGA). On average, 84 drivers were revealed for each cancer type, including protein-coding genes, long non-coding RNAs (lncRNAs) and microRNAs (miRNAs). We demonstrated that the drivers showed significant attributes of cancer genes and significantly overlapped with known cancer genes, including MYC, CCND1 and ERBB2 in breast cancer, and the lncRNA PVT1 in multiple cancer types. Pan-cancer analyses of drivers revealed specificity and commonality across cancer types, and the non-coding drivers showed higher cancer-type specificity than the coding drivers. Some cancer types from different tissue origins, such as head and neck squamous cell carcinoma (HNSC) and lung squamous cell carcinoma (LUSC), were found to be highly similar because of the significant overlap of their drivers. The lncRNA SOX2-OT, a common driver of HNSC and LUSC, showed significant expression correlation with the oncogene SOX2. In addition, because some drivers are common to multiple cancer types and are targeted by known drugs, we found that some drugs could be successfully repositioned, as validated by datasets of drug response assays in cell lines. Our work reports a new method to comprehensively identify drivers in SCNAs across diverse cancer types, providing a feasible strategy for cancer drug repositioning as well as novel findings regarding cancer-associated non-coding RNA discovery.
Keywords: copy number alterations; driver; drug repositioning; lncRNAs
Year: 2017 PMID: 28719033 PMCID: PMC5623819 DOI: 10.1002/1878-0261.12112
Source DB: PubMed Journal: Mol Oncol ISSN: 1574-7891 Impact factor: 6.603
Figure 1. Schematic procedure of our work. (A) Identification of candidate elements whose expression levels are positively correlated with copy number alterations. (B) Identification of DEGs affected by the copy number alteration for each peak region. (C) Identification of CDEGs for each candidate element. PPIs, lncRNA‐protein binding relationships and miRNA‐target interactions are used to filter the CDEGs. (D) Identification of drivers whose CDEGs significantly overlap with known cancer genes. a + b + c + d is the total number of genes in the expression profile, a + b is the number of census cancer genes in the expression profile, a + c is the number of CDEGs of one candidate driver, and a is the number of overlapping genes between CDEGs and cancer genes. (E) Validation of drivers. (F) Analysis of drivers in multiple cancers.
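The overlap test in panel (D) can be illustrated with a minimal sketch. It assumes the significance of the overlap is an upper-tail hypergeometric probability, P(X ≥ a), built from the a, b, c, d counts defined in the caption. The function name `hypergeom_overlap_pvalue` and the toy counts are hypothetical, and the authors' exact implementation is not given in this record.

```python
from math import comb


def hypergeom_overlap_pvalue(a: int, b: int, c: int, d: int) -> float:
    """Upper-tail hypergeometric probability P(X >= a).

    Counts follow the Figure 1(D) caption:
      a + b + c + d : all genes in the expression profile
      a + b         : census cancer genes in the profile
      a + c         : CDEGs of one candidate driver
      a             : CDEGs that are also census cancer genes
    """
    total = a + b + c + d
    cancer = a + b
    cdegs = a + c
    max_overlap = min(cancer, cdegs)
    # Number of ways to draw `cdegs` genes with at least `a` cancer genes.
    favourable = sum(
        comb(cancer, k) * comb(total - cancer, cdegs - k)
        for k in range(a, max_overlap + 1)
    )
    return favourable / comb(total, cdegs)


# Hypothetical counts, for illustration only (not data from the study).
p_value = hypergeom_overlap_pvalue(a=12, b=488, c=88, d=17412)
print(f"overlap p-value: {p_value:.3g}")
```

Using exact integer binomial coefficients avoids floating-point underflow for very small tail probabilities; a statistics library would typically be used instead when one is available.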
Figure 2. Statistics of drivers. (A) Statistics of the drivers in each cancer. Green, orange and blue cylinders represent driver protein‐coding genes, lncRNAs and miRNAs, respectively. (B) Overlap between drivers and census cancer genes. The gray pillar represents the ratio of drivers to all of the elements in the peak region in each cancer type. The red pillar represents the ratio of overlap between census cancer genes and drivers in each cancer type.
Figure 3. Comparisons between drivers and non‐drivers. (A) Degree in the human PPI network. (B) Betweenness centrality in the human PPI network. (C) Ka/Ks ratio. (D) Mutation rate. The orange and green bars represent drivers and non‐drivers, respectively. The blue bar represents the P‐value from the Wilcoxon rank‐sum test. P‐values were −log2‐transformed (Y‐axis on the right side). The red dotted line indicated by the red arrow represents a P‐value < 0.05.
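As a reading aid for the right-hand axis, the following short sketch shows where the P = 0.05 cutoff falls after a −log2 transformation. The helper name `neg_log2` is hypothetical; only the transformation and the 0.05 threshold come from the caption.

```python
from math import log2


def neg_log2(p: float) -> float:
    """Return the -log2-transformed P-value (larger means more significant)."""
    return -log2(p)


# Position of the P = 0.05 cutoff on the transformed axis (about 4.32).
threshold = neg_log2(0.05)
print(f"-log2(0.05) = {threshold:.2f}")

# A bar taller than the threshold corresponds to P < 0.05.
print(neg_log2(0.01) > threshold)
```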
Figure 4. Overlap between drivers and known cancer genes. (A) Overlap between drivers and census cancer genes. (B) Overlap between drivers and cancer genes from DriverDB. (C) Overlap between drivers and cancer genes from Bushman. (D) Overlap between drivers with deletions and tumor suppressor genes in TSGene. The gray bars represent the number of known cancer genes. The orange bars represent the number of overlapping genes between drivers and known cancer genes. The blue bars represent the P‐values of the overlap between drivers and known cancer genes, calculated by the hypergeometric test. P‐values were −log2‐transformed (Y‐axis on the right side). The red dotted line indicated by the red arrow represents a P‐value < 0.05.
Figure 5. Distribution of drivers in 18 types of cancer. (A) Pie chart of drivers, including protein‐coding genes, lncRNAs and miRNAs. (B) Pie chart of driver protein‐coding genes. (C) Pie chart of driver non‐coding RNAs, including lncRNAs and miRNAs. Numbers 1–5 denote the number of cancer types in which the driver is present. (D) Significance of overlapping drivers between cancer types. Color represents the −log2‐transformed P‐value calculated by the hypergeometric test. The numbers in the ellipses represent the number of drivers in the cancer type.
Figure 6. Driver lncRNA‐cancer network and functional analysis. (A) Network of driver lncRNAs and cancer types. Circles represent lncRNAs, and triangles represent miRNAs. Orange and gray colors represent amplification and deletion, respectively. Octagons represent cancer types. Edges represent the relationships between the identified drivers and cancer types. The driver non‐coding RNAs identified in three cancer types are marked by blue dashed rectangles. (B) Pearson correlation of driver lncRNA PVT1 and oncogene MYC in CESC, LIHC and SKCM samples with amplification of PVT1. The z‐score‐transformed expression profile was used. (C) Pearson correlation of driver lncRNA SOX2‐OT and SOX2 in HNSC, KIRC and LUSC samples with amplification of SOX2‐OT. (D) KEGG enrichment analysis of targets of hsa‐mir‐429 using the hypergeometric test with FDR < 0.05.
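The correlation in panels (B) and (C) can be sketched as follows, assuming each gene's expression is z‐score‐transformed across samples before the Pearson coefficient is computed. The function names and the toy expression values are hypothetical and are not data from the study.

```python
from statistics import mean, pstdev


def zscore(values: list[float]) -> list[float]:
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation as the mean product of paired z-scores."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


# Hypothetical expression values for a lncRNA and a gene in five samples.
lncrna_expr = [2.1, 3.4, 1.8, 4.0, 2.9]
gene_expr = [5.0, 6.2, 4.4, 7.1, 5.9]
print(f"Pearson r = {pearson(lncrna_expr, gene_expr):.3f}")
```

The Pearson coefficient is unchanged by z‐score transformation of either variable, so the transformation affects how expression values are displayed rather than the reported correlation.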
Figure 7. Cancer driver‐drug network. (A) Relationships among drugs, drivers and cancers. Triangles and ellipses represent drivers and cancer types, respectively. The drivers in specific cancer types are connected by rhombus arrows. Relationships between drugs (capsules) and the drivers they target are marked by T‐type arrows. The known drug and disease (octagons) relationships are marked by arrows. (B) Lapatinib and targets in cancers. (C) Afatinib and targets in cancers. The orange solid T‐type line represents known drug‐cancer relationships, and the orange dotted T‐type line represents the predicted drug‐cancer relationship. (D) Box‐plots of Act Area values of LUSC cell lines treated with lapatinib in the CCLE database. (E) Box‐plots of IC50 values of LGG cell lines treated with lapatinib in the GDSC database. (F) Box‐plots of IC50 values of BRCA cell lines treated with afatinib in the GDSC database. The IC50 values are natural logarithm‐transformed. In (D–F), the red dots represent amplification of the corresponding gene in cell lines, and the blue dots represent wild‐type. The Wilcoxon rank‐sum test was used.