
DriverDBv2: a database for human cancer driver gene research.

I-Fang Chung¹, Chen-Yang Chen², Shih-Chieh Su³, Chia-Yang Li⁴, Kou-Juey Wu⁵, Hsei-Wei Wang⁶, Wei-Chung Cheng⁷.

Abstract

We previously presented DriverDB, a database that incorporates ∼6000 cases of exome-seq data, in addition to annotation databases and published bioinformatics algorithms dedicated to driver gene/mutation identification. The database provides two points of view, 'Cancer' and 'Gene', to help researchers visualize the relationships between cancers and driver genes/mutations. In the updated DriverDBv2 database (http://ngs.ym.edu.tw/driverdb) presented herein, we incorporated >9500 cancer-related RNA-seq datasets and >7000 more exome-seq datasets from The Cancer Genome Atlas (TCGA), International Cancer Genome Consortium (ICGC), and published papers. Seven additional computational algorithms (meaning that the updated database contains 15 in total), which were developed for driver gene identification, are incorporated into our analysis pipeline, and the results are provided in the 'Cancer' section. Furthermore, there are two main new features, 'Expression' and 'Hotspot', in the 'Gene' section. 'Expression' displays two expression profiles of a gene in terms of sample types and mutation types, respectively. 'Hotspot' indicates the hotspot mutation regions of a gene according to the results provided by four bioinformatics tools. A new function, 'Gene Set', allows users to investigate the relationships among mutations, expression levels and clinical data for a set of genes, a specific dataset and clinical features.


Year: 2015 PMID: 26635391 PMCID: PMC4702919 DOI: 10.1093/nar/gkv1314

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

In the past few years, next generation sequencing (NGS) has revolutionized cancer genomic studies. Large-scale cancer genomic projects, such as The Cancer Genome Atlas (TCGA), have applied different sequencing technologies (such as RNA-seq and exome-seq) to cancer samples in order to provide distinct profiles of cancer biology. However, translating the different types of cancer genomic data into information that can be easily interpreted and accessed remains a challenge. The integration of multi-dimensional genomic data has been crucial to our understanding of biologically and clinically relevant subtypes of cancer. One example of integrative analysis is the TCGA breast cancer study, which showed expression subtype-associated enrichment for cancer driver genes. For instance, the ERBB2-expression subtype is associated with the enrichment of TP53 and PIK3CA mutations (1). The recent unbiased genomic characterization of distinct cancers has also provided insights into the driving events in genetic subtypes of cancers that are not well understood. The integrative analysis of cancer genomics data can provide both mechanistic and biological insights into the role of driver genes in a specific cancer type (2). There are several tools, such as MAGI (3) and cBioPortal (4,5), that allow for the exploration, annotation and integration of different kinds of cancer genomic data. Mutations are random, but the occurrence of hotspot/clustered mutations is driven by positive selection, especially when the mutations are located in functional domains or in the residues that are important for 3D protein structures (2). The same mutations in hotspot mutation regions (HMRs) may be found as drivers in other cancers. Many driver mutations recurrently occur in the functional regions of proteins (for example, kinase domains or binding domains) (6) or interrupt active sites (for example, phosphorylation sites) (7).
Hotspot regions can be grouped into two types (8): mutation clusters and hotspot domains. Hotspot domains are well-annotated domains with higher mutation rates than are found in the remaining regions of the protein. Documenting a hotspot domain requires prior annotation of known protein domain information for every transcript. Mutation clusters are small segments of proteins that have accumulated a high number of mutations, regardless of whether the clusters are located in functional domain regions of the protein. A mutation cluster may even have an extremely high mutation rate; for example, the V600E cluster of the BRAF gene has a very high mutation rate and is located in a kinase domain. Some cancer driver genes (such as KRAS and BRAF) have only one HMR, but some (such as PIK3CA) may have two or more HMRs in distinct cancer types. HMRs are strong indicators for cancer in that mutations in these HMRs may promote cancer progression. Hence, it is important to identify HMRs in cancer biology. Several computational methods have been developed for identifying driver genes by defining HMRs (8–13). Previously, we developed DriverDB (14), a database that incorporates ∼6000 cases of exome-seq data, in addition to annotation databases and published bioinformatics algorithms dedicated to driver gene/mutation identification. Here, we present DriverDBv2, an updated version of the database. In addition to including more exome-seq results (>7000 more exome-seq datasets from TCGA, ICGC and published papers), we have incorporated seven more algorithms developed for driver gene identification in this updated version. Four of those seven methods identify driver genes by identifying HMRs. We also provide information on these HMRs in this updated database. In addition, we have integrated >9500 RNA-seq datasets into DriverDBv2 to provide expression profiles across cancer types.
DriverDBv2 also contains a new function called ‘GeneSet’, which allows researchers to visualize the mutations, expression levels and clinical profiles of user-defined genes, datasets and clinical data.

DATA COLLECTION AND PREPROCESSING

DriverDBv2 incorporates >7000 additional exome-seq datasets from TCGA, ICGC and published papers, as well as RNA-seq data from >9500 cancer-related samples (such as primary tumor, normal tissue and metastatic tissue) in TCGA. Detailed information on these datasets is provided in Supplementary Table S1. All sequencing results, such as mutation and expression data, have been converted to uniform formats by an in-house script and then stored in our local MySQL server. All mutations are also functionally annotated as described in our previous study (14). Because the clinical data were downloaded from distinct studies that use varied terminologies, we standardized them to the Common Data Element (CDE) format, whose standard elements are used to validate clinical data in TCGA, through manual curation according to the definitions of terms (https://tcga-data.nci.nih.gov/docs/dictionary/).
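The standardization step can be pictured as a lookup from study-specific terms to one controlled vocabulary. The following is a minimal Python sketch under stated assumptions: the field names, the synonym tables and the `standardize_record` helper are hypothetical illustrations and are not taken from the DriverDBv2 in-house script or from the CDE dictionary.

```python
# Hypothetical synonym table: study-specific field label -> standardized label.
# The entries are illustrative only and are not actual CDE terms.
FIELD_SYNONYMS = {
    "sex": "gender",
    "gender": "gender",
    "age": "age_at_diagnosis",
    "age_at_initial_pathologic_diagnosis": "age_at_diagnosis",
    "stage": "tumor_stage",
    "pathologic_stage": "tumor_stage",
}

# Hypothetical per-field value synonyms (also illustrative only).
VALUE_SYNONYMS = {
    "gender": {"m": "male", "male": "male", "f": "female", "female": "female"},
}


def standardize_record(record):
    """Map one clinical record (dict) onto the standardized field names.

    Fields with no known mapping are collected separately so that they
    can be curated manually rather than silently dropped.
    """
    standardized = {}
    unmapped = {}
    for field, value in record.items():
        key = FIELD_SYNONYMS.get(field.strip().lower())
        if key is None:
            unmapped[field] = value
            continue
        if isinstance(value, str):
            cleaned = value.strip().lower()
            # Fall back to the trimmed original value if no synonym is known.
            value = VALUE_SYNONYMS.get(key, {}).get(cleaned, value.strip())
        standardized[key] = value
    return standardized, unmapped
```

Returning the unmapped fields alongside the standardized ones reflects the manual curation described above: terms that the table does not cover are surfaced for a curator instead of being guessed.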

DRIVER GENE AND HMR IDENTIFICATION

DriverDBv2 contains seven additional algorithms for driver gene identification. DriverNet (15) and DawnRank (16) utilize transcriptional networks to identify driver genes. The rationale of the two algorithms is that the impact of a potential driver gene can be determined by its effect on the genes that it regulates. COMDP (17) uses mutual exclusivity to identify sets of driver genes mutated in known pathways. The other four algorithms, MSEA (8), e-Driver (9), OncodriveCLUST (12) and iPAC (11), identify cancer driver genes by defining the HMRs. OncodriveCLUST and iPAC only identify mutation clusters and e-Driver only identifies hotspot domains, but MSEA can identify both types of HMRs. All HMRs identified by the four algorithms are integrated and illustrated in the ‘Hotspot’ panel of the ‘Gene’ section. The detailed criteria of the seven new algorithms are described in the Supplementary Methods.
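To make the notion of a mutation cluster concrete, the sketch below flags stretches of a protein where many mutations fall within a short window. It is a deliberately simple sliding-window count and is not the method used by MSEA, e-Driver, OncodriveCLUST or iPAC; the function name, the `window` and `min_count` defaults and the example positions are assumptions made for illustration.

```python
from collections import Counter


def find_mutation_clusters(positions, window=5, min_count=3):
    """Return merged (start, end, count) regions of clustered mutations.

    positions : amino-acid positions of observed mutations (repeats allowed)
    window    : window width in residues (hypothetical default)
    min_count : minimum number of mutations in a window to call it dense
    """
    counts = Counter(positions)
    sites = sorted(counts)

    # Slide a window that starts at each mutated site.
    dense = []
    for i, start in enumerate(sites):
        inside = [s for s in sites[i:] if s < start + window]
        total = sum(counts[s] for s in inside)
        if total >= min_count:
            dense.append((start, inside[-1]))

    # Merge overlapping or adjacent dense windows into regions.
    merged = []
    for start, end in dense:
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    # Report each region with the number of mutations it contains.
    return [
        (start, end, sum(c for p, c in counts.items() if start <= p <= end))
        for start, end in merged
    ]


# Made-up positions: a dense cluster around residue 600 plus scattered sites.
example = [600] * 8 + [601, 599, 120, 300, 469]
clusters = find_mutation_clusters(example)
```

A raw count ignores gene length and background mutation rate, which is why the published tools rely on statistical models rather than a fixed threshold.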

WEB INTERFACE

Gene

As shown in Figure 1, we provide three new panels, ‘Summary’, ‘Expression’ and ‘Hotspot’, in the ‘Gene’ section of the updated database. In Figure 1, we used the gene TP53 as an example. For ‘Summary’, a heat map shows which bioinformatics tool identifies the gene as a driver gene in which cancer type (Figure 1A). The bar chart at the top of the heat map indicates the cumulative counts of tools. In the ‘Hotspot’ panel, a heat map shows the regions of the protein that are identified as HMRs across different cancer types (Figure 1B). The color used for a given region indicates the number of tools that identify that region as an HMR. The cumulative counts for the regions identified as HMRs are shown at the top of the heat map. Exon and domain information with protein coordinates is provided at the bottom of the heat map. For the ‘Expression’ panel, the expression profiles of the gene across cancer types by sample type and by mutation class are illustrated by boxplots in Figure 1C and D, respectively. The colors used in Figure 1C and D indicate the sample types (such as normal tissue and primary tumor) and mutation classes (such as truncating and in-frame mutations), respectively.

Figure 1.

The three new features in the ‘Gene’ section. (A) The ‘Summary’ panel. A heat map shows which bioinformatics tool identifies the gene as a driver gene in which cancer type. The upper panel shows the cumulative counts of bioinformatics tools. (B) The ‘Hotspot’ panel. A heat map shows the regions of the protein identified as HMRs across different cancer types. The color used for a given region indicates the number of tools that identify that region as an HMR. The upper panel shows the cumulative counts of the regions identified as HMRs. Exon and domain information with protein coordinates is provided at the bottom of the heat map. (C and D) The ‘Expression’ panel. The expression boxplots of the gene across cancer types by sample type (C) and by mutation class (D). The colors in (C) and (D) indicate the sample types and mutation classes, respectively.
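The color coding in the ‘Hotspot’ panel amounts to counting, for each protein position, how many tools report an HMR that covers it. The sketch below shows one way such a count could be computed for a single cancer type; the function name, the input layout and the placeholder tool names are assumptions for illustration and do not describe the DriverDBv2 implementation.

```python
def tools_per_residue(hmr_calls, protein_length):
    """Count, for each residue, how many tools report an HMR covering it.

    hmr_calls      : dict of tool name -> list of (start, end) regions,
                     1-based and inclusive
    protein_length : number of residues in the protein
    Returns a list in which index i holds the count for residue i + 1.
    """
    support = [0] * protein_length
    for regions in hmr_calls.values():
        covered = set()
        for start, end in regions:
            # Clip regions to the protein so bad coordinates cannot overflow.
            lo = max(1, start)
            hi = min(protein_length, end)
            covered.update(range(lo, hi + 1))
        # A tool contributes at most once per residue, even if its own
        # regions overlap.
        for residue in covered:
            support[residue - 1] += 1
    return support


# Placeholder tool names and made-up regions on a six-residue protein.
calls = {"tool_a": [(2, 4)], "tool_b": [(3, 5), (4, 6)], "tool_c": []}
support = tools_per_residue(calls, 6)
```

Repeating the calculation per cancer type would give one row of the heat map per cancer, with the per-residue counts mapped to colors.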

GeneSet

The new function, ‘GeneSet’, was designed to help researchers visualize the relationships among mutation, expression and clinical information. Figure 2 shows an example for KRAS, NRAS and RAF in colon adenocarcinoma samples from TCGA. As shown in Supplementary Figure S1, researchers can upload a set of genes, select a specific dataset and choose up to three clinical characteristics of the selected dataset. After the query is submitted, an integrative figure (Figure 2A) displays the relationships among the three kinds of information. In the clinical plot, the clinical data can be varied and complex. To simplify the display, we use grayscale to indicate the level of data for each clinical characteristic and omit the figure legend. Red indicates that the value is not available. In addition, two expression boxplots show the expression of the uploaded genes by sample type (Figure 2B) and by mutation class (Figure 2C). The raw data are available for download via a download link.

Figure 2.

The new function, ‘GeneSet’. (A) An integrative figure displays the relationship between mutation, expression levels, and clinical information. For the clinical plot, the grayscale indicates the level of data for each clinical characteristic and the red color indicates the value is not available. (B and C) Two expression boxplots show the expression of uploaded genes in terms of sample type (B) and mutation class (C).
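The grayscale encoding of the clinical plot can be sketched as a mapping from each clinical value to a gray level, with a separate color for missing values. The Python sketch below is an assumption about how such an encoding might work, not the code behind the ‘GeneSet’ figure: numeric values are min-max scaled, categorical values are ranked alphabetically and spaced evenly, and the function name and `missing_color` parameter are hypothetical.

```python
def grayscale_levels(values, missing_color="red"):
    """Encode one clinical characteristic as gray levels in [0, 1].

    Numeric values are min-max scaled; categorical values are ranked in
    sorted order and spaced evenly. None marks a missing value and is
    mapped to `missing_color`.
    """
    present = [v for v in values if v is not None]
    if not present:
        return [missing_color for _ in values]

    if all(isinstance(v, (int, float)) for v in present):
        lo, hi = min(present), max(present)
        span = hi - lo

        def level(v):
            # A constant column gets a single gray level.
            return 0.0 if span == 0 else (v - lo) / span
    else:
        categories = sorted(set(str(v) for v in present))
        step = 1 / (len(categories) - 1) if len(categories) > 1 else 0.0

        def level(v):
            return categories.index(str(v)) * step

    return [missing_color if v is None else level(v) for v in values]


# Made-up ages and stages for four samples; None marks a missing value.
age_levels = grayscale_levels([40, 60, None, 80])
stage_levels = grayscale_levels(["stage i", "stage iii", None, "stage ii"])
```

Alphabetical ranking is only a stand-in for categorical data; an ordered characteristic such as tumor stage would need an explicit ordering supplied by the curator.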

DISCUSSION

The integrated analysis of multi-dimensional genomic data is crucial to our understanding of cancer biology. DriverDBv2 seeks to integrate mutation and expression data to address several issues. For driver gene identification, DriverNet, MeMO and DawnRank, the tools used for identifying driver genes in DriverDBv2, utilize two types of data to predict cancer driver genes and may provide additional insights regarding those cancer driver genes. For a specific gene, the expression of the gene may differ in mutated cases as compared to normal cases. For example, a reduced expression of STAG2 in mutant cases has previously been reported (18,19). The ‘Expression’ panel in the ‘Gene’ section of DriverDBv2 shows the expression boxplots for a given gene in different cancer types by mutation class and by sample type. This function will be helpful when researchers would like to quickly evaluate a gene of interest in distinct cancer types or validate their wet lab results in silico. Moreover, the new function ‘GeneSet’ further integrates mutations, expression levels and clinical information for visualization. It has previously been noted that the co-occurrence of a mutated gene with the abnormal expression of another gene may be related to a specific phenotype. The example of abnormal MITF expression with mutated BRAF has been used to illustrate this concept (20). When MITF overexpression occurs in isolation, it does not affect the proliferation of immortalized melanocytes; however, it does affect their proliferation when it co-occurs with expression of the BRAF V600E mutant. The ‘GeneSet’ panel could help explore this relationship. Furthermore, we have also provided the raw data for the integrative figure in ‘GeneSet’ for further analysis. To answer whether a gene is a driver in cancer, DriverDBv2 provides the new panel, ‘Summary’, in the ‘Gene’ section.
This panel shows which bioinformatics tool identifies the gene as a driver gene in which type of cancer (Figure 1A). The occurrence of hotspot mutations is driven by positive selection and is a strong indicator for cancer in that mutations in hotspot regions may promote cancer progression. Hence, it is important to identify HMRs in cancer biology. It has been noted that some driver genes have one or more HMRs. For example, mutations in PIK3CA form two clusters in the helical and catalytic domains (2,21). In extreme cases, driver genes have highly recurrent substitutions that change the same amino acid, such as in the case of the arginine at codon 132 in IDH1 (22) and the V600 mutation in BRAF (23). Jia et al. examined known cancer genes from the Cancer Gene Census (CGC) (24) collection together with mutations from the COSMIC database (25). They noted that the known driver genes in CGC had been detected through mutation analysis in previous studies, and found that approximately 51% of the CGC genes can be detected through mutation hotspot analysis (8). This high proportion of genes with HMRs supports the feasibility of predicting additional cancer genes based on mutation clustering patterns. DriverDBv2 integrates the information of HMRs in distinct cancer types through the utilization of four bioinformatics tools and illustrates the results in the ‘Hotspot’ panel of the ‘Gene’ section. The information thus provided tells researchers whether the driver gene that they are interested in has the same or distinct HMRs in different cancer types. In this updated version, we have integrated exome-seq and RNA-seq data to identify cancer driver genes and HMRs from larger-scale cancer sequencing data. DriverDBv2 provides researchers with easy access to different aspects of information regarding cancer driver genes.
In the future, we will incorporate additional kinds of genomics data in further updates to DriverDB, so that the database will continue to be an informative and valuable source of data on cancer driver genes.

25 in total

1. MAGI: visualization and collaborative annotation of genomic aberrations.

Authors: Mark D M Leiserson; Connor C Gramazio; Jason Hu; Hsin-Ta Wu; David H Laidlaw; Benjamin J Raphael
Journal: Nat Methods Date: 2015-06 Impact factor: 28.547

2. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal.

Authors: Jianjiong Gao; Bülent Arman Aksoy; Ugur Dogrusoz; Gideon Dresdner; Benjamin Gross; S Onur Sumer; Yichao Sun; Anders Jacobsen; Rileen Sinha; Erik Larsson; Ethan Cerami; Chris Sander; Nikolaus Schultz
Journal: Sci Signal Date: 2013-04-02 Impact factor: 8.192

3. Integrative genomic analyses identify MITF as a lineage survival oncogene amplified in malignant melanoma.

Authors: Levi A Garraway; Hans R Widlund; Mark A Rubin; Gad Getz; Aaron J Berger; Sridhar Ramaswamy; Rameen Beroukhim; Danny A Milner; Scott R Granter; Jinyan Du; Charles Lee; Stephan N Wagner; Cheng Li; Todd R Golub; David L Rimm; Matthew L Meyerson; David E Fisher; William R Sellers
Journal: Nature Date: 2005-07-07 Impact factor: 49.962

4. MSEA: detection and quantification of mutation hotspots through mutation set enrichment analysis.

Authors: Peilin Jia; Quan Wang; Qingxia Chen; Katherine E Hutchinson; William Pao; Zhongming Zhao
Journal: Genome Biol Date: 2014 Impact factor: 13.583

5. An integrated genomic analysis of human glioblastoma multiforme.

Authors: D Williams Parsons; Siân Jones; Xiaosong Zhang; Jimmy Cheng-Ho Lin; Rebecca J Leary; Philipp Angenendt; Parminder Mankoo; Hannah Carter; I-Mei Siu; Gary L Gallia; Alessandro Olivi; Roger McLendon; B Ahmed Rasheed; Stephen Keir; Tatiana Nikolskaya; Yuri Nikolsky; Dana A Busam; Hanna Tekleab; Luis A Diaz; James Hartigan; Doug R Smith; Robert L Strausberg; Suely Kazue Nagahashi Marie; Sueli Mieko Oba Shinjo; Hai Yan; Gregory J Riggins; Darell D Bigner; Rachel Karchin; Nick Papadopoulos; Giovanni Parmigiani; Bert Vogelstein; Victor E Velculescu; Kenneth W Kinzler
Journal: Science Date: 2008-09-04 Impact factor: 47.728

6. Mutations of the BRAF gene in human cancer.

Authors: Helen Davies; Graham R Bignell; Charles Cox; Philip Stephens; Sarah Edkins; Sheila Clegg; Jon Teague; Hayley Woffendin; Mathew J Garnett; William Bottomley; Neil Davis; Ed Dicks; Rebecca Ewing; Yvonne Floyd; Kristian Gray; Sarah Hall; Rachel Hawes; Jaime Hughes; Vivian Kosmidou; Andrew Menzies; Catherine Mould; Adrian Parker; Claire Stevens; Stephen Watt; Steven Hooper; Rebecca Wilson; Hiran Jayatilake; Barry A Gusterson; Colin Cooper; Janet Shipley; Darren Hargrave; Katherine Pritchard-Jones; Norman Maitland; Georgia Chenevix-Trench; Gregory J Riggins; Darell D Bigner; Giuseppe Palmieri; Antonio Cossu; Adrienne Flanagan; Andrew Nicholson; Judy W C Ho; Suet Y Leung; Siu T Yuen; Barbara L Weber; Hilliard F Seigler; Timothy L Darrow; Hugh Paterson; Richard Marais; Christopher J Marshall; Richard Wooster; Michael R Stratton; P Andrew Futreal
Journal: Nature Date: 2002-06-09 Impact factor: 49.962

7. COSMIC: exploring the world's knowledge of somatic mutations in human cancer.

Authors: Simon A Forbes; David Beare; Prasad Gunasekaran; Kenric Leung; Nidhi Bindal; Harry Boutselakis; Minjie Ding; Sally Bamford; Charlotte Cole; Sari Ward; Chai Yin Kok; Mingming Jia; Tisham De; Jon W Teague; Michael R Stratton; Ultan McDermott; Peter J Campbell
Journal: Nucleic Acids Res Date: 2014-10-29 Impact factor: 16.971

8. A comprehensive survey of Ras mutations in cancer.

Authors: Ian A Prior; Paul D Lewis; Carla Mattos
Journal: Cancer Res Date: 2012-05-15 Impact factor: 12.701

9. DriverDB: an exome sequencing database for cancer driver gene identification.

Authors: Wei-Chung Cheng; I-Fang Chung; Chen-Yang Chen; Hsing-Jen Sun; Jun-Jeng Fen; Wei-Chun Tang; Ting-Yu Chang; Tai-Tong Wong; Hsei-Wei Wang
Journal: Nucleic Acids Res Date: 2013-11-07 Impact factor: 16.971

10. Discovery of co-occurring driver pathways in cancer.

Authors: Junhua Zhang; Ling-Yun Wu; Xiang-Sun Zhang; Shihua Zhang
Journal: BMC Bioinformatics Date: 2014-08-09 Impact factor: 3.169
