Ran Wei, Yao Yao, Wu Yang, Chun-Hou Zheng, Min Zhao, Junfeng Xia.
Abstract
Cancer predisposition genes (CPGs) are genes in which inherited mutations confer highly or moderately increased risks of developing cancer. Identifying these genes and understanding the biological mechanisms that underlie them are crucial for the prevention, early diagnosis, and optimized management of cancer. Over the past decades, great efforts have been made to identify CPGs through multiple strategies. However, information on these CPGs and their molecular functions is scattered. To address this issue and provide a comprehensive resource for researchers, we developed the Cancer Predisposition Gene Database (dbCPG, Database URL: http://bioinfo.ahu.edu.cn:8080/dbCPG/index.jsp), the first literature-based gene resource for exploring human CPGs. It contains 827 human CPGs (724 protein-coding, 23 non-coding, and 80 of unknown gene type), 637 rat CPGs, and 658 mouse CPGs. Furthermore, data mining was performed to gain insight into the CPG data, including functional annotation, gene prioritization, network analysis of prioritized genes, and overlap analysis across multiple cancer types. A user-friendly web interface with multiple browse, search, and upload functions was also developed to facilitate access to the latest information on CPGs. Taken together, the dbCPG database provides a comprehensive data resource for further studies of cancer predisposition genes.
Keywords: cancer predisposition gene; database; functional annotation; gene prioritization; network module
Year: 2016 PMID: 27192119 PMCID: PMC5122350 DOI: 10.18632/oncotarget.9334
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Figure 1. The database structure of dbCPG
Annotation entry statistics for 827 CPGs
| Data category | Related entries | Annotated CPGs | Content/sources |
|---|---|---|---|
| Human CPGs | 827 | 827 | Gene ID, official symbol, official full name, synonym, position, gene type, OMIM ID from Entrez gene database; cancer syndrome, major associated tumor type, mechanism of action of CPG mutations, mode of inheritance from PubMed |
| Rat CPGs | 637 | 637 | Rat CPGs mapped from HomoloGene |
| Mouse CPGs | 658 | 658 | Mouse CPGs mapped from MGI Human Mouse Orthologs |
| Literature | 2097 | 805 | Literature evidence for CPGs |
| OMIM | 22 | 22 | Disorder description for CPGs |
| Expression | 8873 | 654 | Expression Atlas database |
| Methylation | 5292 | 695 | DiseaseMeth database |
| PTM | 11701 | 366 | dbPTM |
| Germline mutation | 29816 | 420 | ClinVar |
| Interaction | 20004 | 610 | PINA database |
| Pathway | 8640 | 580 | MSigDB database |
| Drug | 1651 | 133 | DGIdb database |
*CPG, cancer predisposition gene; MGI, Mouse Genome Informatics; PTM, post-translational modification.
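The "Annotated CPGs" column can be read as coverage of the 827 human CPGs. The following is a minimal sketch, assuming the counts are copied correctly from the table above; the names `annotated` and `coverage` are illustrative and not part of dbCPG.

```python
# Annotated-CPG counts copied from the table above.
# The dictionary and function names are illustrative, not part of dbCPG.
TOTAL_HUMAN_CPGS = 827

annotated = {
    "Literature": 805,
    "OMIM": 22,
    "Expression": 654,
    "Methylation": 695,
    "PTM": 366,
    "Germline mutation": 420,
    "Interaction": 610,
    "Pathway": 580,
    "Drug": 133,
}


def coverage(counts, total):
    """Return {category: percentage of `total`}, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


percentages = coverage(annotated, TOTAL_HUMAN_CPGS)

# Print categories from highest to lowest coverage.
for name, pct in sorted(percentages.items(), key=lambda kv: -kv[1]):
    print(f"{name:<18}{pct:>6.1f}%")
```

Under these counts, literature evidence covers most human CPGs, while drug annotation covers a much smaller share.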
Figure 2. Data statistics for human CPGs by (A) chromosome location, (B) data source, (C) gene type, and (D) cancer type
Figure 3. Overlap between cancer genes with somatic mutations and CPGs
The 570 cancer genes with somatic mutations are from COSMIC; 218 of them are also among the 827 human CPGs in dbCPG.
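The overlap above can be expressed as fractions of each gene list. The sketch below works only from the three counts stated in the caption (570, 827, 218); the function name `overlap_summary` is hypothetical, and the Jaccard index is an added summary statistic, not one reported in the source.

```python
def overlap_summary(n_a, n_b, n_shared):
    """Summarize the overlap of two sets given their sizes and the shared count."""
    if n_shared < 0 or n_shared > min(n_a, n_b):
        raise ValueError("shared count must lie between 0 and the smaller set size")
    union = n_a + n_b - n_shared
    return {
        "fraction_of_a": n_shared / n_a,  # share of set A found in set B
        "fraction_of_b": n_shared / n_b,  # share of set B found in set A
        "union": union,
        "jaccard": n_shared / union,      # added statistic, not from the source
    }


# Counts from the Figure 3 caption: COSMIC somatic genes, dbCPG human CPGs, shared.
summary = overlap_summary(n_a=570, n_b=827, n_shared=218)
print(f"COSMIC genes that are CPGs: {summary['fraction_of_a']:.1%}")
print(f"CPGs that are COSMIC genes: {summary['fraction_of_b']:.1%}")
print(f"Jaccard index: {summary['jaccard']:.3f}")
```

With these counts, roughly 38% of the COSMIC somatic cancer genes are also CPGs, and roughly 26% of the human CPGs also appear in the COSMIC list.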
Top 20 enriched diseases of the 724 protein-coding CPGs
| Disease name | Raw P-value | Benjamini-Hochberg adjusted P-value |
|---|---|---|
| Cancer | 4.98E-30 | 5.67E-26 |
| Breast cancer | 3.77E-29 | 2.15E-25 |
| Colorectal cancer | 1.35E-27 | 5.13E-24 |
| Lung cancer | 5.18E-23 | 1.47E-19 |
| Prostate cancer | 4.04E-20 | 9.19E-17 |
| Stomach cancer | 3.54E-17 | 6.70E-14 |
| Bladder cancer | 8.49E-13 | 8.05E-10 |
| Esophageal cancer | 4.68E-12 | 2.32E-09 |
| Ovarian cancer | 2.72E-09 | 4.76E-07 |
| Endometrial cancer | 2.77E-08 | 2.76E-06 |
| Endometriosis | 3.59E-08 | 3.44E-06 |
| Head and neck cancer | 3.84E-08 | 3.55E-06 |
| Oral cancer | 1.11E-07 | 8.32E-06 |
| Diabetes, type 1 | 1.24E-07 | 8.94E-06 |
| Melanoma | 1.28E-07 | 9.08E-06 |
| Stomach neoplasms | 4.64E-07 | 2.59E-05 |
| Sarcoidosis | 6.33E-07 | 3.32E-05 |
| Infection | 9.24E-07 | 4.45E-05 |
| Neoplasms | 9.81E-07 | 4.62E-05 |
| Leukemia | 1.09E-06 | 5.02E-05 |
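For readers unfamiliar with the adjusted column, here is a minimal sketch of the standard Benjamini-Hochberg step-up adjustment in plain Python. It is not the authors' pipeline, and the input P-values are toy numbers. The adjusted values in the table cannot be reproduced from the 20 rows shown, because the adjustment depends on the total number of disease terms tested, which is not stated here.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy P-values for illustration only; they are not taken from the table.
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = benjamini_hochberg(toy_p)
print(toy_adjusted)
```

Each raw P-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.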
Figure 4. The shared CPGs across 17 cancer types
The length of each circularly arranged segment is proportional to the total number of CPGs in that cancer type. The ribbons connecting different segments represent the number of CPGs shared between cancer types. The three outer rings are stacked bar plots showing the relative contribution of other cancer types to each cancer type's total: the innermost, middle, and outermost rings represent the number of CPGs that other cancers share with a specific cancer, the number of CPGs that a specific cancer shares with the other cancers, and the total number of CPGs among the different cancer types, respectively.
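The ribbon widths in such a plot come from pairwise counts of shared genes. The sketch below shows one way to compute them, assuming each cancer type maps to a set of gene identifiers. The cancer labels and gene names are placeholders, not data from dbCPG, and `shared_counts` is a hypothetical helper.

```python
from itertools import combinations


def shared_counts(genes_by_cancer):
    """Return {(cancer_a, cancer_b): number of genes shared} for every pair."""
    names = sorted(genes_by_cancer)
    return {
        (a, b): len(genes_by_cancer[a] & genes_by_cancer[b])
        for a, b in combinations(names, 2)
    }


# Placeholder data for illustration only; not taken from dbCPG.
toy_sets = {
    "cancer_X": {"GENE_1", "GENE_2", "GENE_3"},
    "cancer_Y": {"GENE_2", "GENE_3", "GENE_4"},
    "cancer_Z": {"GENE_3", "GENE_5"},
}

pair_counts = shared_counts(toy_sets)
for (a, b), n in pair_counts.items():
    print(f"{a} and {b} share {n} gene(s)")
```

The segment lengths correspond to `len(genes)` for each cancer type, and each pairwise count would set the width of one ribbon.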
Figure 5. The enriched dense network module built from the 100 CPGs (57 training genes and the top 43 test genes) based on protein-protein interaction data
The 97 genes shown as diamonds are terminal genes from the 100 CPGs. The remaining 10 genes shown as triangles are linker genes that bridge the 92 genes.