Dimitra Repana, Joel Nulsen, Lisa Dressler, Michele Bortolomeazzi, Santhilata Kuppili Venkata, Aikaterini Tourna, Anna Yakovleva, Tommaso Palmieri, Francesca D Ciccarelli.
Abstract
The Network of Cancer Genes (NCG) is a manually curated repository of 2372 genes whose somatic modifications have known or predicted cancer driver roles. These genes were collected from 275 publications, including two sources of known cancer genes and 273 cancer sequencing screens of more than 100 cancer types from 34,905 cancer donors and multiple primary sites. This represents a more than 1.5-fold content increase compared to the previous version. NCG also annotates properties of cancer genes, such as duplicability, evolutionary origin, RNA and protein expression, miRNA and protein interactions, and protein function and essentiality. NCG is accessible at http://ncg.kcl.ac.uk/.
Keywords: Cancer genes; Cancer genomics screens; Cancer heterogeneity; Systems-level properties
Year: 2019 PMID: 30606230 PMCID: PMC6317252 DOI: 10.1186/s13059-018-1612-0
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Manual curation of cancer genes in NCG. a Pipeline used for adding cancer genes to NCG. Two sources of known cancer genes [19, 53] were integrated, leading to 711 known cancer genes. In parallel, 273 publications describing cancer sequencing screens were reviewed to extract 2088 cancer genes. The non-redundant union of these two sets led to the 2372 cancer genes currently annotated in NCG. b Intersection between known and candidate cancer genes in NCG. c Comparison of NCG content with the previous version [11]. d Pie chart of the methods used to identify cancer genes in the 273 publications. The total is greater than 273 because some studies used more than one method (Additional file 2: Table S2). e Cancer genes as a function of the number of cancer donors per study. The grey inset shows a magnification of the bottom-left corner of the plot. f Number of methods used to identify cancer genes over time. PanSoftware, used in one of the pan-cancer studies [6], was considered as a single method but is in fact a combination of 26 prediction tools.
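The counts in panel a fix the size of the overlap shown in panel b by inclusion-exclusion: 711 + 2088 - 2372 = 427 genes appear in both the known and the screen-derived sets. The sketch below reproduces that arithmetic. It is only an illustration: the gene identifiers are hypothetical placeholders (`G0`, `G1`, ...) and not NCG data, and `overlap_size` is a helper name introduced here.

```python
def overlap_size(n_known: int, n_screens: int, n_union: int) -> int:
    """Size of the intersection of two sets, given their sizes and the size of their union."""
    return n_known + n_screens - n_union


# Counts taken from the Fig. 1 caption.
N_KNOWN, N_SCREENS, N_UNION = 711, 2088, 2372
n_shared = overlap_size(N_KNOWN, N_SCREENS, N_UNION)

# Toy sets with the same sizes, built from placeholder identifiers (NOT real gene symbols).
known = {f"G{i}" for i in range(N_KNOWN)}
from_screens = {f"G{i}" for i in range(N_KNOWN - n_shared, N_KNOWN - n_shared + N_SCREENS)}

non_redundant_union = known | from_screens  # duplicates collapse automatically in a set
shared = known & from_screens

print(len(non_redundant_union), len(shared))
```

Using sets makes the "non-redundant union" step explicit: a gene reported by both a known-gene source and a sequencing screen is counted once.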
Fig. 2 Distribution of cancer genes across primary sites and cancer donors. a Number of total cancer genes and proportion of known and candidate cancer genes across the 31 tumor primary sites analyzed in the 267 cancer-specific studies. The number of cancer donors followed by the number of cancer genes is given in brackets for each primary site. b Proportion of candidate cancer genes over all cancer genes across the 31 tumor primary sites. The dot size is proportional to the donor cohort size. c Total number of cancer genes and cancer donors across the 31 tumor primary sites. The color scale in (b) and (c) indicates the number of screens for each primary site.
Fig. 3 Recurrence of cancer genes across primary sites and publications. a Proportion of study-specific cancer genes reported by each of the seven skin melanoma screens. b Total number of cancer genes and donors across 24 cancer types of the blood. The full list of blood cancer types is reported in Additional file 2: Table S2. c Number of primary sites in which each known or candidate cancer gene was reported to be a driver. d Number of publications in which each known or candidate cancer gene was reported to be a driver. e Number of methods used to predict cancer genes for drivers found in more than one publication. f Intersection of cancer genes in the cancer-specific and pan-cancer studies. g Venn diagram of cancer genes across the four pan-cancer studies of adult donors. h Intersection of cancer genes in pan-cancer screens of adult and pediatric donors. In f, g, and h, the number of donors followed by the total number of cancer genes is given in brackets.
Fig. 4 Systems-level properties of cancer genes. a Percentage of genes with ≥ 1 gene duplicate covering ≥ 60% of the protein sequence. b Proportion of genes originating in pre-metazoan species. c, d Number of human tissues in which genes (c) and proteins (d) are expressed. In panel c, tissue types were matched between GTEx and Protein Atlas wherever possible, giving 43 unique tissues. In tissues represented in both datasets, genes were defined as expressed if they had ≥ 1 TPM in both datasets. Only genes present in both sources were compared (Additional file 2: Table S1). e Percentage of genes essential in ≥ 1 cell line and distribution of cell lines in which each gene is essential. Only genes with concordant annotation between OGEE and PICKLES were compared (Additional file 2: Table S1). f Percentage of proteins involved in ≥ 1 protein complex. g Median values of betweenness (centrality), clustering coefficient (clustering), and degree (connectivity) of human proteins in the protein-protein interaction network. h Median values of betweenness and degree of the target genes in the miRNA-target interaction network. The clustering coefficient is zero for all nodes, because interactions occur between miRNAs and target genes. Known, candidate, and all cancer genes were compared to the rest of human genes, while TSGs were compared to OGs. Significance was calculated using a two-sided Fisher test (a, b, e, f) or Wilcoxon test (c, d, g, h). *p < 0.05, **p < 0.01, ***p < 0.001. i, j Enrichment and depletion of cancer genes in representative functional categories taken from level 1 of Reactome (i) and level 2 of KEGG (j). Significance was calculated comparing each group of cancer genes to the rest of human genes using a two-sided Fisher test. False discovery rates were calculated in each gene set separately. Only pathways showing enrichment or depletion are shown. The full list of pathways is provided in Additional file 2: Table S3.
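The two-sided Fisher test used in panels a, b, e, f, i, and j compares the proportion of genes with a property in one group (for example, cancer genes) against the proportion in another (the rest of human genes). The caption does not say which software the authors used, so the following is only a minimal standard-library sketch of the test and of the star notation. The function names `fisher_two_sided` and `stars` are introduced here, and the example table is a textbook 2x2 table, not NCG data.

```python
from math import comb


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be "cancer genes" / "rest of human genes" and columns
    "has property" / "lacks property". The p-value sums the probabilities of
    all tables with the same margins that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so that ties with the observed table are included.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def stars(p: float) -> str:
    """Map a p-value to the caption's notation: * < 0.05, ** < 0.01, *** < 0.001."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Textbook example table, not taken from the paper.
p_value = fisher_two_sided(3, 1, 1, 3)
print(round(p_value, 4), repr(stars(p_value)))
```

When many pathways are tested at once, as in panels i and j, the resulting p-values would additionally need a multiple-testing correction. The caption states that false discovery rates were calculated per gene set but does not name the procedure, so none is sketched here.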