Ivan P Gorlov, Claudio W Pikielny, Hildreth R Frost, Stephanie C Her, Michael D Cole, Samuel D Strohbehn, David Wallace-Bradley, Marek Kimmel, Olga Y Gorlova, Christopher I Amos.
Abstract
BACKGROUND: Because driver mutations provide a selective advantage to the mutant clone, they tend to occur at a higher frequency in tumor samples than selectively neutral (passenger) mutations. However, mutation frequency alone is insufficient to identify cancer genes because mutability is influenced by many gene characteristics, such as size and nucleotide composition. The goal of this study was to identify gene characteristics associated with the frequency of somatic mutations in the gene in tumor samples.
Keywords: COSMIC; Cancer genes; Catalog of somatic mutations in Cancer; Frameshift mutations; Missense; Nonsense; Somatic mutations
Year: 2018 PMID: 30453881 PMCID: PMC6245819 DOI: 10.1186/s12859-018-2455-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The relationship between the number of missense, nonsense, and frameshift mutations and gene size
Fig. 2 The relationship between the nucleotide composition and the density of missense (first column), nonsense (second column), and frameshift (FS, third column) mutations
Fig. 3 (a) The relationship between average expression in CCLE cancer cell lines and the mutation densities. (b) The relationship between the density of silent mutations and the densities of missense, nonsense, and frameshift mutations. (c) The relationship between the relative replication time and the densities of missense, nonsense, and frameshift mutations
Pair-wise correlations between gene characteristics
| | % “A” | % “G” | % “C” | % “T” | ND | % “CpG” | CDS | SNPD | EC | NPS | NPM | AGE | LAGE | RRT | HA | NSM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| % “A” | 1.00 | −0.71 | −0.88 | 0.47 | 0.35 | −0.72 | 0.11 | −0.22 | 0.09 | 0.27 | 0.12 | 0.19 | 0.18 | −0.26 | −0.04 | −0.01 |
| % “G” | −0.71 | 1.00 | 0.60 | −0.78 | −0.46 | 0.76 | −0.11 | 0.20 | −0.06 | −0.22 | −0.12 | −0.06 | −0.04 | 0.27 | 0.06 | −0.03 |
| % “C” | −0.88 | 0.60 | 1.00 | −0.69 | −0.44 | 0.74 | −0.06 | 0.22 | −0.13 | −0.22 | −0.07 | −0.24 | −0.23 | 0.25 | 0.04 | 0.06 |
| % “T” | 0.47 | −0.78 | −0.69 | 1.00 | 0.54 | −0.71 | 0.04 | −0.18 | 0.10 | 0.13 | 0.04 | 0.11 | 0.10 | −0.24 | −0.06 | −0.04 |
| ND | 0.35 | −0.46 | −0.44 | 0.54 | 1.00 | −0.60 | 0.08 | −0.09 | 0.16 | 0.12 | 0.09 | 0.10 | 0.09 | −0.09 | −0.02 | 0.09 |
| % “CpG” | −0.72 | 0.76 | 0.74 | −0.71 | −0.60 | 1.00 | −0.13 | 0.15 | −0.04 | −0.24 | −0.14 | −0.09 | −0.08 | 0.19 | 0.04 | −0.03 |
| CDS | 0.11 | −0.11 | −0.06 | 0.04 | 0.08 | −0.13 | 1.00 | −0.10 | 0.08 | 0.97 | 1.00 | −0.09 | −0.07 | −0.05 | −0.01 | 0.81 |
| SNPD | −0.22 | 0.20 | 0.22 | −0.18 | −0.09 | 0.15 | −0.10 | 1.00 | −0.17 | −0.13 | −0.10 | −0.13 | −0.13 | 0.09 | 0.02 | 0.02 |
| EC | 0.09 | −0.06 | −0.13 | 0.10 | 0.16 | −0.04 | 0.08 | −0.17 | 1.00 | 0.08 | 0.09 | 0.27 | 0.27 | 0.08 | −0.01 | 0.01 |
| NPS | 0.27 | −0.22 | −0.22 | 0.13 | 0.12 | −0.24 | 0.97 | −0.13 | 0.08 | 1.00 | 0.97 | −0.06 | −0.04 | −0.09 | −0.02 | 0.74 |
| NPM | 0.12 | −0.12 | −0.07 | 0.04 | 0.09 | −0.14 | 1.00 | −0.10 | 0.09 | 0.97 | 1.00 | −0.09 | −0.07 | −0.05 | −0.01 | 0.81 |
| AGE | 0.19 | −0.06 | −0.24 | 0.11 | 0.10 | −0.09 | −0.09 | −0.13 | 0.27 | −0.06 | −0.09 | 1.00 | 0.98 | 0.19 | 0.07 | −0.26 |
| LAGE | 0.18 | −0.04 | −0.23 | 0.10 | 0.09 | −0.08 | −0.07 | −0.13 | 0.27 | −0.04 | −0.07 | 0.98 | 1.00 | 0.20 | 0.08 | −0.26 |
| RRT | −0.26 | 0.27 | 0.25 | −0.24 | −0.09 | 0.19 | −0.05 | 0.09 | 0.08 | −0.09 | −0.05 | 0.19 | 0.20 | 1.00 | 0.18 | −0.17 |
| HA | −0.04 | 0.06 | 0.04 | −0.06 | −0.02 | 0.04 | −0.01 | 0.02 | −0.01 | −0.02 | −0.01 | 0.07 | 0.08 | 0.18 | 1.00 | −0.05 |
| NSM | −0.01 | −0.03 | 0.06 | −0.04 | 0.09 | −0.03 | 0.81 | 0.02 | 0.01 | 0.74 | 0.81 | −0.26 | −0.26 | −0.17 | −0.05 | 1.00 |
ND - Nucleotide diversity, CDS - CDS size, SNPD - SNP density, EC - Evolutionary conservation, NPS - N potential stops, NPM - N potential missense, AGE - Average gene expression, LAGE - LOG of average gene expression, RRT - Relative replication time, HA - Chromatin accessibility, NSM - N of silent mutations
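As a rough illustration of how a pair-wise correlation table like the one above can be assembled, the following sketch computes Pearson coefficients for every pair of gene characteristics. The choice of Pearson correlation is an assumption (the table does not name the coefficient used), and the three columns of toy values are hypothetical, not data from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return {(name_i, name_j): r} for every ordered pair of named columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Hypothetical toy data: five genes, three characteristics (not from the paper).
genes = {
    "CDS": [900, 1500, 2400, 3300, 6000],      # coding sequence size
    "NSM": [3, 4, 9, 10, 21],                  # number of silent mutations
    "pct_G": [0.31, 0.27, 0.29, 0.22, 0.25],   # fraction of "G"
}
matrix = correlation_matrix(genes)
print(round(matrix[("CDS", "NSM")], 2))
```

The matrix is symmetric with ones on the diagonal, which matches the layout of the table; only one triangle carries independent information.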
Gene characteristics associated with the number of missense mutations per gene in univariate regression models
| Predictor | T-test | P value | Beta (β) |
|---|---|---|---|
| Number of silent mutations in the gene | 289.9 | 2.5 × 10^−1409 | 0.92 |
| Number of potential missense mutation sites | 167.5 | 5.8 × 10^−805 | 0.80 |
| Gene size in nucleotides | 167.1 | 1.6 × 10^−803 | 0.80 |
| “reptime” from MutsigCV | 28.6 | 1.4 × 10^−126 | 0.23 |
| Relative replication time | −27.5 | 1.0 × 10^−124 | 0.21 |
| Gene expression in CCLE cancer cell lines* | −26.5 | 9.6 × 10^−118 | −0.21 |
| “hic” from MutsigCV | −24.5 | 6.6 × 10^−113 | −0.20 |
| “expr” from MutsigCV | −16.8 | 1.5 × 10^−54 | −0.14 |
| Percentage of “CpG” | −16.3 | 1.7 × 10^−53 | −0.02 |
| Percentage of “G” | −15.9 | 3.1 × 10^−51 | −0.13 |
| Nucleotide diversity | 14.7 | 5.5 × 10^−45 | 0.12 |
| Percentage of “A” | 11.2 | 5.7 × 10^−28 | 0.10 |
| Chromatin accessibility | −7.7 | 3.1 × 10^−51 | −0.06 |
| Percentage of “C” | −7.1 | 5.5 × 10^−45 | −0.06 |
| Percentage of “T” | 7.0 | 5.7 × 10^−28 | 0.06 |
| Density of SNPs (1K Genomes Project) | −3.7 | 1.6 × 10^−14 | −0.03 |
| Evolutionary conservation | 1.5 | 1.1 × 10^−12 | 0.01 |
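To make the T-test and Beta columns of the univariate tables concrete, here is a minimal sketch of a single-predictor least-squares fit. It assumes the reported beta is a standardized coefficient (for one predictor this equals the Pearson correlation), which the tables do not state explicitly; the function name `univariate_fit` and the toy values are hypothetical.

```python
import math

def univariate_fit(x, y):
    """Ordinary least squares of y on a single predictor x, with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares and the standard error of the slope.
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return {
        "slope": slope,
        "t": slope / se_slope,                  # t-statistic for slope = 0
        "beta": slope * math.sqrt(sxx / syy),   # standardized coefficient
    }

# Hypothetical toy data: gene size (nt) and missense counts for five genes.
gene_size = [900, 1500, 2400, 3300, 6000]
missense = [12, 20, 35, 41, 88]
fit = univariate_fit(gene_size, missense)
print(fit)
```

Under this reading, the sign of the t-statistic and the sign of beta always agree for a given predictor, since both inherit the sign of the slope.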
Gene characteristics associated with the number of nonsense mutations in the univariate linear regression model
| Predictor | T-test | P value | Beta (β) |
|---|---|---|---|
| Number of potential nonsense mutation sites | 91.3 | 3.1 × 10^−427 | 0.59 |
| Gene size in nucleotides | 84.7 | 7.8 × 10^−395 | 0.56 |
| Number of silent mutations in the gene | 80.0 | 1.6 × 10^−371 | 0.54 |
| Percentage of “A” | 24.2 | 1.4 × 10^−102 | 0.19 |
| Percentage of “G” | −22.6 | 1.0 × 10^−91 | −0.18 |
| Percentage of “CpG” | −21.9 | 1.2 × 10^−87 | 0.01 |
| “reptime” from MutsigCV | 20.1 | 2.6 × 10^−76 | 0.17 |
| Percentage of “C” | −19.6 | 1.0 × 10^−72 | −0.15 |
| Relative replication time | −19.4 | 1.7 × 10^−71 | −0.15 |
| “hic” from MutsigCV | −17.1 | 4.5 × 10^−58 | −0.14 |
| “expr” from MutsigCV | −16.0 | 1.1 × 10^−51 | −0.13 |
| Percentage of “T” | 14.4 | 2.7 × 10^−43 | 0.11 |
| Nucleotide diversity | 12.6 | 2.0 × 10^−34 | 0.10 |
| Gene expression in CCLE cancer cell lines* | −11.2 | 1.1 × 10^−27 | −0.09 |
| Density of SNPs (1K Genomes Project) | −8.8 | 3.5 × 10^−18 | −0.07 |
| Evolutionary conservation | 5.0 | 3.7 × 10^−7 | 0.04 |
| Chromatin accessibility | −4.9 | 6.7 × 10^−7 | −0.04 |
Gene characteristics associated with the number of FS mutations per gene in univariate linear regression model
| Predictor | T-test | P value | Beta (β) |
|---|---|---|---|
| Gene size in nucleotides | 65.6 | 1.7 × 10^−354 | 0.46 |
| Number of silent mutations in the gene | 52.3 | 1.2 × 10^−288 | 0.39 |
| Percentage of “A” | 14.6 | 2.8 × 10^−44 | 0.12 |
| Percentage of “G” | −14.1 | 1.8 × 10^−41 | −0.11 |
| Percentage of “CpG” | −12.7 | 1.3 × 10^−34 | −0.10 |
| Percentage of “C” | −7.9 | 2.5 × 10^−15 | −0.06 |
| Evolutionary conservation | 6.4 | 1.6 × 10^−10 | 0.34 |
| “reptime” from MutsigCV | 5.9 | 1.9 × 10^−9 | 0.05 |
| Density of SNPs (1K Genomes Project) | −5.7 | 7.6 × 10^−9 | −0.05 |
| “expr” from MutsigCV | −4.9 | 5.9 × 10^−7 | −0.04 |
| Relative replication time | −4.8 | 9.4 × 10^−7 | −0.04 |
| Percentage of “T” | 4.2 | 1.6 × 10^−5 | 0.03 |
| “hic” from MutsigCV | −3.4 | 3.4 × 10^−4 | −0.03 |
| Gene expression in CCLE cancer cell lines* | −1.7 | 4.7 × 10^−2 | −0.01 |
| Chromatin accessibility | −1.4 | 8.4 × 10^−2 | −0.01 |
| Nucleotide diversity | 0.2 | 4.2 × 10^−1 | 0.00 |
Gene characteristics associated with the number of missense, nonsense and frameshift mutations analyzed together in univariate linear regression model
| Predictor | T-test | P value | Beta (β) |
|---|---|---|---|
| Number of silent mutations in the gene | 265.2 | 3.6 × 10^−1328 | 0.90 |
| Gene size in nucleotides | 172.5 | 2.7 × 10^−856 | 0.81 |
| “reptime” from MutsigCV | 28.0 | 2.1 × 10^−128 | 0.23 |
| Relative replication time | −26.8 | 6.5 × 10^−120 | −0.21 |
| Gene expression in CCLE cancer cell lines* | −24.8 | 1.2 × 10^−106 | −0.19 |
| “hic” from MutsigCV | −23.8 | 8.0 × 10^−100 | −0.19 |
| Percentage of “CpG” | −17.9 | 1.8 × 10^−62 | −0.14 |
| Percentage of “G” | −17.7 | 1.8 × 10^−61 | −0.14 |
| “expr” from MutsigCV | −17.0 | 3.1 × 10^−57 | −0.14 |
| Percentage of “A” | 15.3 | 7.3 × 10^−48 | 0.12 |
| Nucleotide diversity | 14.4 | 4.9 × 10^−43 | 0.11 |
| Percentage of “C” | −9.0 | 5.6 × 10^−19 | −0.07 |
| Percentage of “T” | 8.0 | 1.6 × 10^−15 | 0.06 |
| Chromatin accessibility | −7.5 | 1.0 × 10^−13 | −0.06 |
| Density of SNPs (1K Genomes Project) | −4.7 | 1.6 × 10^−6 | −0.04 |
| Evolutionary conservation | 2.4 | 8.4 × 10^−3 | 0.02 |
Gene characteristics selected for the model building for the missense, nonsense, and frameshift mutations
| Predictor | Used for missense | Used for nonsense | Used for frameshift |
|---|---|---|---|
| Density of SNPs (1K Genomes Project) | yes | no | yes |
| Evolutionary conservation | yes | yes | no |
| Gene expression in CCLE cancer cell lines^a | yes | yes | no |
| Gene size in nucleotides | yes | yes | yes |
| Nucleotide diversity | yes | yes | yes |
| Number of potential substitution sites | yes | yes | yes |
| Number of silent mutations in the gene | yes | yes | yes |
| Percentage of “A” | no | yes | yes |
| Percentage of “C” | yes | yes | no |
| Percentage of “G” | yes | yes | yes |
| Percentage of “T” | yes | yes | no |
| Percentage of “CpGs” | yes | yes | yes |
| Replication time | yes | yes | no |
| Chromatin accessibility | yes | yes | no |
^a Average gene expression across 1037 cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE)
Gene characteristics significant in stepwise best subset multiple linear regression model for the prediction of the number of missense mutations
| Predictor | T-test | P value | Beta (β) |
|---|---|---|---|
| Number of silent mutations in the gene | 136.47 | 1.8 × 10^−376 | 0.76 |
| Replication time | −7.74 | 1.0 × 10^−14 | −0.03 |
| Percentage of “C” | −7.64 | 2.4 × 10^−14 | −0.06 |
| Nucleotide diversity | −6.65 | 3.1 × 10^−11 | −0.03 |
| Percentage of “G” | −6.14 | 8.2 × 10^−10 | −0.03 |
| “reptime” from MutsigCV | 5.24 | 1.7 × 10^−7 | 0.02 |
| Evolutionary conservation | −4.57 | 4.9 × 10^−6 | −0.01 |
| Percentage of “CpGs” | −3.62 | 3.0 × 10^−4 | −0.02 |
| “expr” from MutsigCV | 2.79 | 5.3 × 10^−3 | 0.01 |
| Number of potential sites for missense mutations | 2.57 | 1.0 × 10^−2 | 0.42 |
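The exact selection procedure behind the "stepwise best subset" models is not described in this record, so the sketch below substitutes a simplified forward stepwise selection: at each step it adds the predictor that most reduces the residual sum of squares and stops when the relative gain falls below a threshold. The helper names, the `min_gain` threshold, and the toy data are all hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def residual_ss(predictors, y):
    """Residual sum of squares of an OLS fit of y on the predictors plus an intercept."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2 for r, yi in zip(rows, y))

def forward_stepwise(candidates, y, min_gain=0.05):
    """Greedily add the predictor that most reduces RSS; stop on a small relative gain."""
    chosen = []
    current = residual_ss([], y)
    while len(chosen) < len(candidates):
        best = min(
            (name for name in candidates if name not in chosen),
            key=lambda name: residual_ss([candidates[c] for c in chosen + [name]], y),
        )
        new = residual_ss([candidates[c] for c in chosen + [best]], y)
        if current == 0 or (current - new) / current < min_gain:
            break
        chosen.append(best)
        current = new
    return chosen

# Hypothetical toy data for eight genes (not from the paper).
candidates = {
    "silent": [1, 2, 3, 4, 5, 6, 7, 8],
    "rep_time": [0.9, 0.1, 0.5, 0.3, 0.8, 0.2, 0.6, 0.4],
    "noise": [0.2, 0.7, 0.1, 0.9, 0.4, 0.6, 0.3, 0.8],
}
response = [4.45, 10.57, 12.02, 15.76, 16.81, 22.23, 23.58, 27.39]
selected = forward_stepwise(candidates, response)
print(selected)
```

A greedy search like this is not guaranteed to find the best subset; it only illustrates why strongly correlated predictors (for example, gene size and the number of potential mutation sites) can change sign or drop out once another predictor is already in the model.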
Gene characteristics significant in stepwise best subset multiple linear regression model for nonsense mutations
| Predictor | T-test | P value | Beta (β) |
|---|---|---|---|
| Number of potential sites for nonsense mutations | 27.77 | 3.42 × 10^−132 | 0.782 |
| Number of silent mutations in the gene | 26.03 | 1.66 × 10^−129 | 0.301 |
| Gene size in nucleotides | −16.22 | 5.10 × 10^−49 | −0.498 |
| Percentage of “G” | −3.07 | 4.60 × 10^−4 | −0.028 |
| Replication time | −2.35 | 7.10 × 10^−3 | −0.021 |
| Evolutionary conservation | 2.23 | 2.26 × 10^−2 | 0.016 |
Gene characteristics significant in stepwise best subset multiple linear regression model for frameshift mutations
| Predictor | T-test | P value | Beta (β) |
|---|---|---|---|
| Gene size in nucleotides | 50.93 | 3.34 × 10^−218 | 0.38 |
| Nucleotide diversity | −10.69 | 5.56 × 10^−26 | −0.1 |
| Number of silent mutations in the gene | 6.86 | 5.65 × 10^−9 | 0.1 |
| Percentage of “C” | 5.26 | 3.84 × 10^−7 | 0.09 |
| Percentage of “A” | 5.16 | 6.52 × 10^−7 | 0.09 |
| Gene expression in CCLE cancer cell lines | 4.21 | 5.64 × 10^−5 | 0.03 |
| Percentage of “CpGs” | −4.1 | 8.84 × 10^−5 | −0.06 |
| Percentage of “G” | −3.62 | 5.52 × 10^−4 | −0.04 |
| “hic” from MutsigCV | 2.25 | 3.14 × 10^−2 | 0.01 |
| Evolutionary conservation | 2.09 | 4.42 × 10^−2 | 0.01 |
Gene characteristics significant in stepwise best subset multiple linear regression model for missense, nonsense, and frameshift mutations analyzed together
| Predictor | T-test | P value | Beta (β) |
|---|---|---|---|
| Number of silent mutations in the gene | 121.8 | 1.6 × 10^−789 | 0.72 |
| Gene size in nucleotides | 41.6 | 1.4 × 10^−220 | 0.23 |
| Nucleotide diversity | −8.5 | 3.3 × 10^−17 | −0.04 |
| Relative replication time | −7.0 | 1.7 × 10^−12 | −0.03 |
| Percentage of “G” | −7.0 | 2.6 × 10^−12 | −0.04 |
| Percentage of “C” | −7.0 | 3.1 × 10^−12 | −0.05 |
| “reptime” from MutsigCV | 5.4 | 4.1 × 10^−8 | 0.03 |
| Percentage of “CpG” | −3.7 | 1.0 × 10^−4 | −0.02 |
| Evolutionary conservation | −3.3 | 4.7 × 10^−4 | −0.01 |
| Percentage of “A” | 2.6 | 4.8 × 10^−3 | 0.02 |
| “expr” from MutsigCV | 2.5 | 6.4 × 10^−3 | 0.01 |
Fig. 4 Z-scores for known tumor suppressor genes (TS), oncogenes (OG), and the genes that are not reported by UniprotKB as TS or OG (other genes). Z-scores for FS, missense (Mis.), and nonsense (Non.) mutations are shown separately. Vertical bars indicate the standard error of the mean
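The caption does not say how the Z-scores were derived. One plausible reading, used here purely as an assumption, is that each gene's residual (observed minus model-predicted mutation count) is standardized across all genes and then summarized per gene class as a mean with its standard error. The residual values and labels below are hypothetical.

```python
import math
from statistics import mean, stdev

def z_scores(values):
    """Standardize values to zero mean and unit sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def group_mean_sem(z, labels):
    """Return {label: (mean Z, standard error of the mean)} for each gene class."""
    groups = {}
    for value, label in zip(z, labels):
        groups.setdefault(label, []).append(value)
    return {g: (mean(v), stdev(v) / math.sqrt(len(v))) for g, v in groups.items()}

# Hypothetical residuals (observed minus predicted counts) for eight genes.
residuals = [4.0, 3.0, 1.5, 2.5, -0.5, 0.5, -1.0, 0.0]
labels = ["TS", "TS", "OG", "OG", "other", "other", "other", "other"]

z = z_scores(residuals)
summary = group_mean_sem(z, labels)
for group, (avg, sem) in summary.items():
    print(group, round(avg, 2), round(sem, 2))
```

Under this assumption, a gene class whose mean Z-score sits well above zero carries more mutations than its gene characteristics alone would predict.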