Suleyman Vural, Richard Simon, Julia Krushkal.
Abstract
BACKGROUND: The APOBEC gene family of cytidine deaminases plays important roles in DNA repair and mRNA editing. In many cancers, APOBEC3B increases the mutation load, generating clusters of closely spaced, single-strand-specific DNA substitutions with a characteristic hypermutation signature. Some studies also suggested a possible involvement of APOBEC3A, REV1, UNG, and FHIT in molecular processes affecting APOBEC mutagenesis. It is important to understand how mutagenic processes linked to the activity of these genes may affect sensitivity of cancer cells to treatment.
Keywords: APOBEC mutagenesis; Cell line; Chemosensitivity; Gene expression
Year: 2018 PMID: 29642934 PMCID: PMC5896091 DOI: 10.1186/s40246-018-0150-x
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Fig. 1 Venn diagram showing the numbers of CCLE cell lines with available data
Expression of the five candidate genes in cell lines from different cancer types
| Cancer type | n | APOBEC3A range | APOBEC3A mean ± SD | APOBEC3B range | APOBEC3B mean ± SD | REV1 range | REV1 mean ± SD | UNG range | UNG mean ± SD | FHIT range | FHIT mean ± SD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ALL | 2 | 3.40–3.75 | 3.58 ± 0.25 | 3.72–9.33 | 6.53 ± 3.97 | 10.04–10.56 | 10.3 ± 0.37 | 7.52–7.56 | 7.54 ± 0.03 | 4.68–7.33 | 6.01 ± 1.87 |
| BLADDER | 27 | 3.49–6.87 | 4.11 ± 0.74 | 3.60–11.51 | 9.59 ± 1.64 | 7.86–10.93 | 9.7 ± 0.81 | 5.86–8.21 | 7.01 ± 0.50 | 4.27–7.10 | 5.14 ± 0.70 |
| BREAST | 59 | 3.11–6.18 | 3.88 ± 0.47 | 3.13–11.28 | 8.78 ± 2.07 | 7.00–11.16 | 9.62 ± 0.82 | 5.89–8.31 | 6.98 ± 0.40 | 4.29–7.26 | 5.66 ± 0.82 |
| CESC | 22 | 3.14–4.81 | 3.79 ± 0.33 | 3.29–10.98 | 8.87 ± 2.15 | 8.28–10.39 | 9.62 ± 0.63 | 6.45–8.1 | 7.09 ± 0.45 | 4.29–7.73 | 5.87 ± 1.1 |
| CLLE | 78 | 3.29–6.63 | 3.87 ± 0.46 | 3.06–11.46 | 8.01 ± 2.39 | 7.37–11.50 | 9.7 ± 0.72 | 6.53–8.18 | 7.4 ± 0.36 | 4.46–10.24 | 6.59 ± 1.51 |
| COAD/READ | 62 | 3.22–4.54 | 3.81 ± 0.33 | 3.02–11.91 | 8.70 ± 2.30 | 7.67–10.97 | 9.75 ± 0.69 | 6.23–7.76 | 7.07 ± 0.29 | 4.30–7.81 | 6.05 ± 0.89 |
| DA | 2 | 3.36–3.37 | 3.36 ± 0.01 | 3.41–3.56 | 3.49 ± 0.11 | 9.36–9.95 | 9.66 ± 0.42 | 6.97–7.19 | 7.08 ± 0.16 | 4.75–4.96 | 4.85 ± 0.15 |
| EC | 26 | 3.34–5.21 | 3.90 ± 0.46 | 3.03–11.81 | 8.80 ± 2.39 | 8.51–10.64 | 9.69 ± 0.55 | 6.19–7.79 | 7.06 ± 0.38 | 4.49–8.34 | 5.29 ± 0.90 |
| GLIOMA | 79 | 3.27–4.44 | 3.76 ± 0.25 | 3.07–11.45 | 8.42 ± 2.71 | 7.82–11.09 | 9.34 ± 0.64 | 6.32–8.03 | 7.00 ± 0.34 | 4.08–7.33 | 5.13 ± 0.63 |
| HNSC | 33 | 3.34–11.29 | 4.93 ± 1.86 | 6.51–11.66 | 9.54 ± 1.27 | 7.82–10.35 | 9.01 ± 0.68 | 6.25–8.18 | 7.24 ± 0.45 | 4.26–6.06 | 4.85 ± 0.34 |
| LAML | 5 | 3.53–4.74 | 3.96 ± 0.47 | 8.14–10.67 | 9.44 ± 1.06 | 8.35–10.43 | 9.67 ± 0.82 | 7.22–7.69 | 7.47 ± 0.21 | 5.80–7.53 | 6.67 ± 0.80 |
| LCML | 1 | 6.20 | 6.20 | 12.56 | 12.56 | 9.79 | 9.79 | 6.54 | 6.54 | 7.69 | 7.69 |
| LIHC | 34 | 3.34–4.64 | 3.82 ± 0.3 | 3.34–12.42 | 8.39 ± 2.62 | 7.82–10.72 | 9.55 ± 0.67 | 6.07–8.02 | 6.88 ± 0.39 | 4.26–7.48 | 5.33 ± 0.74 |
| MATBCL | 60 | 3.36–5.15 | 3.84 ± 0.34 | 3.31–11.76 | 7.09 ± 2.68 | 5.81–10.69 | 9.33 ± 1.1 | 6.45–8.05 | 7.27 ± 0.44 | 4.34–10.67 | 6.35 ± 1.38 |
| MB | 2 | 3.48–3.83 | 3.65 ± 0.25 | 3.48–5.86 | 4.67 ± 1.68 | 7.69–9.75 | 8.72 ± 1.45 | 6.77–6.94 | 6.85 ± 0.12 | 6.00–7.68 | 6.84 ± 1.19 |
| MEL | 59 | 3.45–4.35 | 3.87 ± 0.21 | 3.47–11.81 | 9.81 ± 1.52 | 7.32–10.51 | 9.12 ± 0.62 | 6.41–7.91 | 6.91 ± 0.3 | 4.34–7.67 | 5.55 ± 0.78 |
| MEN | 3 | 3.65–4.05 | 3.85 ± 0.20 | 8.78–9.64 | 9.08 ± 0.48 | 8.69–9.72 | 9.15 ± 0.53 | 6.47–6.93 | 6.76 ± 0.25 | 4.84–5.96 | 5.25 ± 0.62 |
| MESO | 2 | 3.80–3.95 | 3.88 ± 0.11 | 9.92–11.05 | 10.48 ± 0.79 | 9.32–9.58 | 9.45 ± 0.19 | 6.61–6.8 | 6.71 ± 0.14 | 4.18–5.80 | 4.99 ± 1.15 |
| MGCT | 3 | 3.37–3.62 | 3.53 ± 0.14 | 7.19–9.23 | 8.28 ± 1.03 | 7.33–8.09 | 7.77 ± 0.40 | 6.40–6.73 | 6.60 ± 0.17 | 4.73–5.86 | 5.27 ± 0.57 |
| MM | 28 | 3.46–5.61 | 4.12 ± 0.48 | 2.96–12.09 | 9.52 ± 2.54 | 7.10–10.85 | 9.42 ± 0.91 | 5.83–7.31 | 6.68 ± 0.36 | 4.99–8.74 | 6.96 ± 1.00 |
| NSCLC | 186 | 3.06–7.82 | 3.79 ± 0.52 | 3.04–11.92 | 7.98 ± 2.59 | 7.85–11.31 | 9.67 ± 0.64 | 5.98–8.28 | 7.08 ± 0.45 | 4.14–8.11 | 5.43 ± 0.80 |
| OVARIAN | 51 | 3.31–4.46 | 3.72 ± 0.24 | 3.09–10.98 | 8.06 ± 2.3 | 7.32–10.71 | 9.35 ± 0.72 | 6.29–7.99 | 7.00 ± 0.32 | 4.26–8.38 | 5.79 ± 0.97 |
| PAAD | 44 | 3.28–6.13 | 3.89 ± 0.50 | 3.10–11.66 | 8.95 ± 2.31 | 7.50–10.95 | 9.54 ± 0.8 | 6.48–8.42 | 7.19 ± 0.36 | 4.43–7.44 | 5.34 ± 0.72 |
| PNET | 3 | 3.21–3.59 | 3.45 ± 0.21 | 2.94–3.50 | 3.21 ± 0.28 | 9.16–10.09 | 9.54 ± 0.49 | 6.57–7.58 | 7.06 ± 0.51 | 4.43–6.97 | 5.58 ± 1.29 |
| PRAD | 7 | 3.58–4.07 | 3.81 ± 0.19 | 3.33–9.99 | 8.10 ± 2.18 | 9.49–11.20 | 10.15 ± 0.67 | 6.48–7.65 | 6.98 ± 0.43 | 5.04–7.38 | 5.94 ± 1.01 |
| RCC | 36 | 3.23–4.17 | 3.70 ± 0.22 | 3.17–11.25 | 8.87 ± 1.97 | 7.70–9.98 | 9.18 ± 0.51 | 6.52–7.59 | 6.96 ± 0.25 | 4.57–7.05 | 5.7 ± 0.73 |
| SAR | 43 | 3.38–4.29 | 3.74 ± 0.23 | 3.15–11.24 | 8.37 ± 2.37 | 7.57–10.73 | 9.26 ± 0.79 | 6.48–7.95 | 7.03 ± 0.39 | 4.14–6.36 | 4.87 ± 0.49 |
| SCLC | 7 | 3.01–4.11 | 3.67 ± 0.39 | 3.28–11.28 | 7.38 ± 3.16 | 9.54–10.60 | 10.10 ± 0.45 | 6.88–8.02 | 7.49 ± 0.41 | 5.07–6.6 | 5.71 ± 0.55 |
| STAD | 38 | 3.16–4.58 | 3.72 ± 0.27 | 3.21–11.68 | 7.88 ± 2.68 | 8.51–10.38 | 9.53 ± 0.52 | 5.97–7.75 | 7.01 ± 0.44 | 4.35–7.81 | 5.62 ± 0.9 |
| THCA | 13 | 3.37–4.28 | 3.66 ± 0.22 | 3.87–10.97 | 8.57 ± 2.14 | 8.07–10.31 | 9.21 ± 0.62 | 6.32–7.67 | 6.94 ± 0.36 | 4.34–7.63 | 5.49 ± 0.98 |
| UCEC | 6 | 3.40–4.05 | 3.66 ± 0.26 | 3.81–10.99 | 8.84 ± 2.63 | 9.36–10.14 | 9.82 ± 0.32 | 6.18–7.28 | 6.78 ± 0.45 | 4.53–8.11 | 5.82 ± 1.32 |
| MISC | 15 | 3.36–6.68 | 4.01 ± 0.80 | 4.50–11.28 | 8.83 ± 1.80 | 7.18–10.42 | 9.11 ± 1.07 | 6.47–8.38 | 7.28 ± 0.53 | 4.17–6.90 | 5.26 ± 0.74 |
| Pan-cancer | 1036 | 3.23–8.48 | 3.89 ± 0.61 | 3.02–12.42 | 8.43 ± 2.43 | 7.00–11.50 | 9.41 ± 0.78 | 5.83–8.18 | 7.05 ± 0.42 | 4.14–10.67 | 5.74 ± 1.16 |
n number of cell lines for each cancer type with available Affymetrix U133 2.0 plus microarray expression data, SD standard deviation, ALL acute lymphocytic leukemia, BLADDER bladder cancer, BREAST breast cancer, CESC cervical squamous cell carcinoma and endocervical adenocarcinoma, CLLE chronic lymphocytic leukemia, COAD/READ colon adenocarcinoma and rectum adenocarcinoma, DA duodenal adenocarcinoma, EC esophageal cancer, GLIOMA glioma brain tumors, HNSC head and neck squamous cell carcinoma, LAML acute myeloid leukemia, LCML chronic myelogenous leukemia, LIHC liver hepatocellular carcinoma, MATBCL mature B cell lymphoma, MB medulloblastoma, MEL melanoma, MEN meningioma, MESO mesothelioma, MGCT malignant giant cell tumor of bone, MM multiple myeloma, NSCLC non-small cell lung cancer, OVARIAN ovarian cancer, PAAD pancreatic adenocarcinoma, PNET primitive neuroectodermal tumors, PRAD prostate adenocarcinoma, RCC renal cell carcinoma, SAR sarcoma, SCLC small cell lung cancer, STAD stomach adenocarcinoma, THCA thyroid carcinoma, UCEC uterine corpus endometrial carcinoma, MISC other miscellaneous categories of cancer including rare cancers or cancers with unspecified information, Pan-cancer combined analysis of all cancer categories
Fig. 2a–e Histograms and density functions showing the distributions of expression of the five candidate genes in the cell lines. a APOBEC3A. b APOBEC3B. c REV1. d UNG. e FHIT. Horizontal scale represents log2-transformed gene expression values. The left vertical scale represents cell line counts, whereas the right vertical scale represents density values. f A scatterplot of APOBEC3B vs APOBEC3A expression in 1012 cell lines from the CCLE microarray expression dataset which shows the copy number status of the APOBEC3B gene according to the CCLE data [33]. Cell lines with log2(normalized ratio of APOBEC3B copy number estimate) ≥ −0.75 are shown in blue, whereas those with log2(normalized ratio of APOBEC3B copy number estimate) < −0.75 are shown in red
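The colour coding in panel f amounts to a single threshold on the log2 copy-number ratio. A minimal sketch of that rule, assuming a hypothetical function name and made-up cell line values (neither comes from the study):

```python
def apobec3b_copy_class(log2_ratio, threshold=-0.75):
    """Return the panel f colour class for a log2 normalized copy-number ratio."""
    # At or above the threshold: drawn in blue; below it: drawn in red.
    return "blue" if log2_ratio >= threshold else "red"


# Hypothetical values, for illustration only.
example_ratios = {"line_A": 0.10, "line_B": -0.75, "line_C": -1.30}
classes = {name: apobec3b_copy_class(v) for name, v in example_ratios.items()}
print(classes)
```

The boundary value −0.75 itself falls in the blue group because the legend uses "≥" for that class.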
Significant correlations among candidate gene expression levels
| Gene 1 | Gene 2 | n | ρ | p | padj | Cancer category |
|---|---|---|---|---|---|---|
| Within individual cancer categories | ||||||
| | | 186 | 0.741 | 1.15 × 10^−33 | 2.64 × 10^−31 | NSCLC |
| | | 5 | 1.000 | 1.40 × 10^−24 | 1.61 × 10^−22 | LAML |
| | | 62 | 0.759 | 8.88 × 10^−13 | 6.81 × 10^−11 | COAD/READ |
| | | 78 | 0.690 | 2.83 × 10^−12 | 1.44 × 10^−10 | CLLE |
| | | 79 | 0.686 | 3.12 × 10^−12 | 1.44 × 10^−10 | GLIOMA |
| | | 38 | 0.811 | 6.60 × 10^−10 | 2.53 × 10^−8 | STAD |
| | | 51 | 0.712 | 4.82 × 10^−9 | 1.58 × 10^−7 | OVARIAN |
| | | 44 | 0.746 | 6.12 × 10^−9 | 1.76 × 10^−7 | PAAD |
| | | 60 | 0.651 | 1.73 × 10^−8 | 4.43 × 10^−7 | MATBCL |
| | | 59 | 0.612 | 2.65 × 10^−7 | 6.09 × 10^−6 | BREAST |
| | | 26 | 0.805 | 7.04 × 10^−7 | 1.47 × 10^−5 | EC |
| | | 186 | 0.344 | 1.56 × 10^−6 | 2.98 × 10^−5 | NSCLC |
| | | 59 | 0.576 | 1.81 × 10^−6 | 3.20 × 10^−5 | MEL |
| | | 27 | 0.773 | 2.30 × 10^−6 | 3.78 × 10^−5 | BLADDER |
| | | 43 | 0.639 | 3.95 × 10^−6 | 6.06 × 10^−5 | SAR |
| | | 36 | 0.637 | 3.00 × 10^−5 | 0.0004 | RCC |
| | | 34 | 0.645 | 3.83 × 10^−5 | 0.0005 | LIHC |
| | | 22 | 0.747 | 6.48 × 10^−5 | 0.0008 | CESC |
| | | 33 | 0.636 | 6.87 × 10^−5 | 0.0008 | HNSC |
| | | 79 | −0.407 | 0.0002 | 0.0022 | GLIOMA |
| | | 28 | 0.632 | 0.0003 | 0.0034 | MM |
| | | 13 | 0.769 | 0.0021 | 0.0221 | THCA |
| | | 60 | −0.372 | 0.0034 | 0.0342 | MATBCL |
| | | 78 | −0.324 | 0.0039 | 0.0369 | CLLE |
| | | 6 | 0.943 | 0.0048 | 0.0442 | UCEC |
| Across all cancer categories | ||||||
| | | 1036 | 0.714 | 1.91 × 10^−162 | 1.91 × 10^−161 | Pan-cancer |
| | | 1036 | 0.189 | 8.04 × 10^−10 | 4.02 × 10^−9 | Pan-cancer |
| | | 1036 | −0.118 | 0.0001 | 0.0005 | Pan-cancer |
| | | 1036 | −0.088 | 0.0046 | 0.0115 | Pan-cancer |
| | | 1036 | −0.070 | 0.0251 | 0.0426 | Pan-cancer |
| | | 1036 | −0.068 | 0.0291 | 0.0426 | Pan-cancer |
| | | 1036 | 0.068 | 0.0298 | 0.0426 | Pan-cancer |
Listed are significant correlations with padj < 0.05. The p values were adjusted for false discovery rate accounting for five genes (Ntests = 10). Among individual cancer categories, FDR adjustment also accounted for 23 cancer categories with ≥ 5 cell lines with available expression data in both genes (Ntests = 230). Abbreviations of cancer categories are provided in the legend of Table 1
n sample size for correlation analysis, ρ Spearman correlation coefficient, p p value prior to FDR adjustment, padj FDR-adjusted p value
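The correlation and adjustment steps described in this legend can be sketched in plain Python. The legend states only that p values were "adjusted for false discovery rate"; the Benjamini-Hochberg procedure used below is an assumption, although the pan-cancer rows are numerically consistent with it for 10 tests (for example, 0.0046 × 10/4 = 0.0115 and 0.0298 × 10/7 ≈ 0.0426). Function names and the three placeholder p values are hypothetical.

```python
from math import comb


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values (assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for step, idx in enumerate(order):
        rank = m - step  # 1-based rank of this p value in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Test counts stated in the legend: 10 gene pairs, times 23 cancer categories.
n_gene_pairs = comb(5, 2)
n_tests_by_category = n_gene_pairs * 23

# Pan-cancer p values from the table, plus three hypothetical placeholders
# for the non-significant gene pairs, whose p values are not reported.
pan_cancer_p = [1.91e-162, 8.04e-10, 0.0001, 0.0046, 0.0251, 0.0291, 0.0298]
placeholder_p = [0.5, 0.6, 0.7]
adjusted = bh_adjust(pan_cancer_p + placeholder_p)
```

Because the tabulated p values are rounded, an adjusted value recomputed this way can differ slightly from the tabulated padj.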
Prevalence of mutation counts in the whole-exome sequencing data
| Cancer type | n | C>G range | C>G mean ± SD | C>T range | C>T mean ± SD | C>K range | C>K mean ± SD | All SNV counts range | All SNV counts mean ± SD |
|---|---|---|---|---|---|---|---|---|---|
| BLADDER | 6 | 5297–7790 | 6564 ± 968 | 19,570–28,590 | 24,077 ± 3408 | 24,867–36,380 | 30,640 ± 4373 | 54,376–78,947 | 66,878 ± 9294 |
| BREAST | 14 | 6141–8460 | 6860 ± 566 | 22,789–33,262 | 25,705 ± 2548 | 28,930–41,722 | 32,564 ± 3102 | 63,719–89,515 | 71,153 ± 6292 |
| CESC | 15 | 6309–7994 | 7095 ± 530 | 23,581–34,126 | 28,889 ± 3587 | 29,890–41,452 | 35,984 ± 3926 | 65,633–101,318 | 79,411 ± 9753 |
| COAD/READ | 16 | 5332–7595 | 6657 ± 664 | 20,925–36,866 | 25,962 ± 3970 | 26,257–44,461 | 32,620 ± 4555 | 59,552–89,617 | 70,796 ± 8093 |
| EC | 3 | 6703–7357 | 6954 ± 353 | 24,685–27,760 | 25,752 ± 1740 | 31,486–35,117 | 32,706 ± 2088 | 68,907–76,008 | 71,472 ± 3940 |
| GLIOMA | 18 | 5833–7682 | 6713 ± 458 | 21,183–28,151 | 24,924 ± 1644 | 27,016–35,833 | 31,637 ± 2096 | 60,001–78,102 | 69,420 ± 4414 |
| HNSC | 18 | 5195–7378 | 6714 ± 467 | 20,073–27,050 | 25,054 ± 1618 | 25,268–34,428 | 31,768 ± 2074 | 55,628–75,813 | 69,801 ± 4531 |
| CLLE | 42 | 4235–8400 | 6974 ± 723 | 17,410–32,021 | 26,545 ± 2888 | 21,645–40,010 | 33,520 ± 3549 | 47,685–86,517 | 72,972 ± 7267 |
| LIHC | 17 | 5864–8444 | 7007 ± 497 | 22,051–30,565 | 25,793 ± 1781 | 27,915–39,009 | 32,800 ± 2266 | 61,208–85,224 | 72,102 ± 4850 |
| MATBCL | 29 | 6350–8912 | 7209 ± 593 | 23,674–33,141 | 27,029 ± 2170 | 30,125–42,053 | 34,239 ± 2751 | 66,014–91,667 | 74,874 ± 5935 |
| MEL | 17 | 5722–8448 | 6759 ± 631 | 22,174–31,819 | 25,874 ± 2324 | 27,896–40,267 | 32,633 ± 2945 | 60,650–87,815 | 70,805 ± 6434 |
| MESO | 1 | 6112 | 6112 | 21,790 | 21,790 | 27,902 | 27,902 | 62,016 | 62,016 |
| MM | 17 | 6187–8662 | 6840 ± 628 | 22,773–32,455 | 25,456 ± 2338 | 28,960–41,117 | 32,296 ± 2961 | 63,335–88,898 | 70,785 ± 6192 |
| NSCLC | 36 | 5509–8739 | 6927 ± 768 | 20,710–32,767 | 25,641 ± 2666 | 26,219–41,506 | 32,567 ± 3424 | 57,506–90,159 | 71,563 ± 7520 |
| OVARIAN | 15 | 5951–7461 | 6682 ± 503 | 22,453–27,222 | 25,077 ± 1500 | 28,433–34,683 | 31,760 ± 1988 | 62,699–75,986 | 69,753 ± 4383 |
| PAAD | 16 | 5011–7432 | 6640 ± 588 | 19,327–27,658 | 24,801 ± 2144 | 24,338–35,090 | 31,441 ± 2725 | 53,010–76,653 | 68,905 ± 5941 |
| PRAD | 4 | 5699–6889 | 6423 ± 512 | 20,538–28,059 | 25,092 ± 3450 | 26,237–34,948 | 31,515 ± 3947 | 57,717–74,831 | 68,722 ± 8018 |
| RCC | 8 | 6521–7566 | 6980 ± 411 | 24,508–27,801 | 26,133 ± 1383 | 31,082–35,264 | 33,114 ± 1783 | 68,091–77,777 | 72,638 ± 4113 |
| SAR | 12 | 6336–7808 | 6968 ± 423 | 23,647–29,155 | 26,129 ± 1610 | 29,983–36,963 | 33,098 ± 2027 | 65,833–81,175 | 72,342 ± 4357 |
| STAD | 16 | 5861–7530 | 6807 ± 448 | 21,971–28,460 | 25,305 ± 1763 | 27,832–35,741 | 32,112 ± 2199 | 61,311–79,632 | 70,672 ± 4843 |
| THCA | 3 | 5811–6918 | 6463 ± 579 | 22,080–25,849 | 24,363 ± 2007 | 27,891–32,767 | 30,826 ± 2586 | 61,598–71,836 | 67,720 ± 5406 |
| UCEC | 2 | 6063–6489 | 6276 ± 301 | 24,128–24,223 | 24,176 ± 67 | 30,286–30,617 | 30,452 ± 234 | 66,406–67,542 | 66,974 ± 803 |
| Pan-cancer | 325 | 4235–8912 | 6865 ± 618 | 17,410–36,866 | 25,867 ± 2575 | 21,645–44,461 | 32,732 ± 3139 | 47,685–101,318 | 71,661 ± 6693 |
Shown are counts of C>T, C>G, and C>K substitutions on both genome strands, and of any types of SNV variants representing nucleotide substitutions
K G or T, SD standard deviation, SNV single nucleotide variant, n number of cell lines
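Counting substitutions "on both genome strands" means that a G>A change on the reference strand is tallied as a C>T change on the opposite strand, and C>K is the sum of C>G and C>T. A minimal sketch of such a tally, assuming SNVs are supplied as hypothetical (reference base, alternate base) pairs; the function name and input format are illustrative, not the authors' pipeline:

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def count_c_substitutions(snvs, both_strands=True):
    """Count C>G, C>T and C>K (K = G or T) among (ref, alt) SNV pairs."""
    counts = {"C>G": 0, "C>T": 0, "C>K": 0, "any": 0}
    for ref, alt in snvs:
        counts["any"] += 1
        if both_strands and ref == "G":
            # A change at G on the reference strand is a change at C
            # on the opposite strand, so complement both bases.
            ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
        if ref == "C" and alt in ("G", "T"):
            counts[f"C>{alt}"] += 1
            counts["C>K"] += 1
    return counts


# Hypothetical variants, for illustration only.
variants = [("C", "T"), ("G", "A"), ("C", "G"), ("G", "C"), ("A", "G"), ("C", "A")]
both = count_c_substitutions(variants)
reference_only = count_c_substitutions(variants, both_strands=False)
print(both)
print(reference_only)
```

The table is consistent with C>K being the sum of the other two classes: for BLADDER, the mean C>G and C>T counts (6564 and 24,077) add up to the mean C>K count (30,640) within rounding.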
Fig. 3a–c Overall motif counts in different cancer types and across all cell lines (pan-cancer analysis). The y axis is presented on the log10 scale. a T(C>K)W motif counts. b T(C>D)R motif counts. c T(C>D)D motif counts. d–f Numbers of distinct, not overlapping 5/1000 kataegis clusters with ≥ 5 motifs on the same genome strand per 1000 bp in different cancer types and in the pan-cancer dataset. d T(C>K)W motif counts. e T(C>D)R motif counts. f T(C>D)D motif counts. Horizontal middle bars show the mean for each cancer category. Vertical bars show mean ± standard deviation. Negative values of (mean − standard deviation) in d and e were truncated at 0. Cancer categories with no vertical columns had no predicted kataegis clusters (d–f) and/or too few cell lines to compute the standard deviation (n = 2 for mesothelioma, a–c)
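The three motifs use IUPAC ambiguity codes: K is G or T, W is A or T, R is A or G, and D is A, G, or T. A minimal sketch of how a single SNV with its flanking bases could be tested against them, assuming a hypothetical four-base input given on the reference strand; how the study itself handled strand orientation when counting is not described here, so the reverse-complement step is an assumption:

```python
IUPAC = {"K": "GT", "W": "AT", "R": "AG", "D": "AGT"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Motif name -> (allowed substitutions of the central C, allowed 3' bases).
MOTIFS = {
    "T(C>K)W": (IUPAC["K"], IUPAC["W"]),
    "T(C>D)R": (IUPAC["D"], IUPAC["R"]),
    "T(C>D)D": (IUPAC["D"], IUPAC["D"]),
}


def matching_motifs(five_prime, ref, alt, three_prime):
    """Return the motif names matched by one SNV in its trinucleotide context."""
    if ref == "G":
        # Read the site from the opposite strand: reverse-complement the context.
        five_prime, three_prime = COMPLEMENT[three_prime], COMPLEMENT[five_prime]
        ref, alt = "C", COMPLEMENT[alt]
    if ref != "C" or five_prime != "T":
        return []
    return [
        name
        for name, (alts, threes) in MOTIFS.items()
        if alt in alts and three_prime in threes
    ]


print(matching_motifs("T", "C", "T", "A"))  # TCA context, C>T
print(matching_motifs("T", "C", "A", "T"))  # TCT context, C>A
print(matching_motifs("A", "G", "A", "A"))  # AGA on the reference strand, G>A
```

The motif definitions overlap, so one substitution can count toward more than one motif category.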
Prevalence of APOBEC mutation motifs and kataegis clusters in a combined analysis of all cancer categories
| Measure | T(C>K)W range | T(C>K)W mean ± SD | T(C>D)R range | T(C>D)R mean ± SD | T(C>D)D range | T(C>D)D mean ± SD |
|---|---|---|---|---|---|---|
| Total motif count | 381–1369 | 603.58 ± 121.17 | 465–4633 | 743.51 ± 317.68 | 715–13,461 | 1184.94 ± 887.46 |
| Predicted non-overlapping kataegis clusters, 5/1000 | ||||||
| Number of motifs in distinct clusters | 0–16 | 0.6 ± 1.87 | 0–21 | 2.9 ± 3.99 | 0–69 | 10.71 ± 6.8 |
| Number of distinct clusters | 0–3 | 0.12 ± 0.36 | 0–4 | 0.56 ± 0.77 | 0–11 | 2 ± 1.2 |
| Combined length (bp) of distinct clusters | 0–1994 | 76.95 ± 238.6 | 0–3148 | 418.47 ± 615.59 | 0–7484 | 1327.31 ± 894.04 |
| Predicted non-overlapping kataegis clusters, 6/10000 | ||||||
| Number of motifs in distinct non-overlapping clusters | 0–95 | 0.87 ± 6.08 | 0–93 | 3.93 ± 8.34 | 0–221 | 10.26 ± 16.69 |
| Number of distinct non-overlapping clusters | 0–10 | 0.11 ± 0.67 | 0–8 | 0.53 ± 0.86 | 0–18 | 1.45 ± 1.65 |
| Combined length (bp) of distinct clusters | 0–89,163 | 750.11 ± 5802.03 | 0–78,974 | 2997.94 ± 7701 | 0–147,285 | 5323.27 ± 12,925.39 |
Shown are values per cell line, computed using whole-exome sequence data of each cell line
SD standard deviation, 5/1000 a kataegis cluster with ≥ 5 motifs on the same genome strand per 1000 bp, 6/10000 a kataegis cluster with ≥ 6 motifs on the same genome strand per 10,000 bp
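The 5/1000 and 6/10000 definitions can be illustrated with a simple scan over sorted motif positions. This is a sketch under stated assumptions, not the authors' algorithm: it takes hypothetical positions from one chromosome and one strand, anchors each window at the first unused motif, and accepts the window as a cluster when it holds enough motifs, which makes the clusters non-overlapping by construction.

```python
def find_clusters(positions, min_motifs=5, window=1000):
    """Greedy scan for non-overlapping clusters of motif positions.

    A cluster is a run of at least `min_motifs` positions that all lie
    within `window` bp of the first position in the run.
    Returns a list of (start, end, motif_count) tuples.
    """
    pos = sorted(positions)
    clusters = []
    i = 0
    while i < len(pos):
        j = i
        # Extend while the next motif is still within `window` bp of the start.
        while j + 1 < len(pos) and pos[j + 1] - pos[i] < window:
            j += 1
        count = j - i + 1
        if count >= min_motifs:
            clusters.append((pos[i], pos[j], count))
            i = j + 1  # continue after the cluster so clusters never overlap
        else:
            i += 1
    return clusters


def combined_length(clusters):
    """Total length in bp covered by the clusters, inclusive of both ends."""
    return sum(end - start + 1 for start, end, _ in clusters)


# Hypothetical motif positions, for illustration only.
motif_positions = [100, 250, 400, 600, 900, 5000, 5100, 20000]
clusters_5_1000 = find_clusters(motif_positions, min_motifs=5, window=1000)
clusters_6_10000 = find_clusters(motif_positions, min_motifs=6, window=10000)
print(clusters_5_1000, combined_length(clusters_5_1000))
print(clusters_6_10000, combined_length(clusters_6_10000))
```

The three per-cell-line measures in the table (motifs in clusters, number of clusters, combined length) correspond to the summed counts, the list length, and `combined_length` in this sketch.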
Statistically significant correlations of gene expression levels with mutation counts
| Gene | Mutation count | n | ρ | p | padj | Cancer type |
|---|---|---|---|---|---|---|
| | C>K^b | 12 | −0.902 | 6.00 × 10^−5 | 0.0114 | Sarcoma |
| | C>K^a | 12 | −0.895 | 8.37 × 10^−5 | 0.0114 | Sarcoma |
| | C>T^b | 12 | −0.895 | 8.37 × 10^−5 | 0.0114 | Sarcoma |
| | Any | 12 | −0.881 | 0.0002 | 0.0114 | Sarcoma |
| | C>T^a | 12 | −0.881 | 0.0002 | 0.0114 | Sarcoma |
| | C>G^a | 12 | −0.867 | 0.0003 | 0.0119 | Sarcoma |
| | C>G^b | 12 | −0.867 | 0.0003 | 0.0119 | Sarcoma |
| | C>K^a | 17 | −0.816 | 6.45 × 10^−5 | 0.0114 | Melanoma |
| | Any | 17 | −0.799 | 0.0001 | 0.0114 | Melanoma |
| | C>K^b | 17 | −0.797 | 0.0001 | 0.0114 | Melanoma |
| | C>G^a | 17 | −0.787 | 0.0002 | 0.0118 | Melanoma |
| | C>T^a | 17 | −0.779 | 0.0002 | 0.0119 | Melanoma |
| | C>T^b | 17 | −0.777 | 0.0002 | 0.0119 | Melanoma |
| | C>G^b | 17 | −0.738 | 0.0007 | 0.0308 | Melanoma |
Shown are correlations of gene expression levels with overall mutation counts in the WES data with padj < 0.05. These p values were FDR adjusted for multiple comparisons that included 5 candidate genes, 17 cancer categories with ≥ 5 cell lines in each category having both WES and expression data, and 7 categories of mutation counts including C>T, C>G, and C>K on one or both genome strands, as well as overall single nucleotide variant counts (Ntests = 595). “>” indicates the direction of substitution change
Any all types of nucleotide substitutions, K G or T, n sample size for correlation analysis, ρ Spearman correlation coefficient, p p value prior to FDR adjustment, padj FDR-adjusted p value
^a Mutation counts on the reference genome strand only
^b Mutation counts on both genome strands
Strongest significant correlations between candidate gene expression and drug sensitivity
| Cancer category | Gene | Agent | n | ρ | p | padj | Drug action/alternative name | Reference |
|---|---|---|---|---|---|---|---|---|
| PAAD | | JQ1^a | 28 | −0.819 | 9.70 × 10^−8 | 0.0001 | BET inhibitor | |
| PRAD | | PD-0332991^a | 5 | −1.000 | 1.40 × 10^−24 | 1.75 × 10^−21 | Palbociclib; CDK 4/6 inhibitor | |
| PRAD | | GDC0941^a | 5 | −1.000 | 1.40 × 10^−24 | 1.75 × 10^−21 | Pictilisib; pan-class I PI3K inhibitor | |
| PRAD | | KIN001-260^a | 5 | −1.000 | 1.40 × 10^−24 | 1.75 × 10^−21 | IKKb inhibitor | |
| PRAD | | EHT 1864^a | 5 | −1.000 | 1.40 × 10^−24 | 1.75 × 10^−21 | Rac inhibitor | |
| PRAD | | Nutlin-3a^a | 5 | −1.000 | 1.40 × 10^−24 | 1.75 × 10^−21 | Inhibitor of MDM2-p53 interaction | |
| CESC | | ZM-447439^a | 5 | −1.000 | 1.40 × 10^−24 | 1.75 × 10^−21 | Aurora kinase inhibitor | |
| MM | | QL-VIII-58^a | 5 | −1.000 | 1.40 × 10^−24 | 1.75 × 10^−21 | Inhibitor of mTOR and ATR signaling | |
| MM | | ZG-10^a | 5 | −1.000 | 1.40 × 10^−24 | 1.75 × 10^−21 | Inhibitor of JNK1 and p38 signaling | |
| SAR | | TGX221^a | 6 | −1.000 | < 4.95 × 10^−324 | < 4.95 × 10^−324 | PI3Kβ inhibitor | |
| CESC | | MLN4924^a | 5 | −1.000 | 1.40 × 10^−24 | 1.75 × 10^−21 | Pevonedistat; NAE inhibitor | |
| RCC | | XMD8-92^a | 6 | −1.000 | < 4.95 × 10^−324 | < 4.95 × 10^−324 | BMK1/ERK5 inhibitor | |
| NSCLC | | RDEA119^a | 123 | 0.381 | 1.35 × 10^−5 | 0.0153 | Refametinib; BAY 86-9766; MEK inhibitor | |
| NSCLC | | PD-0325901^a | 106 | 0.405 | 1.64 × 10^−5 | 0.0179 | MEK inhibitor | |
| NSCLC | | AKT inhibitor VIII^a | 121 | 0.373 | 2.51 × 10^−5 | 0.0262 | AKT inhibitor | |
| NSCLC | | Embelin^a | 121 | 0.366 | 3.61 × 10^−5 | 0.0349 | XIAP inhibitor | |
| NSCLC | | Trametinib^a | 121 | 0.361 | 4.71 × 10^−5 | 0.0436 | MEK inhibitor | |
| NSCLC | | AZD6482^a | 130 | 0.348 | 4.84 × 10^−5 | 0.0436 | PI3Kβ inhibitor | |
| NSCLC | | PD-0332991^a | 100 | 0.392 | 5.41 × 10^−5 | 0.0471 | Palbociclib; CDK 4/6 inhibitor | |
| PRAD | | NSC-207895^a | 5 | 1.000 | 1.40 × 10^−24 | 1.75 × 10^−21 | MDMX inhibitor | |
| PRAD | | Piperlongumine^a | 5 | 1.000 | 1.40 × 10^−24 | 1.75 × 10^−21 | Piplartine; ROS induction | |
| PRAD | | ZM-447439^a | 5 | 1.000 | 1.40 × 10^−24 | 1.75 × 10^−21 | Aurora kinase inhibitor | |
| PRAD | | NU-7441^a | 5 | 1.000 | 1.40 × 10^−24 | 1.75 × 10^−21 | DNA-PK inhibitor | |
| PRAD | | CCT007093^a | 5 | −1.000 | 1.40 × 10^−24 | 1.75 × 10^−21 | PPM1D inhibitor | |
| PRAD | | JQ1^a | 5 | −1.000 | 1.40 × 10^−24 | 1.75 × 10^−21 | BET inhibitor | |
| PRAD | | NVP-BHG712^a | 5 | −1.000 | 1.40 × 10^−24 | 1.75 × 10^−21 | EphB4 inhibitor | |
| CESC | | MK-2206^a | 5 | −1.000 | 1.40 × 10^−24 | 1.75 × 10^−21 | AKT inhibitor | |
| MEL | | TAE684^b | 38 | 0.621 | 3.24 × 10^−5 | 0.0325 | ALK inhibitor | |
| SCLC | | ABT-869^a | 6 | −1.000 | < 4.95 × 10^−324 | < 4.95 × 10^−324 | Linifanib; VEGFR/PDGFR family receptor inhibitor | |
| SCLC | | Mitomycin C^a | 6 | −1.000 | < 4.95 × 10^−324 | < 4.95 × 10^−324 | DNA cross-linking/monoalkylating agent | |
| Pan-cancer | | 17-AAG^a | 536 | −0.293 | 4.25 × 10^−12 | 5.85 × 10^−9 | HSP90 inhibitor | |
Shown are statistically significant correlations satisfying |ρ| > 0.25, padj < 0.05. The p values were adjusted for false discovery rate accounting for 5 genes and 255 agents with 275 drug sensitivity measures from CCLE or GDSC resources (Ntests = 1375 for pan-cancer analysis). Among individual cancer categories, FDR adjustment also accounted for 26 cancer categories with ≥ 5 available cell lines in each category with both gene expression and drug sensitivity data for correlation analysis (Ntests = 26,110). Abbreviations of cancer categories are provided in the legend of Table 1
n sample size for correlation analysis, ρ Spearman correlation coefficient, p p value prior to FDR adjustment, padj FDR-adjusted p value, BET bromodomain and extraterminal family of proteins, BRAF v-raf murine sarcoma viral oncogene homolog B, CDK cyclin-dependent kinase, DNA-PK DNA-dependent protein kinase, HDAC histone deacetylase, HSP90 molecular chaperone heat shock protein 90, MEK mitogen-activated protein kinase kinases, NAE NEDD8-activating enzyme E1, PI3K phosphatidylinositol-3-kinase, ROS reactive oxygen species, XIAP X-linked inhibitor of apoptosis
^a Drug sensitivity data from GDSC [30, 35]
^b Drug sensitivity data from Cancer Cell Line Encyclopedia (CCLE) [33]
Fig. 4 Scatterplots of drug sensitivity measures from the GDSC dataset in selected cancer types. a log(IC50) of JQ1 vs log2 of the APOBEC3A gene expression in pancreatic adenocarcinoma cell lines. b log(IC50) of bicalutamide vs the combined length of predicted 5/1000 kataegis clusters with the T(C>D)D motif in breast cancer cell lines. The names of individual breast cancer cell lines are shown. r Pearson’s correlation coefficient
Significant correlations between the measures of prevalence of APOBEC-like motifs or kataegis clusters and drug sensitivity
| Motif | Measure | Agent | n | ρ | p | padj | Cancer type |
|---|---|---|---|---|---|---|---|
| T(C>K)W | Total number of motifs | WZ3105 | 5 | 1.000 | 1.40 × 10^−24 | 5.15 × 10^−22 | OVARIAN |
| T(C>K)W | Total number of motifs | XMD15-27 | 5 | −1.000 | 1.40 × 10^−24 | 5.15 × 10^−22 | OVARIAN |
| T(C>K)W | Total number of motifs | Tipifarnib | 5 | 1.000 | 1.40 × 10^−24 | 5.15 × 10^−22 | PAAD |
| T(C>K)W | Total number of motifs | AKT inhibitor VIII | 5 | −1.000 | 1.40 × 10^−24 | 5.15 × 10^−22 | PAAD |
| T(C>K)W | Total number of motifs | GSK-1904529A | 5 | 1.000 | 1.40 × 10^−24 | 5.15 × 10^−22 | PAAD |
| T(C>D)R | Total number of motifs | rTRAIL | 6 | −1.000 | < 4.95 × 10^−324 | < 4.95 × 10^−324 | OVARIAN |
| T(C>D)R | Total number of motifs | WZ3105 | 5 | 1.000 | 1.40 × 10^−24 | 3.22 × 10^−22 | OVARIAN |
| T(C>D)R | Total number of motifs | XMD15-27 | 5 | −1.000 | 1.40 × 10^−24 | 3.22 × 10^−22 | OVARIAN |
| T(C>D)R | Total number of motifs | KIN001-266 | 5 | −1.000 | 1.40 × 10^−24 | 3.22 × 10^−22 | COAD/READ |
| T(C>D)R | Total number of motifs | BMS-536924 | 5 | −1.000 | 1.40 × 10^−24 | 3.22 × 10^−22 | GLIOMA |
| T(C>D)R | Total number of motifs | HG-5-113-01 | 5 | 1.000 | 1.40 × 10^−24 | 3.22 × 10^−22 | BREAST |
| T(C>D)R | Total number of motifs | Vismodegib | 5 | −1.000 | 1.40 × 10^−24 | 3.22 × 10^−22 | PAAD |
| T(C>D)R | Total number of motifs | FH535 | 5 | −1.000 | 1.40 × 10^−24 | 3.22 × 10^−22 | PAAD |
| T(C>D)D | Total number of motifs | rTRAIL | 6 | −1.000 | < 4.95 × 10^−324 | < 4.95 × 10^−324 | OVARIAN |
| T(C>D)D | Total number of motifs | WZ3105 | 5 | 1.000 | 1.40 × 10^−24 | 3.22 × 10^−22 | OVARIAN |
| T(C>D)D | Total number of motifs | XMD15-27 | 5 | −1.000 | 1.40 × 10^−24 | 3.22 × 10^−22 | OVARIAN |
| T(C>D)D | Total number of motifs | NVP-BEZ235 | 5 | −1.000 | 1.40 × 10^−24 | 3.22 × 10^−22 | COAD/READ |
| T(C>D)D | Total number of motifs | T0901317 | 5 | −1.000 | 1.40 × 10^−24 | 3.22 × 10^−22 | COAD/READ |
| T(C>D)D | Total number of motifs | RDEA119 | 5 | −1.000 | 1.40 × 10^−24 | 3.22 × 10^−22 | COAD/READ |
| T(C>D)D | Total number of motifs | HG-5-113-01 | 5 | 1.000 | 1.40 × 10^−24 | 3.22 × 10^−22 | BREAST |
| T(C>D)D | Total number of motifs | LY317615 | 5 | 1.000 | 1.40 × 10^−24 | 3.22 × 10^−22 | PAAD |
| T(C>D)D | Length of kataegis regions | PF-4708671 | 6 | 1.000 | < 4.95 × 10^−324 | < 4.95 × 10^−324 | BREAST |
| T(C>D)D | Length of kataegis regions | EX-527 | 5 | −1.000 | 1.40 × 10^−24 | 2.15 × 10^−22 | COAD/READ |
| T(C>D)D | Length of kataegis regions | KIN001-236 | 5 | −1.000 | 1.40 × 10^−24 | 2.15 × 10^−22 | COAD/READ |
| T(C>D)D | Length of kataegis regions | CAL-101 | 5 | −1.000 | 1.40 × 10^−24 | 2.15 × 10^−22 | COAD/READ |
| T(C>D)D | Length of kataegis regions | Y-39983 | 5 | −1.000 | 1.40 × 10^−24 | 2.15 × 10^−22 | COAD/READ |
| T(C>D)D | Length of kataegis regions | KIN001-270 | 5 | −1.000 | 1.40 × 10^−24 | 2.15 × 10^−22 | COAD/READ |
| T(C>D)D | Length of kataegis regions | Ruxolitinib | 5 | −1.000 | 1.40 × 10^−24 | 2.15 × 10^−22 | COAD/READ |
| T(C>D)D | Length of kataegis regions | XMD14-99 | 5 | −1.000 | 1.40 × 10^−24 | 2.15 × 10^−22 | COAD/READ |
| T(C>D)D | Length of kataegis regions | QL-VIII-58 | 5 | 1.000 | 1.40 × 10^−24 | 2.15 × 10^−22 | BREAST |
| T(C>D)D | Length of kataegis regions | Genentech Cpd 10 | 5 | 1.000 | 1.40 × 10^−24 | 2.15 × 10^−22 | PAAD |
| T(C>D)D | Length of kataegis regions | Gemcitabine | 5 | 1.000 | 1.40 × 10^−24 | 2.15 × 10^−22 | PAAD |
| T(C>D)D | Length of kataegis regions | Bicalutamide | 5 | 1.000 | 1.40 × 10^−24 | 2.15 × 10^−22 | PAAD |
| T(C>D)D | Length of kataegis regions | Bicalutamide | 7 | 0.991 | 1.46 × 10^−5 | 0.0021 | BREAST |
Shown are statistically significant correlations satisfying padj < 0.05. The p values were adjusted for false discovery rate accounting for 4 measures of abundance of each motif category, 255 agents with 275 drug sensitivity measures, and 26 cancer categories with ≥ 5 available cell lines (Ntests between 1358 and 1874). Drug sensitivity data for all significant correlations listed in the table were obtained from GDSC [30, 35]. Abbreviations of cancer categories are provided in the legend to Table 1
n sample size for correlation analysis, ρ Spearman correlation coefficient, p p value prior to FDR adjustment, padj FDR-adjusted p value