Evan A Clayton¹, Lu Wang², Lavanya Rishishwar³, Jianrong Wang⁴, John F McDonald¹, I King Jordan³.
Abstract
Human transposable element (TE) activity in somatic tissues causes mutations that can contribute to tumorigenesis. Indeed, TE insertion mutations have been implicated in the etiology of a number of different cancer types. Nevertheless, the full extent of somatic TE activity, along with its relationship to tumorigenesis, has yet to be explored. Recent developments in bioinformatics software make it possible to analyze TE expression levels and TE insertional activity directly from transcriptome (RNA-seq) and whole genome (DNA-seq) next-generation sequence data. We applied these new sequence analysis techniques to matched normal and primary tumor patient samples from The Cancer Genome Atlas (TCGA) to analyze the patterns of TE expression and insertion for three cancer types: breast invasive carcinoma, head and neck squamous cell carcinoma, and lung adenocarcinoma. Our analysis focused on the three most abundant families of active human TEs: Alu, SVA, and L1. We found evidence for high levels of somatic TE activity for these three families in normal and cancer samples across diverse tissue types. Abundant transcripts for all three TE families were detected in both normal and cancer tissues, along with an average of ~80 unique TE insertions per individual patient/tissue. We observed an increase in L1 transcript expression and L1 insertional activity in primary tumor samples for all three cancer types. Tumor-specific TE insertions are enriched for private mutations, consistent with a potentially causal role in tumorigenesis. We used genome feature analysis to investigate two specific cases of putative cancer-causing TE mutations in further detail. An Alu insertion in an upstream enhancer of the CBL tumor suppressor gene is associated with down-regulation of the gene in a single breast cancer patient, and an L1 insertion in the first exon of the BAALC gene also disrupts its expression in head and neck squamous cell carcinoma.
Our results are consistent with widespread somatic activity of human TEs leading to numerous insertion mutations that can contribute to tumorigenesis in a variety of tissues.
Keywords: Alu; L1; LINE-1; SVA; bioinformatics; mutation; retrotransposons; tumorigenesis
Year: 2016 PMID: 27900322 PMCID: PMC5110550 DOI: 10.3389/fmolb.2016.00076
Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X
Figure 1. Scheme of the analytical design used in this study. Matched normal and primary tumor samples for three cancer types were analyzed using transcriptome (RNA-seq) and whole genome (DNA-seq) data. RNA-seq data were analyzed to compare normal versus cancer expression levels, and DNA-seq data were analyzed to identify somatic TE insertion events. The main bioinformatics programs (wrench icons) and databases (cylinder icons) used in each phase of the analysis are indicated.
TCGA whole genome (DNA-seq) and transcriptome (RNA-seq) data sources for the patients analyzed in this study.
| Patient | TCGA sample barcode | Cancer type | Sex | Age | Data type | Size | Read length (bp) |
|---|---|---|---|---|---|---|---|
| Breast 1 | TCGA-BH-A0B3-11B-21D-A128-09 | Breast invasive carcinoma | F | 53 | NT-W | 42.4 | 100 |
| | TCGA-BH-A0B3-11B-21R-A089-07 | | | | NT-R | 5.5 | 50 |
| | TCGA-BH-A0B3-01A-11D-A128-09 | | | | TP-W | 40.2 | 100 |
| | TCGA-BH-A0B3-01B-21R-A089-07 | | | | TP-R | 5.4 | 50 |
| Breast 2 | TCGA-BH-A0BW-11A-12D-A314-09 | Breast invasive carcinoma | F | 71 | NT-W | 54.1 | 100 |
| | TCGA-BH-A0BW-11A-12R-A115-07 | | | | NT-R | 7 | 50 |
| | TCGA-BH-A0BW-01A-11D-A10Y-09 | | | | TP-W | 46.1 | 100 |
| | TCGA-BH-A0BW-01A-12R-A115-07 | | | | TP-R | 7.3 | 50 |
| Breast 3 | TCGA-BH-A0DT-11A-12D-A12B-09 | Breast invasive carcinoma | F | 41 | NT-W | 63.3 | 100 |
| | TCGA-BH-A0DT-11A-12R-A12D-07 | | | | NT-R | 7.7 | 50 |
| | TCGA-BH-A0DT-01A-21D-A12B-09 | | | | TP-W | 79.9 | 100 |
| | TCGA-BH-A0DT-01A-21R-A12D-07 | | | | TP-R | 6.6 | 50 |
| Head 1 | TCGA-CV-7255-11A-01D-2276-10 | Head and neck squamous cell carcinoma | F | 32 | NT-W | 6.9 | 101 |
| | TCGA-CV-7255-11A-01R-2016-07 | | | | NT-R | 7.5 | 48 |
| | TCGA-CV-7255-01A-11D-2276-10 | | | | TP-W | 5.8 | 101 |
| | TCGA-CV-7255-01A-11R-2016-07 | | | | TP-R | 7.1 | 48 |
| Head 2 | TCGA-CV-7416-11A-01D-2334-08 | Head and neck squamous cell carcinoma | F | 29 | NT-W | 7.7 | 101 |
| | TCGA-CV-7416-11A-01R-2081-07 | | | | NT-R | 5.9 | 48 |
| | TCGA-CV-7416-01A-11D-2334-08 | | | | TP-W | 28.6 | 101 |
| | TCGA-CV-7416-01A-11R-2081-07 | | | | TP-R | 6 | 48 |
| Head 3 | TCGA-CV-6959-11A-01D-1911-02 | Head and neck squamous cell carcinoma | M | 48 | NT-W | 38.3 | 51 |
| | TCGA-CV-6959-11A-01R-1915-07 | | | | NT-R | 8.5 | 48 |
| | TCGA-CV-6959-01A-11D-1911-02 | | | | TP-W | 31.4 | 51 |
| | TCGA-CV-6959-01A-11R-1915-07 | | | | TP-R | 6.6 | 48 |
| Lung 1 | TCGA-44-6776-11A-01D-1853-02 | Lung adenocarcinoma | F | 60 | NT-W | 38.9 | 51 |
| | TCGA-44-6776-11A-01R-1858-07 | | | | NT-R | 5.4 | 48 |
| | TCGA-44-6776-01A-11D-1853-02 | | | | TP-W | 6.9 | 51 |
| | TCGA-44-6776-01A-11R-1858-07 | | | | TP-R | 7.4 | 48 |
| Lung 2 | TCGA-50-5932-11A-01D-1753-08 | Lung adenocarcinoma | M | 75 | NT-W | 34.6 | 101 |
| | TCGA-50-5932-11A-01R-1755-07 | | | | NT-R | 4.2 | 48 |
| | TCGA-50-5932-01A-11D-1753-08 | | | | TP-W | 44.5 | 101 |
| | TCGA-50-5932-01A-11R-1755-07 | | | | TP-R | 7.4 | 48 |
| Lung 3 | TCGA-55-6984-11A-01D-1945-08 | Lung adenocarcinoma | F | NA | NT-W | 36.2 | 101 |
| | TCGA-55-6984-11A-01R-1949-07 | | | | NT-R | 4.9 | 48 |
| | TCGA-55-6984-01A-11D-1945-08 | | | | TP-W | 41 | 101 |
| | TCGA-55-6984-01A-11R-1949-07 | | | | TP-R | 5.2 | 48 |
NT-W, Normal tissue whole genome DNA-seq; NT-R, Normal tissue RNA-seq; TP-W, Tumor primary whole genome DNA-seq; TP-R, Tumor primary RNA-seq.
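The data-type labels in this table can be read directly off the TCGA aliquot barcodes: the first two digits of the fourth field give the sample type (01 = primary solid tumor, 11 = solid tissue normal), and the trailing letter of the fifth field gives the analyte (D = DNA, R = RNA). The sketch below illustrates that mapping in Python. `parse_barcode` and `Aliquot` are hypothetical names introduced for illustration, only the two sample-type codes that appear in this table are handled, and the "-W" suffix simply follows this table's legend.

```python
from dataclasses import dataclass

# Subset of TCGA sample-type codes that occur in this table (not the full code list).
SAMPLE_TYPES = {"01": "TP", "11": "NT"}  # 01 = primary solid tumor, 11 = solid tissue normal
# Analyte letter -> suffix used in the table legend (D = DNA -> whole genome "W", R = RNA).
ANALYTES = {"D": "W", "R": "R"}


@dataclass(frozen=True)
class Aliquot:  # hypothetical record type for this illustration
    participant: str  # e.g. "TCGA-BH-A0B3"
    sample_type: str  # "TP" or "NT"
    data_type: str    # e.g. "NT-W" or "TP-R"


def parse_barcode(barcode: str) -> Aliquot:
    """Split a 7-field TCGA aliquot barcode into the fields used in the table."""
    project, tss, participant, sample_vial, portion_analyte, _plate, _center = barcode.split("-")
    prefix = SAMPLE_TYPES[sample_vial[:2]]      # "11B" -> "11" -> "NT"
    suffix = ANALYTES[portion_analyte[-1]]      # "21D" -> "D" -> "W"
    return Aliquot(
        participant=f"{project}-{tss}-{participant}",
        sample_type=prefix,
        data_type=f"{prefix}-{suffix}",
    )


if __name__ == "__main__":
    for bc in ("TCGA-BH-A0B3-11B-21D-A128-09", "TCGA-BH-A0B3-01B-21R-A089-07"):
        print(bc, parse_barcode(bc).data_type)
```

Unknown sample-type or analyte codes raise a `KeyError`, which is deliberate here: it flags barcodes outside the matched normal/primary tumor design rather than silently mislabeling them.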
Numbers of MELT- and Mobster-predicted TE insertions in matched normal (N) and primary tumor (T) samples across nine individuals.
| Patient | Alu (N) | SVA (N) | L1 (N) | Total (N) | Alu (T) | SVA (T) | L1 (T) | Total (T) |
|---|---|---|---|---|---|---|---|---|
| Breast 1 | 913 | 28 | 127 | 1069 | 853 | 33 | 110 | 997 |
| Breast 2 | 1004 | 21 | 121 | 1147 | 1160 | 54 | 143 | 1358 |
| Breast 3 | 1012 | 63 | 139 | 1215 | 952 | 60 | 136 | 1149 |
| Head 1 | 984 | 72 | 140 | 1197 | 741 | 66 | 107 | 915 |
| Head 2 | 945 | 25 | 131 | 1102 | 832 | 26 | 138 | 997 |
| Head 3 | 860 | 36 | 108 | 1005 | 819 | 41 | 112 | 973 |
| Lung 1 | 716 | 29 | 92 | 838 | 780 | 36 | 113 | 930 |
| Lung 2 | 806 | 25 | 103 | 935 | 701 | 20 | 94 | 816 |
| Lung 3 | 856 | 21 | 110 | 988 | 746 | 14 | 100 | 861 |
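Separating tumor-specific insertions from those shared with the matched normal sample amounts to comparing two call sets while tolerating small breakpoint differences between them. The Python sketch below assumes each call has already been reduced to a hypothetical `(chrom, pos, family)` record and treats two calls of the same family on the same chromosome as the same insertion when their positions lie within a hypothetical `window` of 100 bp. It illustrates the set comparison only; it is not the matching procedure used by MELT or Mobster, and the window size is an assumption.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Iterable, NamedTuple


class Insertion(NamedTuple):  # hypothetical call record
    chrom: str
    pos: int     # breakpoint coordinate reported by the caller
    family: str  # "Alu", "SVA" or "L1"


def _index(calls: list[Insertion]) -> dict[tuple[str, str], list[int]]:
    """Sorted breakpoint positions keyed by (chromosome, TE family)."""
    idx: dict[tuple[str, str], list[int]] = defaultdict(list)
    for call in calls:
        idx[(call.chrom, call.family)].append(call.pos)
    for positions in idx.values():
        positions.sort()
    return idx


def _has_match(idx: dict[tuple[str, str], list[int]], call: Insertion, window: int) -> bool:
    """True if idx holds a same-family call on the same chromosome within +/- window bp."""
    positions = idx.get((call.chrom, call.family), [])
    i = bisect_left(positions, call.pos - window)  # first position >= pos - window
    return i < len(positions) and positions[i] <= call.pos + window


def compare_calls(normal: Iterable[Insertion], tumor: Iterable[Insertion], window: int = 100):
    """Split matched-sample calls into (normal_only, tumor_only, shared) sets.

    `window` (bp) is an illustrative assumption; `shared` keeps tumor-side coordinates.
    """
    normal, tumor = list(normal), list(tumor)
    normal_idx, tumor_idx = _index(normal), _index(tumor)
    shared = {c for c in tumor if _has_match(normal_idx, c, window)}
    tumor_only = set(tumor) - shared
    normal_only = {c for c in normal if not _has_match(tumor_idx, c, window)}
    return normal_only, tumor_only, shared
```

Binary search over per-chromosome, per-family sorted positions keeps each lookup logarithmic, so the comparison scales to the roughly 1,000 calls per sample reported in this table.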
Figure 2. Gene expression levels for matched normal vs. primary tumor tissue pairs. Normal tissue (NT) and tumor primary (TP) expression levels were measured for genes, transposable elements (TEs), and LINE-1 elements (L1s) via analysis of RNA-seq data as described in the Materials and Methods. Expression levels are shown as distributions of log10-transformed read counts, and normal versus tumor comparisons are shown for breast invasive carcinoma (green), head and neck squamous cell carcinoma (red), and lung adenocarcinoma (blue). For each tissue type, the significance of the difference in L1 expression between normal and cancer pairs is indicated by Kolmogorov-Smirnov test P-values.
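The normal-versus-tumor comparison in this caption applies a two-sample Kolmogorov-Smirnov test to distributions of log10-transformed read counts. The plain-Python sketch below shows what that test computes: the statistic D is the largest vertical gap between the two empirical cumulative distribution functions, and the P-value uses the standard asymptotic Kolmogorov distribution with an effective-sample-size correction. The pseudocount added before taking log10 is an assumption (the caption does not say how zero counts were handled), and `log10_counts` and `ks_two_sample` are hypothetical names.

```python
import math
from typing import Sequence


def log10_counts(counts: Sequence[int], pseudocount: float = 1.0) -> list[float]:
    """log10-transform read counts; the pseudocount (an assumption) keeps zeros finite."""
    return [math.log10(c + pseudocount) for c in counts]


def _kolmogorov_q(lam: float) -> float:
    """Asymptotic survival function of the Kolmogorov distribution, Q_KS(lam)."""
    if lam < 0.2:
        return 1.0  # Q_KS is indistinguishable from 1 here and the series converges poorly
    total = sum(2 * (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return min(1.0, max(0.0, total))


def ks_two_sample(a: Sequence[float], b: Sequence[float]) -> tuple[float, float]:
    """Two-sample KS statistic D and its asymptotic two-sided P-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every value equal to x, so ties are handled.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d  # effective-sample-size correction
    return d, _kolmogorov_q(lam)
```

For real analyses, `scipy.stats.ks_2samp` provides the same test with exact P-values for small samples; the hand-rolled version is only meant to make the statistic transparent.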
Figure 3. TE insertional activity in matched normal vs. primary tumor tissue pairs. The number of TE insertions was measured for normal and primary tumor tissue pairs for breast invasive carcinoma, head and neck squamous cell carcinoma, and lung adenocarcinoma via analysis of whole genome DNA-seq data as described in the Materials and Methods. (A) The total numbers of predicted TE insertions, pooled across all nine individuals and the three cancer types analyzed here, are shown for normal vs. tumor tissue. Venn diagrams show the numbers of unique versus shared TE insertions for the two tissue types. (B) Comparison of the observed versus expected numbers of unique L1 insertions for normal vs. tumor tissue. (C) Comparison of the population frequencies of TE insertions observed in matched normal vs. tumor tissue pairs, shown for all TEs analyzed here and for L1s alone. (D–F) The same comparisons of TE insertion population frequencies, shown individually for each cancer type. TE insertion population frequencies are color-coded as shown in the key. P-values show the significance of the differences between observed distributions based on Fisher's exact test (B) and the Kolmogorov-Smirnov test (C–F).
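Panel B's observed-versus-expected comparison is evaluated with Fisher's exact test, which for a 2×2 table sums the hypergeometric probabilities of every table with the same row and column totals that is no more likely than the observed one. The sketch below implements the two-sided version in plain Python. `fisher_exact_two_sided` is a hypothetical name, and any counts passed to it are illustrative: the caption does not state how the expected L1 counts were derived.

```python
from math import comb


def fisher_exact_two_sided(table: tuple[tuple[int, int], tuple[int, int]]) -> float:
    """Two-sided Fisher's exact test P-value for a 2x2 contingency table.

    Sums the hypergeometric probabilities of all tables with the observed
    margins whose probability does not exceed that of the observed table.
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # P(top-left cell == x | fixed row and column totals)
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance keeps tables tied with the observed one from being lost to rounding.
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7)))
```

This follows the same "no more likely than observed" definition used by `scipy.stats.fisher_exact` for its two-sided alternative, so the two should agree on small tables such as `((8, 2), (1, 5))`.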
Figure 4. Private TE insertions implicated as potential cancer driver mutations. (A) A tumor-specific Alu insertion (red) is found in a single breast cancer patient. The insertion is located within an upstream enhancer for the CBL gene on chromosome 11 (gene model shown in blue), as indicated by enhancer-associated chromatin marks (inset yellow bars). Presence of the Alu insertion is associated with down-regulation of CBL (expression levels in green). (B) A tumor-specific L1 insertion (red) is located within the first exon of the BAALC gene on chromosome 8 (gene model shown in blue). Co-location of the L1 insertion with promoter-associated chromatin marks (purple bars) is shown in the inset. Presence of the L1 insertion is associated with down-regulation of BAALC (expression levels in red).
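The genome feature analysis behind this figure asks which annotated regulatory intervals (for example, regions carrying enhancer- or promoter-associated chromatin marks) contain a given insertion breakpoint. The sketch below shows that point-in-interval query in plain Python, assuming features are supplied as hypothetical `(chrom, start, end, label)` tuples with half-open coordinates. `build_index` and `overlapping_features` are hypothetical names, and the coordinates and mark labels used in any example are illustrative, not the CBL or BAALC annotations.

```python
from bisect import bisect_right
from typing import Iterable

Interval = tuple[int, int, str]  # (start, end, label), half-open [start, end)


def build_index(features: Iterable[tuple[str, int, int, str]]) -> dict[str, list[Interval]]:
    """Group (chrom, start, end, label) features by chromosome, sorted by start."""
    index: dict[str, list[Interval]] = {}
    for chrom, start, end, label in features:
        index.setdefault(chrom, []).append((start, end, label))
    for intervals in index.values():
        intervals.sort()
    return index


def overlapping_features(index: dict[str, list[Interval]], chrom: str, pos: int) -> list[str]:
    """Labels of all features on `chrom` whose half-open interval contains `pos`."""
    intervals = index.get(chrom, [])
    # Only features starting at or before `pos` can contain it.
    k = bisect_right(intervals, (pos, float("inf"), ""))
    return [label for start, end, label in intervals[:k] if start <= pos < end]
```

The scan over all earlier features keeps the sketch short but is linear per query; for genome-wide annotation sets an interval tree, or a dedicated tool such as `bedtools intersect`, is the more practical choice.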