Wensheng Zhang, Andrea Edwards, Wei Fan, Zhide Fang, Prescott Deininger, Kun Zhang.
Abstract
BACKGROUND: The exonization of transposable elements (TEs) has proven to be a significant mechanism for the creation of novel exons. Existing knowledge of the retention patterns of TE exons in mRNAs was established mainly through the analysis of Expressed Sequence Tag (EST) and microarray data.
Year: 2013 PMID: 23984937 PMCID: PMC3765721 DOI: 10.1186/1471-2164-14-584
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Summary of the analyzed samples and datasets
| Sample | Description | Sequencing type/read length | Mapped reads | Filtered reads | Data source |
| --- | --- | --- | --- | --- | --- |
| BT20 | ER- breast cancer cell line | P/50 | 137897876 | 118362274 | GEO: GSE27003; Sun et al., 2011 |
| MDAMB231 | ER- breast cancer cell line | P/50 | 96483262 | 80759008 | Same as above |
| MDAMB468 | ER- breast cancer cell line | P/50 | 123622490 | 104468212 | Same as above |
| MCF7 | ER+ breast cancer cell line | P/50 | 129592066 | 107422464 | Same as above |
| BT474 | ER+ breast cancer cell line | P/50 | 131597078 | 110877774 | Same as above |
| T47D | ER+ breast cancer cell line | P/50 | 119708904 | 99784684 | Same as above |
| ZR751 | ER+ breast cancer cell line | P/50 | 107891488 | 90362316 | Same as above |
| MCF10A | Normal breast cell line | P/50 | 125148556 | 108184998 | Same as above |
| LNCaP | Prostate cancer cell line | S/35 | 38953595 | 27126677 | GEO: GSE29155; Kim et al., 2011 |
| PrEC | Normal prostate cell line | P/35 | 27825207 | 22366081 | Same as above |
| LCL-1 (a) | Lymphocyte cell lines | P/36,38 | 64168681 | 54301769 | GEO: GSE25030; Montgomery et al., 2010 |
| LCL-2 | Lymphocyte cell lines | P/36,38 | 71571009 | 60017609 | Same as above |
| OV-1-pr (b) | Ovarian cancer cell line | P/42 | 75756477 | 69739708 | SRA: ERP000710 |
| OV-1-re | Ovarian cancer cell line | P/42 | 95813480 | 89853848 | Same as above |
| OV-2-fi | Ovarian cancer cell line | P/42 | 90793509 | 84057374 | Same as above |
| OV-2-se | Ovarian cancer cell line | P/42 | 100491160 | 93088808 | Same as above |
| OV-3-pr | Ovarian cancer cell line | P/42 | 72611523 | 66647818 | Same as above |
| OV-3-re | Ovarian cancer cell line | P/42 | 89393849 | 84067652 | Same as above |
| prAd_1 (c) | Prostate adenocarcinoma | S/33 | 32495059 | 23386861 | GEO: GSE24283; Nacu et al., 2011 |
| prAd_2 | Prostate adenocarcinoma | S/33 | 34663805 | 24398162 | Same as above |
| prAd_3 | Prostate adenocarcinoma | S/33 | 66976637 | 48613865 | Same as above |
| prNorm_1 | Normal prostate tissue | S/50,75 | 37201269 | 30394802 | Same as above |
| prNorm_2 | Normal prostate tissue | S/50 | 37451643 | 30392995 | Same as above |
| prNorm_3 | Normal prostate tissue | S/33 | 33511439 | 25294452 | Same as above |
| Brain | Brain tissue | S/50 | 48741218 | 42660318 | Same as above |
| Liver (d) | Liver tissue | S/35 | 31258238 | 26253587 | GEO: GSE17274; Blekhman et al., 2010 |
The letters and numbers in the third column give the sequencing type, single-read (S) or paired-end (P), and the read length(s), respectively. We excluded non-primary hits when counting the mapped reads (fourth column). The numbers of unambiguously mapped reads are listed in the filtered-reads column.
(a) LCL-1 and -2 are Coriell human lymphocyte lines NA12892 and NA19238.
(b) -1, -2 and -3 indicate the IDs of the cell lines; -pr, -re, -fi and -se indicate the clinical history and represent “present”, “relapse”, “first relapse” and “second relapse”, respectively.
(c) -1, -2 and -3 are the IDs of the prostate adenocarcinoma (or normal prostate tissue) samples.
(d) The data sets of 12 liver samples (replicates) were combined before the reads were mapped to the human genome and the computationally identified exon-exon junctions.
Statistical analysis on the digital expressions of the exons in different classes
| Sample | R | M | p | R | M | p | R | M | p | R | M | p |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BT20 | 0.03 | 0.98 | 1.72E-36 | 0.07 | 0.91 | 3.11E-29 | 0.26 | 0.38 | 4.28E-20 | 0.51 | 0.07 | 7.43E-20 |
| MDAMB231 | 0.05 | 0.98 | 1.89E-49 | 0.09 | 0.9 | 4.67E-28 | 0.33 | 0.42 | 2.20E-19 | 0.6 | 0.07 | 7.65E-16 |
| MDAMB468 | 0.04 | 0.99 | 4.64E-69 | 0.08 | 0.9 | 9.92E-33 | 0.28 | 0.36 | 8.99E-23 | 0.58 | 0.07 | 3.54E-12 |
| MCF7 | 0.03 | 0.99 | 8.91E-64 | 0.07 | 0.89 | 1.75E-34 | 0.3 | 0.31 | 2.54E-12 | 0.48 | 0.07 | 1.45E-20 |
| BT474 | 0.03 | 0.99 | 3.81E-62 | 0.08 | 0.9 | 4.31E-34 | 0.36 | 0.35 | 6.87E-13 | 0.53 | 0.07 | 1.22E-12 |
| T47D | 0.04 | 0.98 | 4.33E-26 | 0.08 | 0.92 | 5.72E-34 | 0.2 | 0.35 | 3.00E-31 | 0.58 | 0.06 | 7.08E-16 |
| ZR751 | 0.04 | 0.99 | 7.93E-58 | 0.08 | 0.9 | 1.06E-31 | 0.29 | 0.3 | 3.26E-19 | 0.6 | 0.07 | 6.88E-10 |
| MCF10A | 0.04 | 0.99 | 2.00E-72 | 0.09 | 0.89 | 2.26E-31 | 0.28 | 0.36 | 8.59E-26 | 0.6 | 0.06 | 4.63E-14 |
| LNCaP | 0.09 | 0.99 | 4.16E-91 | 0.16 | 0.88 | 6.58E-23 | 0.39 | 0.38 | 2.99E-21 | 0.7 | 0.08 | 4.12E-05 |
| PrEC | 0.08 | 0.99 | 1.18E-121 | 0.15 | 0.86 | 7.20E-28 | 0.49 | 0.33 | 2.55E-11 | 0.7 | 0.08 | 6.84E-05 |
| LCLs | 0.02 | 0.99 | 2.92E-136 | 0.06 | 0.86 | 9.98E-35 | 0.18 | 0.3 | 3.05E-24 | 0.45 | 0.06 | 1.00E-15 |
| OV-1-pr | 0.02 | 0.98 | 9.29E-50 | 0.05 | 0.91 | 4.59E-43 | 0.23 | 0.29 | 3.65E-15 | 0.39 | 0.08 | 1.17E-07 |
| OV-1-re | 0.03 | 1 | 2.53E-188 | 0.06 | 0.84 | 1.32E-36 | 0.17 | 0.3 | 6.77E-19 | 0.39 | 0.08 | 3.36E-06 |
| OV-2-fi | 0.02 | 0.99 | 1.29E-153 | 0.05 | 0.86 | 3.86E-35 | 0.17 | 0.34 | 6.15E-16 | 0.29 | 0.11 | 8.54E-07 |
| OV-2-se | 0.02 | 0.99 | 8.43E-77 | 0.04 | 0.89 | 6.42E-46 | 0.2 | 0.3 | 2.13E-13 | 0.31 | 0.09 | 4.55E-07 |
| OV-3-pr | 0.03 | 0.99 | 2.01E-162 | 0.07 | 0.84 | 7.23E-42 | 0.35 | 0.23 | 2.60E-08 | 0.47 | 0.1 | 1.19E-01 |
| OV-3-re | 0.02 | 0.99 | 2.39E-109 | 0.05 | 0.87 | 1.84E-40 | 0.17 | 0.27 | 1.51E-16 | 0.4 | 0.08 | 4.65E-07 |
| prAd_1 | 0.14 | 1 | 1.24E-256 | 0.25 | 0.71 | 1.82E-14 | 0.57 | 0.28 | 2.43E-12 | 0.81 | 0.05 | 2.42E-01 |
| prAd_2 | 0.12 | 1.01 | 7.82E-292 | 0.22 | 0.72 | 6.88E-17 | 0.6 | 0.3 | 7.12E-10 | 0.79 | 0.05 | 2.50E-01 |
| prAd_3 | 0.07 | 1.02 | 0.00E+00 | 0.16 | 0.71 | 3.15E-19 | 0.46 | 0.27 | 7.33E-13 | 0.7 | 0.05 | 1.83E-02 |
| prNorm_1 | 0.07 | 0.99 | 7.83E-170 | 0.13 | 0.83 | 2.49E-34 | 0.42 | 0.3 | 1.35E-17 | 0.68 | 0.06 | 2.23E-06 |
| prNorm_2 | 0.07 | 0.99 | 1.41E-172 | 0.12 | 0.83 | 2.33E-26 | 0.4 | 0.33 | 1.39E-14 | 0.65 | 0.07 | 2.01E-05 |
| prNorm_3 | 0.13 | 1.01 | 5.47E-288 | 0.23 | 0.74 | 7.63E-16 | 0.55 | 0.35 | 3.49E-12 | 0.78 | 0.06 | 7.63E-02 |
| Brain | 0.04 | 0.99 | 6.99E-183 | 0.09 | 0.84 | 5.03E-26 | 0.31 | 0.31 | 1.70E-13 | 0.54 | 0.08 | 1.30E-09 |
| Liver | 0.08 | 1 | 8.84E-124 | 0.15 | 0.85 | 5.08E-28 | 0.52 | 0.26 | 1.19E-06 | 0.68 | 0.09 | 1.67E-01 |
R: the proportion of exons with no reads mapped to the genomic regions in the alignment among the entire set of exons in the corresponding categories. M: the average of the rescaled RPKMs. p: the p-values for stepwise comparisons (see Results section for details).
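As a rough illustration of the R and M columns, the sketch below computes the proportion of exons with no mapped reads and the mean expression for one exon class in one sample. It uses the standard RPKM definition; the rescaling step applied in the paper is not described in this excerpt, so plain RPKM stands in for it here. The toy read counts, exon lengths, library size, and function names are all hypothetical, and including the un-expressed exons in the mean is an assumption.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Standard RPKM: reads per kilobase of exon per million mapped reads."""
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def summarize_class(read_counts, expression_values):
    """Return (R, M) for one exon class in one sample.

    R: proportion of exons with zero mapped reads.
    M: mean of the supplied expression values (assumption: un-expressed
       exons are included in the mean with a value of 0).
    """
    n = len(read_counts)
    r = sum(1 for c in read_counts if c == 0) / n
    m = sum(expression_values) / n
    return r, m


# Hypothetical toy data: (read count, exon length in bp) for four exons.
exons = [(0, 120), (30, 150), (12, 100), (0, 200)]
total_reads = 1_000_000  # hypothetical number of mapped reads in the sample

values = [rpkm(count, length, total_reads) for count, length in exons]
R, M = summarize_class([count for count, _ in exons], values)
print(f"R = {R:.2f}, M = {M:.1f}")
```

With these toy inputs, two of the four exons have no reads, so R is 0.5.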
Figure 1. Histograms for the digital expression levels of TE exons in sample BT20 (representing cluster G1). The black bar on the left side of each plot represents the proportion of un-expressed exons.
Figure 2. Histograms for the variability and standardized inter-class differences of the digital expression levels of TE exons. CV: the coefficient of variation of the rescaled RPKMs across the 26 samples. BR t-statistic: calculated from the difference and pooled standard deviation of the rescaled RPKMs for the three ER- and four ER+ breast cancer cell lines. PR t-statistic: calculated from the difference and pooled standard deviation of the rescaled RPKMs for the three prostate adenocarcinoma samples and three normal prostate tissue samples.
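The two quantities named in the Figure 2 caption can be sketched as follows for a single TE exon. This is a minimal illustration that assumes the textbook two-sample (Student) form of the pooled t-statistic. The caption mentions only "the difference and pooled standard deviation", so the exact scaling used in the paper may differ. The expression values and function names are hypothetical.

```python
import math
import statistics


def coefficient_of_variation(values):
    """CV = sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def pooled_t_statistic(group_a, group_b):
    """Two-sample t-statistic with a pooled variance (assumed Student form)."""
    na, nb = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error


# Hypothetical rescaled expression values of one TE exon.
er_negative = [1.0, 2.0, 3.0]       # three ER- cell lines
er_positive = [2.0, 3.0, 4.0, 5.0]  # four ER+ cell lines

cv = coefficient_of_variation(er_negative)
t_stat = pooled_t_statistic(er_negative, er_positive)
print(f"CV = {cv:.2f}, t = {t_stat:.3f}")
```

In the paper the CV is taken across all 26 samples; the three-value input above only keeps the arithmetic easy to follow.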
Figure 3. Distribution of TE exons by the cognate TE families and the inclusion (presence or absence) in the UCSC RefGene table. A number in the upper row is the proportion of the un-annotated TE exons within the corresponding family among the entire set. A number in the lower row is the proportion of the annotated TE exons within a family.
Figure 4. Visualization of the effects of genomic factors on the digital expression of TE exons. The results indicated by the top rows of color boxes are inferred from the medians of the TE exons’ un-expressed ratios and rescaled RPKMs across all samples. In the analysis, CDS and Alu are set as the baselines for the two categorical factors, location and TE family, respectively.
Figure 5. The inverse relationship between the expression levels and expression variability of TE exons. A: The effects of genomic factors on the median digital expression level of TE exons (the median of the rescaled RPKMs across the 26 samples). B: The effects of genomic factors on the coefficient of variation of the rescaled RPKMs of TE exons across the 26 samples. C: Scatter plot of the medians against the coefficients of variation. TE exons hosted by genes that are un-expressed in more than half of the samples were excluded before the statistical analysis (using Model-2). The height of an error bar in Plots A and B represents twice the standard error of the corresponding effect coefficient. In Plot C, the correlation was calculated by the Kendall method.
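The Kendall correlation mentioned for Plot C measures how often two quantities rank a pair of exons in the same order. The sketch below implements the simplest variant (tau-a, no tie correction) in plain Python. Which variant the authors used is not stated here, and statistical packages typically report the tie-corrected tau-b instead. The per-exon medians and CVs are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Pairs tied in either variable count toward neither total.
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical per-exon summaries: higher median expression, lower variability.
medians = [0.2, 0.5, 1.0, 2.0]
cvs = [1.5, 1.1, 0.9, 0.4]

tau = kendall_tau_a(medians, cvs)
print(f"tau = {tau:.2f}")
```

A value near -1 corresponds to the inverse relationship the caption describes; the toy data are constructed to be perfectly inverse.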
Functional enrichment analysis of the host genes with 327 highly expressed TE exons
| MF (1) | GO:0008270 ~ zinc ion binding | 59 | 1.16E-05 | 1.63E-02 |
| MF | GO:0046914 ~ transition metal ion binding | 66 | 3.01E-05 | 4.21E-02 |
| MF | GO:0043169 ~ cation binding | 85 | 3.18E-04 | 4.45E-01 |
| MF | GO:0046872 ~ metal ion binding | 84 | 3.93E-04 | 5.49E-01 |
| MF | GO:0043167 ~ ion binding | 85 | 5.37E-04 | 7.51E-01 |
| BP (2) | GO:0006350 ~ transcription | 50 | 8.46E-04 | 1.35E+00 |
| MF | GO:0003677 ~ DNA binding | 52 | 1.44E-03 | 2.00E+00 |
| BP | GO:0009101 ~ glycoprotein biosynthetic process | 9 | 2.64E-03 | 4.17E+00 |
(1) Molecular function; (2) Biological process.
Classification and comparison of the C2H2 ZNF genes hosting highly-expressed TE exons
| TE | N1 | N2 | N3 | N4 | p |
| --- | --- | --- | --- | --- | --- |
| Alu | 850 | 51 | 143 | 17 | 7.65E-04 |
| CR1 | 11 | 0 | 4 | 0 | NA |
| ERV1 | 29 | 0 | 13 | 0 | NA |
| ERVL | 19 | 1 | 6 | 1 | 0.00E+00 |
| L1 | 148 | 12 | 24 | 5 | 4.71E-03 |
| L2 | 79 | 7 | 34 | 5 | 2.27E-02 |
| MaLR | 57 | 1 | 18 | 1 | 0.00E+00 |
| MER1 | 40 | 2 | 13 | 1 | 1.00E-01 |
| MER2 | 37 | 2 | 15 | 2 | 0.00E+00 |
| MIR | 132 | 4 | 56 | 3 | 3.04E-02 |
| Other_DNA | 4 | 0 | 1 | 0 | NA |
TE: the categories of exonized TEs. N1: the number of the genes hosting TE exons. N2: the number of C2H2 ZNF genes hosting TE exons. N3: the number of genes hosting the highly-expressed TE exons. N4: the number of C2H2 ZNF genes hosting the highly-expressed TE exons. Among the 327 highly-expressed TE exons, 115 are included in the UCSC RefGene table.