Guia Guffanti1,2, Andrew Bartlett3, Torsten Klengel1,2,4, Claudia Klengel1,2, Richard Hunter3, Gennadi Glinsky5, Fabio Macciardi6.
Abstract
Expression of transposable elements (TE) is transiently activated during human preimplantation embryogenesis in a developmental stage- and cell type-specific manner and TE-mediated epigenetic regulation is intrinsically wired in developmental genetic networks in human embryos and embryonic stem cells. However, there are no systematic studies devoted to a comprehensive analysis of the TE transcriptome in human adult organs and tissues, including human neural tissues. To investigate TE expression in the human Dorsolateral Prefrontal Cortex (DLPFC), we developed and validated a straightforward analytical approach to chart quantitative genome-wide expression profiles of all annotated TE loci based on unambiguous mapping of discrete TE-encoded transcripts using a de novo assembly strategy. To initially evaluate the potential regulatory impact of DLPFC-expressed TE, we adopted a comparative evolutionary genomics approach across humans, primates, and rodents to document conservation patterns, lineage-specificity, and colocalizations with transcription factor binding sites mapped within primate- and human-specific TE. We identified 654,665 transcripts expressed from 477,507 distinct loci of different TE classes and families, the majority of which appear to have originated from primate-specific sequences. We discovered 4,687 human-specific and transcriptionally active TEs in DLPFC, of which the prominent majority (80.2%) appears spliced. Our analyses revealed significant associations of DLPFC-expressed TE with primate- and human-specific transcription factor binding sites, suggesting potential cross-talks of concordant regulatory functions. We identified 1,689 TEs differentially expressed in the DLPFC of Schizophrenia patients, a majority of which is located within introns of 1,137 protein-coding genes. Our findings imply that identified DLPFC-expressed TEs may affect human brain structures and functions following different evolutionary trajectories. 
On one side, hundreds of thousands of TEs maintained a remarkably high conservation for ∼8 My of primates' evolution, suggesting that they are likely conveying evolutionary-constrained primate-specific regulatory functions. In parallel, thousands of transcriptionally active human-specific TE loci emerged more recently, suggesting that they could be relevant for human-specific behavioral or cognitive functions.
Year: 2018 PMID: 30053206 PMCID: PMC6188555 DOI: 10.1093/molbev/msy143
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. A graphical representation of the GGDNA workflow used to identify each expressed TE transcript from RNA-seq data of DLPFC in our sample. The reads generated by the RNA-seq procedures are first aligned to the annotated reference TE database from RepeatMasker; the reads at each locus are then assembled de novo. During quality control, transcripts that have <95% sequence identity with the reference and/or align over <90% of their length are discarded. Discarded reads and retained reads are marked with separate symbols in the figure. See also the text for details of the procedures.
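The quality-control rule in the legend can be sketched as a simple filter. This is only an illustration and not the GGDNA implementation: the `AssembledTranscript` record and its field names are hypothetical, and only the two thresholds (95% identity, 90% aligned length) come from the legend.

```python
from dataclasses import dataclass


@dataclass
class AssembledTranscript:
    """Hypothetical record for one de novo assembled TE transcript."""
    locus_id: str
    identity: float          # fraction of bases matching the reference TE (0-1)
    aligned_fraction: float  # fraction of the transcript length that aligns (0-1)


MIN_IDENTITY = 0.95          # threshold stated in the legend
MIN_ALIGNED_FRACTION = 0.90  # threshold stated in the legend


def passes_qc(t: AssembledTranscript) -> bool:
    """Keep a transcript only if it meets both thresholds."""
    return t.identity >= MIN_IDENTITY and t.aligned_fraction >= MIN_ALIGNED_FRACTION


def filter_transcripts(transcripts):
    """Return the transcripts that survive quality control, in input order."""
    return [t for t in transcripts if passes_qc(t)]
```

A transcript failing either threshold is dropped, which mirrors the "and/or" wording of the legend.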
Distinct Classes of Primate- and Human-Specific TE Loci Transcriptionally Active in the Dorsolateral Prefrontal Cortex (DLPFC) of Human Brain.
| Classification Category | TE Transcripts Expressed in DLPFC, n | Primate-Specific TEs, n (%) | Human-Specific TEs, n (%) | Average Number of Reads per Transcript |
|---|---|---|---|---|
| LTR class | 126,849 | 101,733 (80.2%) | 596 (0.47%) | 303 |
| LINE class | 319,509 | 245,383 (76.8%) | 2,108 (0.66%) | 468 |
| SINE class | 155,366 | 132,216 (85.1%) | 715 (0.46%) | 117 |
| DNA class | 43,608 | 31,965 (73.3%) | 87 (0.19%) | 286 |
| Other (SVA) | 3,317 | 3,313 (99.9%) | 770 (23.2%) | 400 |
| Total | 654,665 | 519,804 (79.4%) | 4,276 (0.66%) | 346 |
Note.—The majority of the transcripts (94.1%) are supported by >20 reads (88.7% by 20–1,000 reads and 5.7% by >1,000 reads), and only 5.9% by <20 reads. TE loci that have <10% of bases remapped during the conversion from the human genome (hg38) to the mouse genome (mm10) were defined as primate-specific loci; TE loci that have <10% of bases remapped during the conversion to both Chimpanzee (PanTro5) and Bonobo genomes were defined as human-specific loci; TE, transposable elements.
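The two definitions in the note can be expressed as a small classifier. This is a minimal sketch under stated assumptions: the function name and its arguments are hypothetical, and the remapped fractions are taken as already computed by whatever genome-conversion step was used. Only the <10% cutoff and the genomes involved come from the note.

```python
SPECIFICITY_CUTOFF = 0.10  # "<10% of bases remapped", from the table note


def classify_locus(frac_mouse, frac_chimp, frac_bonobo):
    """Return the set of labels for one TE locus.

    Each argument is the fraction (0-1) of the locus's bases that remap
    from hg38 to the named genome. How these fractions are obtained is
    outside the scope of this sketch.
    """
    labels = set()
    # Primate-specific: almost nothing remaps to the mouse genome (mm10).
    if frac_mouse < SPECIFICITY_CUTOFF:
        labels.add("primate-specific")
    # Human-specific: almost nothing remaps to chimpanzee AND bonobo.
    if frac_chimp < SPECIFICITY_CUTOFF and frac_bonobo < SPECIFICITY_CUTOFF:
        labels.add("human-specific")
    return labels
```

Returning a set rather than a single label reflects that the two criteria are evaluated against different genomes, so a locus may satisfy both.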
Fig. 2. Validation of the actively transcribed HERV locus HERVKC4 on chromosome 19 in human DLPFC. (A, top) The cartoon reports the sequence coordinates (not to scale) of the tested HERVKC4 (in red), the transcript assembled by GGDNA (in green), and the region captured by qPCR (in blue). (B, middle) Visualization of the read alignment to the GGDNA transcript for each of the four RNA samples; the region corresponding to the qPCR product is highlighted in blue. (C, bottom) Agarose gel of the qPCR product for one RNA sample, with a relative size of 100 bp. Quantitative RT-PCR (qRT-PCR) validation experiments of four HERVKC4 transcripts were carried out on four of the nineteen analyzed RNAs extracted from the human DLPFC samples (three controls and one Schizophrenia sample). The sequence identity of the purified PCR products was confirmed by direct sequencing.
Primate- and Human-Specific TE Transcripts Originated from Loci Harboring Binding Sites of the Master Pluripotency Regulators NANOG, POU5F1, and CTCF.
| Classification Category | Primate-Specific Loci, n (%) | P value | Nonhuman Primates’ Loci, n (%) | P value | Human-Specific Loci, n (%) | P value |
|---|---|---|---|---|---|---|
| NANOG-binding sites | | | | | | |
| Genome (hg38) | 29,083 | | 28,267 | | 816 | |
| Expected number of expressed loci | 5,172 | | 5,171 | | 71 | |
| Observed number of expressed loci in postmortem DLPFC samples | 6,399 (22%) | 3.37E-37 | 6,197 (21.9%) | 5.24E-27 | 202 (24.8%) | 1.79E-18 |
| CTCF-binding sites | | | | | | |
| Genome (hg38) | 28,236 | | 27,661 | | 575 | |
| Expected number of expressed loci | 5,021 | | 5,060 | | 50 | |
| Observed number of expressed loci in postmortem DLPFC samples | 4,144 (14.7%) | 1.47E-23 | 4,113 (14.9%) | 2.70E-27 | 31 (5.4%) | 0.037 |
| OCT4/POU5F1-binding sites | | | | | | |
| Genome (hg38) | 12,458 | | 10,130 | | 2,328 | |
| Expected number of expressed loci | 2,216 | | 1,853 | | 203 | |
| Observed number of expressed loci in postmortem DLPFC samples | 1,866 (15%) | 2.28E-09 | 1,774 (17.5%) | 0.15 | 92 (4%) | 2.21E-11 |
| NANOG + POU5F1 + CTCF binding sites | | | | | | |
| Genome (hg38) | 69,777 | | 66,058 | | 3,719 | |
| Observed number of expressed loci in postmortem DLPFC samples | 12,409 (17.8%) | | 12,084 (18.3%) | | 325 (8.7%) | |
P values reflecting the statistical significance of the difference between the observed and expected numbers of expressed loci were estimated using a two-tailed Fisher’s exact test; the expected numbers of expressed loci were calculated based on the percentage of all expressed TE-derived loci in the corresponding classification category; nonhuman primates’ loci refer to loci conserved in primates that are common to humans and nonhuman primates.
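The expected counts can be illustrated with a short calculation. This sketch assumes that "the percentage of all expressed TE-derived loci in the corresponding classification category" is the pooled fraction from the combined NANOG + POU5F1 + CTCF rows of the table; that reading is an interpretation, and the function name is hypothetical. P values are not recomputed here because the exact contingency table used for the Fisher's exact test is not stated.

```python
def expected_expressed(genome_sites, pooled_expressed, pooled_sites):
    """Expected expressed loci = sites in the genome x pooled expressed fraction."""
    return round(genome_sites * pooled_expressed / pooled_sites)


# Pooled values from the combined rows, primate-specific column (assumed baseline)
POOLED_EXPRESSED = 12_409
POOLED_SITES = 69_777

nanog_expected = expected_expressed(29_083, POOLED_EXPRESSED, POOLED_SITES)
ctcf_expected = expected_expressed(28_236, POOLED_EXPRESSED, POOLED_SITES)

# Same calculation for the human-specific column (pooled 325 of 3,719)
nanog_human_expected = expected_expressed(816, 325, 3_719)
```

Under this assumption the three values come out as 5,172, 5,021, and 71, which agree with the corresponding entries in the table.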
Two Distinct Evolutionary Patterns of Highly Conserved in Primates and Human-Specific TE Loci Transcriptionally Active in Human’s DLPFC.
| TE Family | DLPFC Expressed RNAs (n) | DLPFC Expressed Loci (n) | Highly Conserved in Primates Loci, n (%) | Human-Specific Loci, n (%) | Humans/Primates Ratio | Highly Conserved and Human-Specific Loci (n) | Highly Conserved and Human-Specific Loci (%) |
|---|---|---|---|---|---|---|---|
| L1Hs | 1,240 | 463 | 51 (11%) | 354 (76.5%) | 6.9 | 405 | 87.5 |
| L1PA2 | 4,244 | 1,474 | 154 (10.4%) | 688 (46.7%) | 4.5 | 842 | 57.1 |
| SVA | 3,317 | 1,560 | 54 (3.5%) | 841 (53.9%) | 15.6 | 895 | 57.4 |
| LTR5 | 854 | 476 | 302 (63.4%) | 66 (13.9%) | −4.6 | 368 | 77.3 |
| HERVK | 1,447 | 563 | 434 (77.1%) | 49 (8.7%) | −8.9 | 483 | 85.8 |
| HERV9 | 483 | 172 | 140 (81.4%) | 10 (5.8%) | −14 | 150 | 87.2 |
| HERV (various) | 4,293 | 1,925 | 533 (89.4%) | 13 (2.2%) | −41 | 546 | 91.6 |
| LTR7 | 832 | 634 | 507 (80%) | 14 (2.2%) | −36.2 | 521 | 82.2 |
| HERVH | 2,365 | 1,101 | 855 (77.7%) | 30 (2.7%) | −28.5 | 886 | 80.4 |
| AluY | 14,288 | 12,184 | 8,605 (70.6%) | 399 (3.3%) | −21.6 | 9,004 | 73.9 |
Note.—TE loci that have at least 95% of bases remapped during the direct and reciprocal conversions to the genomes of humans (hg38), Chimpanzee (PanTro5), and Bonobo were defined as highly conserved in primate sequences; TE loci that have <10% of bases remapped during the conversion from the human genome (hg38) to both Chimpanzee (PanTro5) and Bonobo genomes were defined as human-specific loci. Values in italic font report the cumulative numbers for corresponding classification categories.
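The Humans/Primates Ratio column appears to follow a signed-ratio convention: positive when human-specific loci outnumber highly conserved loci, negative otherwise. That convention is inferred from the tabulated values and is not stated explicitly in the note, so the sketch below is an assumption; the function name is hypothetical and both counts are assumed to be nonzero.

```python
def signed_ratio(human_specific, highly_conserved):
    """Signed ratio of human-specific to highly conserved loci.

    Positive: human_specific / highly_conserved (human-specific loci dominate).
    Negative: -(highly_conserved / human_specific) (conserved loci dominate).
    Both counts are assumed to be greater than zero.
    """
    if human_specific >= highly_conserved:
        return round(human_specific / highly_conserved, 1)
    return -round(highly_conserved / human_specific, 1)


l1hs = signed_ratio(354, 51)    # L1Hs row
hervk = signed_ratio(49, 434)   # HERVK row
```

With the counts from the table, the L1Hs and HERVK rows give 6.9 and −8.9, matching the published column.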
Fig. 3. Evolutionary dynamics of highly conserved-in-primates and human-specific TE loci transcriptionally active in the DLPFC of the human brain. DLPFC-expressed TEs having >99% of individual loci represented by primate-specific sequences (table 2 and supplementary tables 1–4, Supplementary Material online) were identified and analyzed for expression of primate- and human-specific TEs. TE loci expressing higher numbers of human-specific than highly conserved-in-primates transcripts, and vice versa, were identified and analyzed in detail. Note that all TE loci that express the largest numbers of molecularly distinct human-specific TEs in human DLPFC display both common and distinct features in their evolutionary histories, as represented by both highly conserved-in-primates and human-specific sequences. (A) The numbers of distinct TE loci expressing the largest numbers of human-specific TEs in human DLPFC are shown. All identified TE loci are represented by markedly distinct numbers of human-specific and highly conserved-in-primates sequences. (B) TE loci that express the largest numbers of molecularly distinct human-specific TEs in human DLPFC display distinct evolutionary dynamics and show markedly distinct ratios of human-specific to highly conserved-in-primates sequences.
Examples of Genes Tagged by TE Transcripts in Human DLPFC with Already Established Neurodevelopmental Functions and Documented Genetic/Genomic/Epigenetic Alterations of Potential Functional Significance in the Human Lineage After the Divergence of Humans and Chimpanzees.
| Gene Names and Classification Categories | Functionally Relevant Features on the Human Lineage | TE Transcripts, n | Primate-Specific TE Transcripts, n (%) | Highly Conserved in Primates TE Transcripts, n (%) | Human-Specific TE Transcripts, n | Human-Specific TE Loci |
|---|---|---|---|---|---|---|
| FOXP2 | Amino-acid substitutions; regulatory sequence | 151 | 115 (76.2%) | 144 (95.4%) | 2 | L1PA2 |
| CNTNAP2 | DNA methylation | 1,323 | 1,035 (78.2%) | 1,224 (92.5%) | 22 | L1PA2; AluY; SVA; L1Hs |
| SRGAP2 | Duplications | 460 | 277 (60.2%) | 420 (91.3%) | 0 | NA |
| ARHGAP11B | Duplications | 28 | 26 (92.9%) | 26 (92.9%) | 0 | NA |
| NPAS3 | Highest density of human accelerated regions | 347 | 172 (49.6%) | 333 (96%) | 1 | L1PA2 |
| MEF2A | Excess of SNPs in an upstream gene-regulatory region | 124 | 101 (81.5%) | 123 (99.2%) | 0 | NA |
| AUTS2 | Regions of selective sweep in Modern Humans after the divergence with Neanderthals | 460 | 367 (79.8%) | 427 (92.8%) | 5 | L1Hs |
| DYRK1A | Regions of selective sweep in Modern Humans after the divergence with Neanderthals | 77 | 61 (79.2%) | 75 (97.4%) | 0 | NA |
| NRG3 | Regions of selective sweep in Modern Humans after the divergence with Neanderthals | 770 | 542 (70.4%) | 721 (93.6%) | 1 | AluSc |
| FOXP1 | Functionally relevant protein–protein binding with the FOXP2 | 128 | 78 (60.9%) | 112 (87.5%) | 0 | NA |
| MEF2C | Duplications, partial deletions, microdeletions and mutations linked with haploinsufficiency | 286 | 193 (67.5%) | 282 (98.6%) | 0 | NA |
Note.—The identification of primate-specific, highly conserved in primates, and human-specific TE sequences was performed as described in Materials and Methods.
NA, not applicable; detailed descriptions of specific genes and the list of primary references can be found in (Glinsky 2016, 2018; Sousa et al. 2017); TE transcripts, numbers of transcripts we have detected in our DLPFC samples.
Fig. 4. TE transcriptome in the DLPFC of Schizophrenia patients. The heatmap presents the pattern of the 112 up- and down-regulated TEs with Fold Change ± 4 comparing schizophrenic patients with controls. The vertical tree shows the distribution of schizophrenic patients (blue line) and controls (red line). The horizontal tree shows the distribution of the 112 expressed TEs, and the colors in the body of the graph show which TEs are over- (yellow) or underexpressed (purple). The list of the up- and down-regulated TEs used to build the graph is reported in supplementary table 5, Supplementary Material online.
Enrichment Analysis of Genes Tagged by TE Transcripts Differentially Expressed in DLPFC of Schizophrenia Patients and Mapped Within the Boundaries of 50 Rapidly Evolving in Humans Topologically Associating Domains (revTADs).
| Classification Category | All Genes | Protein-Coding | Long ncRNAs, Including rRNAs | Small ncRNAs | Pseudogenes | Miscellaneous RNAs |
|---|---|---|---|---|---|---|
| Human genome | 57,173 | 20,412 | 14,727 | 5,221 | 14,600 | 2,213 |
| Mapped by TE transcripts in DLPFC within revTADs | 1,408 | 731 | 555 | 5 | 104 | 14 |
| Schizophrenia GES associated with TE transcripts in DLPFC | 1,137 | 908 | 190 | 0 | 38 | 1 |
| Schizophrenia GES located within revTADs | 67 | 48 | 12 | 0 | 7 | 0 |
| Percent of Schizophrenia GES located within revTADs | 6 | 5 | 6 | 0 | 18 | 0 |
| P value | 5.16E-11 | 0.0018 | 0.028 | 1 | 7.78E-09 | 0.99 |
TE, transposable elements; DLPFC, dorsolateral prefrontal cortex; PFC, prefrontal cortex; revTADs, rapidly evolving in humans topologically associating domains; GES, gene expression signature; ncRNAs, noncoding RNAs; rRNAs, ribosomal RNAs.
P values were estimated using the hypergeometric distribution test.
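A hypergeometric enrichment test of this kind can be sketched with the Python standard library. The parameterization below is an assumption, since the source does not spell out which counts entered the test: the population is taken as all genes in the human genome, the "successes" as genes mapped by TE transcripts within revTADs, the draws as the Schizophrenia GES genes, and the observed overlap as the GES genes located within revTADs (the "All Genes" column). The function name is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    Computed exactly with integer binomial coefficients, then converted
    to a float in a single division.
    """
    denom = comb(population, draws)
    upper = min(successes, draws)
    numer = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return numer / denom


# "All Genes" column of the table (assumed inputs to the test)
N_GENES = 57_173   # genes in the human genome
N_REVTAD = 1_408   # genes mapped by TE transcripts within revTADs
N_GES = 1_137      # Schizophrenia GES genes
N_OVERLAP = 67     # GES genes located within revTADs

expected_overlap = N_GES * N_REVTAD / N_GENES
p_value = hypergeom_upper_tail(N_GENES, N_REVTAD, N_GES, N_OVERLAP)
```

Under these assumptions roughly 28 overlapping genes would be expected by chance, against 67 observed. Whether the resulting tail probability reproduces the tabulated P value depends on the exact parameters the authors used.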