LeeAnn Ramsay, Maria C Marchetto, Maxime Caron, Shu-Huang Chen, Stephan Busche, Tony Kwan, Tomi Pastinen, Fred H Gage, Guillaume Bourque.
Abstract
BACKGROUND: A significant portion of the non-coding RNAs expressed in human cells is derived from transposable elements (TEs). Moreover, various long non-coding RNAs (lncRNAs) derived from the human endogenous retrovirus subfamily H (HERVH) have been shown to be not only expressed in human embryonic stem cells (hESCs) but also required for their pluripotency.
Keywords: Induced pluripotent stem cells; Long non-coding RNAs; Transposable elements
Year: 2017 PMID: 28245871 PMCID: PMC5331655 DOI: 10.1186/s12864-017-3568-y
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 a Percentage of all human TEs that are conserved in non-human primates (NHPs), grouped by TE class. C_G_R indicates TEs conserved in all three NHP species (chimpanzee, gorilla and rhesus). b Proportion of instances in each TE family that are conserved in chimpanzee (y-axis) relative to the family's average sequence identity score (x-axis). Sequence similarity to the family's consensus sequence is expressed as a scaled Smith-Waterman (SW) score and used as a surrogate for the age of the TE family: older TEs are on the left and newer insertions on the right. More recently inserted TE families are less conserved than older ones. Only TE families with more than 30 expressed instances in human are included. tRNA-Asn-ACC is shown in red and AluYa5 in blue. c As in (b), for gorilla. d As in (b), for rhesus.
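The caption uses a scaled Smith-Waterman score against the family consensus as an age surrogate but does not give the scoring scheme or the scaling. The following is a minimal sketch under stated assumptions: a linear gap penalty with match +2, mismatch -1 and gap -2, and scaling by the consensus self-alignment score. Both choices, and the function names `smith_waterman` and `scaled_sw_score`, are hypothetical and not the authors' pipeline.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score between a and b with a linear gap penalty (assumed scheme)."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Local alignment: cell scores never drop below zero
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def scaled_sw_score(instance: str, consensus: str) -> float:
    """Hypothetical scaling: raw SW score divided by the consensus self-alignment score,
    so an intact copy scores 1.0 and more diverged (older) copies score lower."""
    self_score = smith_waterman(consensus, consensus)
    if self_score == 0:
        return 0.0
    return smith_waterman(instance, consensus) / self_score


# Toy sequences for illustration, not real TE data
print(scaled_sw_score("ACGTACGT", "ACGTACGT"))  # 1.0
print(scaled_sw_score("ACGAACGT", "ACGTACGT"))  # 0.8125 (13 / 16)
```

Under the same assumptions, a family's x-axis value would be the mean of `scaled_sw_score` over its instances; copies that have accumulated more mutations since insertion score lower and fall further to the left.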
Fig. 2 a Percentage of all TEs expressed in human that are conserved in NHPs, grouped by class. C_G_R indicates TEs conserved in all three NHP species. b Percentage of human-expressed TEs that are conserved in sequence and also expressed in NHPs. c TE families plotted by average sequence identity and the proportion conserved in NHPs. The y-axis shows the proportion of human-expressed TEs that are also expressed in NHPs. Average sequence divergence is used as a surrogate for age, with older TEs on the left. Only families with at least 30 expressed TEs and at least 10 conserved in the target species are shown. d The 20 most conserved families (in both sequence and expression) when their conservation is summed across all three NHP species.
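The panel c filters and the panel d ranking can be sketched as follows. This is an illustration, not the authors' code: the caption does not say whether counts or proportions are summed in panel d, so summing per-species proportions is an assumption, as are the `FamilyCounts` structure and the function names. The counts are made-up toy values.

```python
from __future__ import annotations

from dataclasses import dataclass, field

NHPS = ("chimpanzee", "gorilla", "rhesus")


@dataclass
class FamilyCounts:
    name: str
    human_expressed: int  # instances of this family expressed in human iPSCs
    # Per NHP: human-expressed instances whose sequence is conserved in that species
    conserved: dict[str, int] = field(default_factory=dict)
    # Per NHP: conserved instances that are also expressed in that species
    expressed_in_nhp: dict[str, int] = field(default_factory=dict)


def expression_conservation(fam: FamilyCounts, species: str) -> float | None:
    """Proportion of human-expressed instances also expressed in `species`,
    or None if the family fails the panel c filters (>=30 expressed, >=10 conserved)."""
    if fam.human_expressed < 30 or fam.conserved.get(species, 0) < 10:
        return None
    return fam.expressed_in_nhp.get(species, 0) / fam.human_expressed


def top_families(families: list[FamilyCounts], n: int = 20) -> list[tuple[str, float]]:
    """Rank families by their proportion summed over the three NHPs (assumed aggregation)."""
    scored = []
    for fam in families:
        props = [expression_conservation(fam, sp) for sp in NHPS]
        if all(p is None for p in props):
            continue  # the family passes the filters in no species
        scored.append((fam.name, sum(p for p in props if p is not None)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


# Made-up toy counts for illustration only, not values from the study
families = [
    FamilyCounts("famA", 40, {"chimpanzee": 30, "gorilla": 25, "rhesus": 12},
                 {"chimpanzee": 20, "gorilla": 10, "rhesus": 4}),
    FamilyCounts("famB", 50, {"chimpanzee": 40, "gorilla": 5, "rhesus": 20},
                 {"chimpanzee": 30, "gorilla": 3, "rhesus": 10}),
    FamilyCounts("famC", 20, {"chimpanzee": 15}, {"chimpanzee": 12}),
]
print(top_families(families, n=2))
```

In this sketch a species in which a family fails the filters simply contributes nothing to its sum; other treatments (for example, excluding such families entirely) are equally plausible readings of the caption.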
TE expression conservation between human and chimpanzee iPSCs
| Family | Class | Total | Human expressed | Chimpanzee expressed | Proportion | p-value |
|---|---|---|---|---|---|---|
| MER53 | DNA | 5308 | 39 | 20 | 0.51 | 9.61 × 10⁻²⁷ |
| MIR1_Amn | SINE | 9495 | 113 | 57 | 0.50 | 1.71 × 10⁻⁷¹ |
| AluSg7 | SINE | 5780 | 50 | 25 | 0.50 | 6.81 × 10⁻³⁵ |
| LTR8 | LTR | 2516 | 37 | 18 | 0.49 | 1.81 × 10⁻²⁵ |
| L2d | LINE | 19063 | 135 | 64 | 0.47 | 2.32 × 10⁻⁸⁴ |
| Tigger4a | DNA | 3242 | 45 | 21 | 0.47 | 2.41 × 10⁻²⁰ |
| MADE1 | DNA | 7634 | 86 | 40 | 0.47 | 7.59 × 10⁻⁵¹ |
| HERVH-int | LTR | 1266 | 50 | 23 | 0.46 | 9.26 × 10⁻¹⁵ |
| MER94 | DNA | 4884 | 33 | 15 | 0.45 | 2.17 × 10⁻²¹ |
| L1MC1 | LINE | 7375 | 33 | 15 | 0.45 | 8.19 × 10⁻²² |
| L1MC4 | LINE | 12920 | 109 | 49 | 0.45 | 4.17 × 10⁻⁶⁴ |
| L1MB2 | LINE | 4967 | 38 | 17 | 0.45 | 1.49 × 10⁻²⁰ |
| Charlie4z | DNA | 5255 | 47 | 21 | 0.45 | 1.38 × 10⁻²⁷ |
| L1MB8 | LINE | 9006 | 56 | 25 | 0.45 | 7.32 × 10⁻³⁴ |
| L1M2 | LINE | 6281 | 56 | 25 | 0.45 | 4.05 × 10⁻³² |
| OldhAT1 | DNA | 1897 | 30 | 13 | 0.43 | 5.22 × 10⁻¹⁶ |
| AluYk3 | SINE | 5421 | 42 | 18 | 0.43 | 5.12 × 10⁻²⁴ |
| MER81 | DNA | 3551 | 31 | 13 | 0.42 | 4.35 × 10⁻¹⁸ |
| L1MC3 | LINE | 6596 | 31 | 13 | 0.42 | 1.36 × 10⁻¹⁷ |
| MER1B | DNA | 5060 | 36 | 15 | 0.42 | 2.05 × 10⁻²⁰ |
Only large families (≥100 members) from the main repeat classes (DNA, SINE, LINE, LTR) are examined, and only families with at least 30 expressed instances in human are shown. The table is sorted by the proportion of human-expressed TEs whose expression is conserved in chimpanzee.
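The Proportion column equals the chimpanzee-expressed count divided by the human-expressed count (for example, 20/39 ≈ 0.51 for MER53). A minimal sketch of the filtering and sorting described in the note, using a few rows copied from the table; the function name `conservation_table` is illustrative, and the p-value column is not recomputed because the test used is not stated here.

```python
# (family, class, total, human_expressed, chimpanzee_expressed): a few rows copied from the table above
ROWS = [
    ("L1MC4", "LINE", 12920, 109, 49),
    ("MER53", "DNA", 5308, 39, 20),
    ("AluSg7", "SINE", 5780, 50, 25),
    ("HERVH-int", "LTR", 1266, 50, 23),
]
MAIN_CLASSES = {"DNA", "SINE", "LINE", "LTR"}


def conservation_table(rows, min_total=100, min_expressed=30):
    """Filter families, compute chimpanzee/human expressed proportions, sort descending."""
    kept = []
    for family, te_class, total, human, chimp in rows:
        if te_class not in MAIN_CLASSES or total < min_total or human < min_expressed:
            continue
        kept.append((family, te_class, chimp / human))
    kept.sort(key=lambda row: row[2], reverse=True)  # sort on the unrounded proportion
    return [(family, te_class, round(prop, 2)) for family, te_class, prop in kept]


for row in conservation_table(ROWS):
    print(row)
```

Sorting on the unrounded value keeps ties in the two-decimal column (such as the several 0.45 rows) in a well-defined order.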
Fig. 3 a Intersection of three lncRNA annotations: iPSC lncRNAs identified with a guide annotation, iPSC lncRNAs identified without a guide, and GENCODE lncRNAs expressed in iPSCs. Over 90% of FEELnc human lncRNAs overlap GENCODE lncRNAs. b TE proportions in each primate lncRNA annotation compared with the genomic TE proportions. Only lncRNAs that overlap TEs are included. c Proportion of human lncRNA sequence made up by each TE family, normalized by the proportion of genomic sequence made up by that family. The top five families from each of the four main classes are shown. d TE families that occur most frequently in human iPSC lncRNAs, normalized by the size of each family. Only families with more than 10 members are shown. Red: lncRNAs conserved in all four primate species; green: lncRNAs conserved in one or two other NHPs; blue: human-specific lncRNAs.
Conservation and TE contribution to human iPSC lncRNAs
| Target species | Lifted (LiftOver) | Lifted, overlapping TEs | Lifted, expressed in target species | Lifted, expressed, overlapping TEs |
|---|---|---|---|---|
| Chimpanzee | 7479 | 5175 (69.19%) | 2981 (39.86%) | 2103 (28.12%) |
| Gorilla | 6709 | 4707 (70.16%) | 2086 (31.09%) | 1465 (21.84%) |
| Rhesus | 6208 | 4550 (73.29%) | 1527 (24.60%) | 1351 (21.77%) |
Column 2: Number of human lncRNAs (out of 9332) that lift over to each NHP. Column 3: Lifted lncRNAs that overlap TEs in the target species. Column 4: Lifted human lncRNAs that are expressed in the target species. Column 5: Lifted and expressed lncRNAs that overlap TEs. Percentages are relative to the number of lifted lncRNAs (column 2).
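The percentages in columns 3 to 5 can be recomputed by dividing each count by the number of lifted lncRNAs in column 2; the share of all 9332 human lncRNAs that lift over follows the same pattern. A minimal check with the counts copied from the table (the function name `summarize` is illustrative):

```python
TOTAL_HUMAN_LNCRNAS = 9332

# (lifted, lifted_overlapping_TEs, lifted_expressed, lifted_expressed_overlapping_TEs)
COUNTS = {
    "Chimpanzee": (7479, 5175, 2981, 2103),
    "Gorilla": (6709, 4707, 2086, 1465),
    "Rhesus": (6208, 4550, 1527, 1351),
}


def summarize(counts):
    """Express each column as a percentage of the lifted lncRNAs in that species."""
    rows = {}
    for species, (lifted, lifted_te, expressed, expressed_te) in counts.items():
        rows[species] = {
            "lifted_pct_of_all": round(100 * lifted / TOTAL_HUMAN_LNCRNAS, 2),
            "te_pct": round(100 * lifted_te / lifted, 2),
            "expressed_pct": round(100 * expressed / lifted, 2),
            "expressed_te_pct": round(100 * expressed_te / lifted, 2),
        }
    return rows


for species, row in summarize(COUNTS).items():
    print(species, row)
```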