Magali Naville, Domitille Chalopin, Jean-Nicolas Volff.
Abstract
Coelacanths are lobe-finned fish represented by two extant species, Latimeria chalumnae in South Africa and Comoros and L. menadoensis in Indonesia. Due to their intermediate phylogenetic position between ray-finned fish and tetrapods in the vertebrate lineage, they are of great interest from an evolutionary point of view. In addition, extant specimens look similar to 300-million-year-old fossils; because of their apparently slowly evolving morphology, coelacanths have often been described as "living fossils". As an underlying cause of such morphological stasis, several authors have proposed a slow evolution of the coelacanth genome. Accordingly, sequencing of the L. chalumnae genome has revealed a globally low substitution rate for protein-coding regions compared to other vertebrates. However, genome and gene evolution can also be influenced by transposable elements, which form a major and dynamic part of vertebrate genomes through their ability to move, duplicate and recombine. In this work, we have searched for evidence of transposition activity in coelacanth genomes through the comparative analysis of orthologous genomic regions from both Latimeria species. Comparison of 5.7 Mb (0.2%) of the L. chalumnae genome with orthologous Bacterial Artificial Chromosome clones from L. menadoensis allowed the identification of 27 species-specific transposable element insertions, with a strong relative contribution of CR1 non-LTR retrotransposons. Species-specific homologous recombination between the long terminal repeats of a new coelacanth endogenous retrovirus was also detected. Our analysis suggests that transposon activity is responsible for at least 0.6% of genome divergence between the two Latimeria species. Taken together, this study demonstrates that coelacanth genomes are not evolutionarily inert: they contain recently active transposable elements, which have significantly contributed to post-speciation genome divergence in Latimeria.
Year: 2014 PMID: 25470617 PMCID: PMC4255032 DOI: 10.1371/journal.pone.0114382
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Table 1. Transposable element insertions in ca. 5.7 Mb of orthologous genomic sequences from the coelacanth species Latimeria chalumnae and L. menadoensis.
| TE classification (class) | TE classification (type) | TE family | Common insertions | Species-specific insertions (L. chalumnae) | Species-specific insertions (L. menadoensis) |
|---|---|---|---|---|---|
| Class I (retrotransposons) | LINE | CR1 | 286 | 6 | 9 |
| | | L1 | 11 | - | 2 |
| | | L2 | 4 | 1 | - |
| | SINE | CoeG-SINEs | 205 | 1 | 1 |
| | | Others | 646 | 1 | 0 |
| | LTR | Gypsy | 24 | 1 | - |
| | | ERV | 0 | - (solo LTR) | 1 (element framed by 2 LTRs)* |
| Class II (DNA transposons) | | MITE-like | 8 | 2 | - |
| Composite insertions | | CR1/SINEs | - | 1 | - |
| | | CoeG-SINE/LF-SINE | - | - | 1** |
| Other Class I and Class II families | | | 1,879 | - | - |
| Total | | | 3,063 | 13 | 14 |
TE = Transposable Element; LINE = Long Interspersed Nuclear Element; SINE = Short Interspersed Nuclear Element; LTR = Long Terminal Repeat; CR1 = Chicken Repeat 1; L1 = LINE 1; L2 = LINE 2; ERV = Endogenous Retrovirus; MITE = Miniature Inverted-repeat Transposable Element.
*The ERV insertion observed in L. menadoensis does not strictly correspond to an insertion polymorphism: the solo LTR observed at the orthologous site in L. chalumnae is probably the result of recombination between the two LTRs framing the element (see main text).
**A composite insertion is observed in L. menadoensis, consisting of a CoeG-SINE flanked by two LF-SINEs in direct orientation. Only a “solo” LF-SINE is observed in L. chalumnae, suggesting deletion through homologous recombination between the two LF-SINEs.
These “insertions” mostly comprise insertions sensu stricto but also a few deletions that occurred at the orthologous site in the other species.
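The column totals of Table 1 and the relative contribution of CR1 elements can be checked with a few lines of Python. This is only an illustrative sketch: the counts are transcribed from the table, "-" entries are assumed to mean zero, and the dictionary and function names are hypothetical rather than part of the original analysis.

```python
# Species-specific insertion counts transcribed from Table 1.
# Tuple order: (L. chalumnae, L. menadoensis). Assumption: "-" means 0.
SPECIES_SPECIFIC = {
    "CR1": (6, 9),
    "L1": (0, 2),
    "L2": (1, 0),
    "CoeG-SINEs": (1, 1),
    "Other SINEs": (1, 0),
    "Gypsy": (1, 0),
    "ERV": (0, 1),
    "MITE-like": (2, 0),
    "CR1/SINEs composite": (1, 0),
    "CoeG-SINE/LF-SINE composite": (0, 1),
}


def column_totals(counts):
    """Return total species-specific insertions per species."""
    chalumnae = sum(ch for ch, _ in counts.values())
    menadoensis = sum(me for _, me in counts.values())
    return chalumnae, menadoensis


def family_share(counts, family):
    """Fraction of all species-specific insertions belonging to one family."""
    chalumnae, menadoensis = column_totals(counts)
    return sum(counts[family]) / (chalumnae + menadoensis)


if __name__ == "__main__":
    print(column_totals(SPECIES_SPECIFIC))          # totals per species
    print(f"{family_share(SPECIES_SPECIFIC, 'CR1'):.1%}")  # CR1 share
```

With these counts, CR1 accounts for 15 of the 27 species-specific insertions, which is the "strong relative contribution" mentioned in the abstract.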
Table 2. Structural features of species-specific transposable element insertions in ca. 5.7 Mb of orthologous genomic sequences from the coelacanth species Latimeria chalumnae and L. menadoensis.
| Type of TE | Species with insertion | Insertion identifier | Insertion length (nt) | Target Site Duplication? | ORF(s)/Domain(s)/Specific features | Copy number in genomic sequences* | Element representation in the transcriptome** | Genomic position relative to next gene | Distance to the closest exon (kb) and corresponding gene |
|---|---|---|---|---|---|---|---|---|---|
| CR1 | L. ch. | 1 | 1622 | AT | ORF2: RT | 5 (2 with id ≥98%) | 17 | IGR | 5.9 ( |
| CR1 | L. ch. | 2 | 1060 | AT-rich region | ORF2: RT (partial) | 1 | 0 | Intron (exon 4–exon 5) | 0.7 (exon 5) ( |
| CR1 | L. ch. | 3 | 1097 | | ORF2: RT (partial) | 1 | 4 | IGR | 4.0 ( |
| CR1 | L. ch. | 4 | 227 | CTA | - | 49 (5 with id ≥98%) | 1 | IGR | 9.7 ( |
| CR1 | L. ch. | 5 | 320 | TTTAG | - | 37 (5 with id ≥98%) | 0 | IGR | 9.4 (vomeronasal 2 receptor) |
| CR1 | L. ch. | 6 | 303 | TATTAGG | - | 1 | 0 | IGR | >70.9 ( |
| CR1 | L. me. | 7 | 2845 | ACTCA | ORF2: RT, APE (partial) | 23 (4 with id ≥98%) | 24 | IGR | >9.0 |
| CR1 | L. me. | 8 | 2821 | AAT | ORF2: RT, APE (partial) | 24 | 31 | IGR | 3.2 ( |
| CR1 | L. me. | 9 | 1174 | AAGTA | ORF2: RT (partial) | 4 | 8 | IGR | 3.6 ( |
| CR1 | L. me. | 10 | 1038 | CCAT | ORF2: RT (partial) | 74 (18 with id ≥98%) | 10 | IGR | 18.3 (protocadherin gamma) |
| CR1 | L. me. | 11 | 862 | GATTAA | ORF2: RT (partial) | 86 (19 with id ≥98%) | 6 | Intron (exon 2–exon 3) | 0.2 (exon 3) ( |
| CR1 | L. me. | 12 | 1398 | TCTA | ORF2: RT (partial) | 57 (15 with id ≥98%) | 13 | IGR | 37.5 ( |
| CR1 | L. me. | 13 | 1019 | Poly-A region | ORF2: RT (partial) | 1 | 0 | Intron (exon 2–exon 3) | 1.3 (exon 3) ( |
| CR1 | L. me. | 14 | 385 | ND | - | 110 (14 with id ≥98%) | 0 | Intron (exon 4–exon 5) | 0.4 (exon 4) ( |
| CR1 | L. me. | 15 | 387 | CTATTCC | - | 109 (12 with id ≥98%) | 3 | Intron (exon 2–exon 3) | 6.2 (exon 2) (FAT tumor suppressor homolog) |
| L1 | L. me. | 16 | 2168 | | Endonuclease (PFAM PF02994, “Transposase_22”) (partial) | 2 (2 with id ≥98%) | 20 | IGR | 41.6 ( |
| L1 | L. me. | 17 | 1999 | ND | RT | 4 | 19 | IGR | 0.8 ( |
| L2 | L. ch. | 18 | 2219 | G | RT | 2 | 0 | IGR | 3.4 ( |
| CoeG-SINE | L. ch. | 19 | 1362 | ND | - | 1 | 16 | Intron (exon 4–exon 5) | 0.6 (exon 5) (von Willebrand factor A domain containing 5A) |
| CoeG-SINE | L. me. | 20 | 1018 | ATTTT | - | 1 | 0 | IGR | 18.0 ( |
| LF-SINE | L. ch. | 21 (inserted within element 22) | 391 | TG | - | 48 | 0 | IGR | 33.1 (uncharacterized protein) |
| Gypsy | L. ch. | 22 | 896 | | RT, no LTR | 1 | 1 | IGR | 33.1 (uncharacterized protein) |
| ERV | L. me. | 23 | 5091 | AGAT | Gag, Pol, Env (partial), LTR | 1 | 41 | IGR | 10.6 ( |
| MITE-like | L. ch. | 24 | 225 | CCT | - | 2 | 0 | IGR | 6.4 (von Willebrand factor A domain containing 5A) |
| MITE-like | L. ch. | 25 | 1311 | ATTTCAAG | Derived from a hAT transposon | 1 | 5 | IGR | 2.8 ( |
| Composite insertion | L. ch. | 26 | 2303 | T | CR1 (RT, TSD “AGT”)/SINE (TSD “AAGT”)/LF-SINE/CoeSINE | 1 | 5 | IGR | 90.7 ( |
| Composite insertion | L. me. | 27 | 1249 | ND | CoeG-SINE/LF-SINE | 1 | 0 | IGR | >12.8 |
ORF = Open Reading Frame; L. ch. = Latimeria chalumnae; L. me. = Latimeria menadoensis; IGR = Intergenic Region; RT = Reverse Transcriptase; APE = Apurinic/Apyrimidinic Endonuclease; LTR = Long Terminal Repeat; ND = Not Detected; TSD = Target Site Duplication.
*Number of BlastN hits in the analyzed regions, with hit length ≥80% of insertion length and identity ≥80%, criteria that are classically used to define TE families.
**Number of BlastN hits against the L. menadoensis testis transcriptome with hit length ≥80 nt and identity ≥95%.
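As a rough consistency check on the abstract's figure of "at least 0.6%" divergence attributable to transposon activity, the insertion lengths of Table 2 can be summed and divided by the size of the compared region. The sketch below is an assumption-laden illustration, not the authors' procedure: it takes "ca. 5.7 Mb" as exactly 5,700,000 nt, counts every listed insertion at its full length (including the ERV, although a solo LTR remains at the orthologous site in L. chalumnae), and uses hypothetical variable and function names.

```python
# Insertion lengths (nt) transcribed from Table 2, keyed by insertion identifier.
INSERTION_LENGTHS_NT = {
    1: 1622, 2: 1060, 3: 1097, 4: 227, 5: 320, 6: 303,
    7: 2845, 8: 2821, 9: 1174, 10: 1038, 11: 862, 12: 1398,
    13: 1019, 14: 385, 15: 387, 16: 2168, 17: 1999, 18: 2219,
    19: 1362, 20: 1018, 21: 391, 22: 896, 23: 5091, 24: 225,
    25: 1311, 26: 2303, 27: 1249,
}

# Assumption: "ca. 5.7 Mb" is treated as exactly 5.7 million nucleotides.
ANALYZED_REGION_NT = 5_700_000


def te_divergence_fraction(lengths, region_nt):
    """Fraction of the compared region covered by species-specific insertions."""
    return sum(lengths.values()) / region_nt


if __name__ == "__main__":
    total_nt = sum(INSERTION_LENGTHS_NT.values())
    fraction = te_divergence_fraction(INSERTION_LENGTHS_NT, ANALYZED_REGION_NT)
    print(f"{total_nt} nt in {len(INSERTION_LENGTHS_NT)} insertions")
    print(f"{fraction:.2%} of the compared region")
```

Under these simplifications the 27 insertions add up to a little over 0.6% of the compared sequence, in line with the lower bound stated in the abstract.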
Figure 1. Example of a polymorphic insertion of a CR1 retrotransposon (element 7 in Table 2) present in Latimeria menadoensis but absent from L. chalumnae.
Target Site Duplications (TSDs) are framed in red. CR1 = Chicken Repeat 1; ORF = Open Reading Frame; RT = Reverse Transcriptase; APE = Apurinic/Apyrimidinic Endonuclease.
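A target site duplication appears as a short direct repeat immediately flanking the inserted element on both sides. The following minimal sketch shows one way such a repeat could be looked for once the insertion boundaries are known. The function name, the `max_len` cutoff and the toy sequence are hypothetical; only the TSD motif "ACTCA" is taken from element 7 in Table 2, and the flanks and element body are synthetic.

```python
def longest_tsd(seq: str, start: int, end: int, max_len: int = 20) -> str:
    """Return the longest direct repeat flanking the element seq[start:end].

    The left copy is the k bases just upstream of the element and the right
    copy is the k bases just downstream. Returns "" if no repeat is found.
    """
    best = ""
    for k in range(1, max_len + 1):
        if start - k < 0 or end + k > len(seq):
            break  # ran out of flanking sequence
        left = seq[start - k:start]
        right = seq[end:end + k]
        if left == right:
            best = left  # keep the longest exact match seen so far
    return best


if __name__ == "__main__":
    # Synthetic example: flank + TSD + made-up element + TSD + flank.
    upstream, tsd, element, downstream = "GGGTTT", "ACTCA", "CCCCGGGGCCCC", "TTTGGG"
    toy = upstream + tsd + element + tsd + downstream
    start = len(upstream) + len(tsd)
    end = start + len(element)
    print(longest_tsd(toy, start, end))
```

Exact matching is a deliberate simplification: older insertions may carry mutated TSDs, which is one possible reason for "ND" entries in Table 2.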
Figure 2. Structure of coelacanth endogenous retrovirus CoeERV1-1.
(A) Solo LTR observed in L. chalumnae. (B) Schematic representation of ERV insertion 23 found at the orthologous position in L. menadoensis. (C) Reconstructed structure of CoeERV1-1 in the L. chalumnae genome. TSD = Target Site Duplication; LTR = Long Terminal Repeat; Gag = ORF encoding the viral capsid protein; Pol = ORF encoding proteins responsible for synthesis of the viral DNA and integration into host DNA, including protease (Pro), reverse transcriptase (RT), ribonuclease H (RH) and integrase (Int); Env = ORF encoding the envelope protein.
Figure 3. Phylogenetic relationship between coelacanth CoeERV1-1 and reptile retroviruses.
Vertebrate retrovirus phylogeny was reconstructed from an alignment of RT (210 amino acids) using Maximum Likelihood with optimized parameters (best of NNI and SPR; optimized invariable sites) [37]. Branch values represent supporting aLRT non-parametric statistics. The dashed line highlights the group of Epsilon viruses containing turtle, crocodile, coelacanth and lungfish sequences. Gypsy LTR retrotransposon sequences were used as an outgroup.