Magali Lescot, Pascal Hingamp, Kenji K Kojima, Emilie Villar, Sarah Romac, Alaguraj Veluchamy, Martine Boccara, Olivier Jaillon, Daniele Iudicone, Chris Bowler, Patrick Wincker, Jean-Michel Claverie, Hiroyuki Ogata.
Abstract
Genes encoding reverse transcriptases (RTs) are found in most eukaryotes, often as a component of retrotransposons, as well as in retroviruses and in prokaryotic retroelements. We investigated the abundance, classification and transcriptional status of RTs based on Tara Oceans marine metagenomes and metatranscriptomes encompassing a wide organism size range. Our analyses revealed that RTs predominate in large-size fraction metagenomes (>5 μm), where they reached a maximum of 13.5% of the total gene abundance. Metagenomic RTs were widely distributed across the phylogeny of known RTs, but many belonged to previously uncharacterized clades. Metatranscriptomic RTs showed distinct abundance patterns across samples compared with metagenomic RTs. The relative abundances of viral and bacterial RTs among identified RT sequences were higher in metatranscriptomes than in metagenomes, and these sequences were detected in all metatranscriptome size fractions. Overall, these observations suggest an active proliferation of various RT-assisted elements, which could be involved in genome evolution or adaptive processes of plankton assemblages.
Year: 2015 PMID: 26613339 PMCID: PMC5029228 DOI: 10.1038/ismej.2015.192
Source DB: PubMed Journal: ISME J ISSN: 1751-7362 Impact factor: 10.302
Figure 1. ML-trees of known and environmental RT sequences. (a) ML-tree of known RT sequences with 124 RT-like sequences identified in the metagenomic data. Black dots indicate RT-like sequences from the metagenomic data. EC1 stands for Environmental Clade 1, LEC for LINE Environmental Clades (LEC1-LEC3) and GEC for Gypsy Environmental Clades (GEC1, GEC2). LTR retrotransposons: Gypsy, Copia, BEL and retrovirus. Non-LTR retrotransposons: LINE (APE-type and REL-type). Distributions of known reference RTs in prokaryotes (P) and eukaryotes (E) are as follows: LINE (E), Gypsy (E), Caulimovirus (E), Retrovirus (E), PLE (E), DIRS (E), RTL (E), RVT (E/P), TERT (E), group II intron (E/P), DGR (P), retron (P), retroplasmid (E), Hepadnavirus (E). ANB7SUR0CCII11BDE.ASY2CTG1927.ANO1.snap_1 is marked by a purple arrow, and AHX7DCM1GGMM11BCE.ASY1CTG361.ANO1.mga_1 by a green arrow. The sequence marked by an orange arrow is AHX23DCM1GGMM11BCE.ASY1CTG52.ANO1.mga_1. (b) ML-tree of known RT sequences with 100 Copia RT-like sequences identified in the metagenomic data. Black dots indicate RT-like sequences from the metagenomic data. CEC stands for Copia Environmental Clades (CEC1-CEC4). Copia belong to LTR retrotransposons. Taxonomic distributions of known reference Copia are as follows: CoDi (diatoms), GalEA (animals and red algae), PyRE1G1 (red algae), Hydra (Cnidarian), Tork (land plant), Mtanga and Tricopia (insecta), copia (insecta), pCreta (fungi), 1731 (diptera), Osser (plant), Retrofit (plant), Sire (plant), Oryco (plant), Ty (fungi). (c) ML-tree of known RT sequences with 100 longest BEL RT-like sequences identified in the metagenomic data. Black dots indicate RT-like sequences from the metagenomic data. BEC stands for BEL Environmental Clades (BEC1-BEC5). BEL belong to LTR retrotransposons. Tas (nematoda and cnidaria), Bel (insecta), Pao (insecta and vertebrata), Sinbad (invertebrata), Suzu (echinodermata and vertebrata).
New environmental clades satisfying the three criteria defined in Materials and Methods are marked by a red-shaded area.
Highly represented CDD profiles in the ORF set from the 180–2000 μm size fraction metagenomes
| CDD profile | Description | Number of assigned ORFs |
|---|---|---|
| cd01650/RT_nLTR_like | Non-LTR retrotransposon and non-LTR retrovirus RT. This subfamily contains both non-LTR retrotransposons and non-LTR retrovirus RTs. | |
| cd01647/RT_LTR | RTs from retrotransposons and retroviruses, which have LTRs in their DNA copies but not in their RNA template. | |
| pfam00665/rve | Integrase core domain. Integrase mediates integration of a DNA copy of the viral genome into the host chromosome. Integrase is composed of three domains. | |
| pfam03372/Exo_endo_phos | Endonuclease/exonuclease/phosphatase family. This large family of proteins includes magnesium-dependent endonucleases and a large number of phosphatases involved in intracellular signaling. | |
| cd01644/RT_pepA17 | RTs in retrotransposons. This subfamily represents the RT domain of a multifunctional enzyme. | |
| cd00204/ANK | Ankyrin repeats; ankyrin repeats mediate protein–protein interactions in very diverse families of proteins. | 438 |
| pfam05380/Peptidase_A17 | Pao retrotransposon peptidase. Corresponds to Merops family A17. | |
| KOG2462/KOG2462 | KOG2462, C2H2-type Zn-finger protein (Transcription). | 342 |
| cd00190/Tryp_SPc | Trypsin-like serine protease; many of these are synthesized as inactive precursor zymogens that are cleaved during limited proteolysis to generate their active forms. | 321 |
| pfam07727/RVT_2 | RT (RNA-dependent DNA polymerase). An RT gene is usually indicative of a mobile element such as a retrotransposon or retrovirus. | |
| pfam05970/DUF889 | PIF1 helicase. The PIF1 helicase inhibits telomerase activity and is cell cycle regulated. | 249 |
| pfam00078/RVT_1 | RT (RNA-dependent DNA polymerase). | |
| KOG3623/KOG3623 | KOG3623, Homeobox transcription factor SIP1 (Transcription). | 211 |
| cd06222/RNase H | RNase H (RNase HI) is an endonuclease that cleaves the RNA strand of an RNA/DNA hybrid in a non-sequence-specific manner. | |
| KOG1214/KOG1214 | KOG1214, nidogen and related basement membrane protein. | 179 |
Abbreviations: CDD, Conserved Domain Database; LTR, long terminal repeat; ORF, open reading frame; RT, reverse transcriptase.
CDD profiles representing proteins related to retrotransposons and retroviruses. For these profiles, the number of assigned ORFs is shown in bold letters.
A recent report demonstrated that pfam05380 is actually an RNase H domain similar to cd06222 (RNase H) (Majorek ).
Figure 2. Relative RT gene abundance and classification of the RTs identified in the metagenomic data. In the station names, S and D denote SUR and DCM depths, respectively.
Figure 3. Relative RT gene abundance and classification of transcriptionally active RT-like sequences identified in the metatranscriptomic data.
Figure 4. Detrended Correspondence Analysis (DCA) of metatranscriptomic RT gene abundance. DCA ordinations of 21 samples are shown for metatranscriptomic RT gene abundance, with significant (P⩽0.05) environmental vectors fitted using envfit (Oksanen ). Arrows indicate the direction of the (increasing) environmental gradient, and their lengths are proportional to their correlations with the ordination. X stands for samples from size fraction 180–2000 μm, L for 20–180 μm, M for 5–20 μm and S for 0.8–5 μm. Samples from station TARA_007 are colored in pink, those from station TARA_023 in blue, and those from station TARA_030 in green.
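The vector fitting described in the Figure 4 caption can be illustrated with a minimal Python sketch. It is a simplified analogue of an envfit-style fit, not the authors' code and not the envfit implementation itself. It assumes a two-axis ordination is already available as an array of sample scores; the function name `fit_env_vector` and its parameters are hypothetical. The arrow direction is taken from a least-squares regression of the environmental variable on the ordination axes, the arrow length from the correlation (square root of R²), and the P-value from a permutation test.

```python
import numpy as np

def fit_env_vector(scores, env, n_perm=999, seed=0):
    """Fit one environmental variable onto ordination scores (illustrative sketch).

    scores : (n_samples, n_axes) array of ordination coordinates (assumed given)
    env    : (n_samples,) array of one environmental variable
    Returns (unit direction vector, correlation r, permutation P-value).
    """
    rng = np.random.default_rng(seed)
    X = scores - scores.mean(axis=0)  # center the ordination axes
    y = env - env.mean()              # center the environmental variable

    def regress(y_vec):
        # Least-squares regression of the variable on the ordination axes.
        coef, *_ = np.linalg.lstsq(X, y_vec, rcond=None)
        fitted = X @ coef
        r2 = float(fitted @ fitted / (y_vec @ y_vec))
        return coef, r2

    coef, r2 = regress(y)
    # Unit vector pointing toward increasing values of the variable.
    direction = coef / np.linalg.norm(coef)

    # Permutation test: how often does a shuffled variable fit at least as well?
    hits = sum(regress(rng.permutation(y))[1] >= r2 for _ in range(n_perm))
    p_value = (hits + 1) / (n_perm + 1)
    return direction, float(np.sqrt(r2)), p_value
```

In a plot, the arrow for each variable would be drawn along `direction` with a length proportional to the returned correlation, and only variables with a P-value of at most 0.05 would be displayed, which matches the convention stated in the caption.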