Robert J Gifford, Jonas Blomberg, John M Coffin, Hung Fan, Thierry Heidmann, Jens Mayer, Jonathan Stoye, Michael Tristem, Welkin E Johnson.
Abstract
Retroviral integration into germline DNA can result in the formation of a vertically inherited proviral sequence called an endogenous retrovirus (ERV). Over the course of their evolution, vertebrate genomes have accumulated many thousands of ERV loci. These sequences provide useful retrospective information about ancient retroviruses, and have also played an important role in shaping the evolution of vertebrate genomes. There is an immediate need for a unified system of nomenclature for ERV loci, not only to assist genome annotation, but also to facilitate research on ERVs and their impact on genome biology and evolution. In this review, we examine how ERV nomenclatures have developed, and consider the possibilities for implementing a systematic approach to naming ERV loci. We propose that such a nomenclature should not only provide unique identifiers for individual loci, but also denote orthologous relationships between ERVs in different species. In addition, we propose that, where possible, mnemonic links to previous, well-established names for ERV loci and groups should be retained. We show how this approach can be applied and integrated into existing taxonomic and nomenclature schemes for retroviruses, ERVs and transposable elements.
Keywords: Classification; Endogenous; Nomenclature; Retrovirus; Taxonomy
Year: 2018 PMID: 30153831 PMCID: PMC6114882 DOI: 10.1186/s12977-018-0442-1
Source DB: PubMed Journal: Retrovirology ISSN: 1742-4690 Impact factor: 4.602
Fig. 1 Retroviral genome invasion and the fate of endogenous retrovirus (ERV) loci in the germline. The three panels show schematic diagrams illustrating how the distribution of ERVs is influenced by (a) host phylogeny; (b) the activity of ERV lineages within the gene pool; and (c) patterns of ERV locus inheritance within populations of host species. Panel a shows how ERV lineages originate when infection of an ancestral species by an ancient retrovirus causes a ‘germline colonisation’ event, in which a retroviral provirus is integrated into the nuclear genome of a germline cell that then goes on to develop into a viable organism. This ‘founder’ ERV provirus can subsequently generate further copies within the germline (panel b). The fate of individual ERV loci is determined by selective forces at the level of the host population. Most ERV loci are quickly eliminated from the germline via selection or drift. However, some may increase in frequency from one host generation to the next, to the point where they become genetically ‘fixed’, i.e. they occur in all members of the species. The schematic in panel c illustrates this in a simplified way, showing an ERV locus (copy x) becoming fixed over several host generations. As shown in panel a, fixed ERV loci persist in the host germline as ‘footprints’ of ERV activity, and the identification of orthologous ERV loci in multiple species indicates that those species diverged after the ERV was inserted. Thus, when host divergence dates have been estimated, they can be used to infer minimum ages for orthologous ERV loci. Importantly, extinction of host lineages eliminates swathes of ERV loci. In some rare cases, however, their sequences may still be recoverable (e.g. see [79]). Abbreviations: ERV endogenous retrovirus, NWM New World monkeys, OWM Old World monkeys
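The minimum-age reasoning in the Fig. 1 caption can be sketched in a few lines. The example below assumes that pairwise host divergence estimates are already available and takes the oldest divergence among the hosts sharing an orthologous locus as the lower bound on its age. The function name `minimum_locus_age`, the species labels and the divergence values are hypothetical placeholders, not real estimates.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable


def minimum_locus_age(
    hosts_with_ortholog: Iterable[str],
    divergence: Dict[FrozenSet[str], float],
) -> float:
    """Lower bound on the age of an orthologous ERV locus.

    The insertion must predate the split of every pair of hosts that
    carries it, so the bound is the oldest pairwise divergence.
    Ages are in whatever unit the divergence table uses.
    """
    hosts = sorted(set(hosts_with_ortholog))
    if len(hosts) < 2:
        raise ValueError("orthology needs at least two host species")
    return max(divergence[frozenset(pair)] for pair in combinations(hosts, 2))


# Placeholder divergence table (arbitrary numbers, not real estimates).
example_divergence = {
    frozenset(("sp_A", "sp_B")): 10.0,
    frozenset(("sp_A", "sp_C")): 30.0,
    frozenset(("sp_B", "sp_C")): 30.0,
}

shared_by_two = minimum_locus_age(["sp_A", "sp_B"], example_divergence)
shared_by_all = minimum_locus_age(["sp_A", "sp_B", "sp_C"], example_divergence)
```

A locus found in all three placeholder species gets the older bound, because the insertion must predate the deepest split among the hosts that share it.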
Fig. 2 Genomic structure of ERV sequences. Panel a shows a schematic representation of a generalised retroviral provirus. The four coding domains found in all exogenous retroviruses are indicated. The precise organization of these domains varies among retrovirus lineages, and some viruses also encode additional genes. The long terminal repeat (LTR) sequences comprise three distinct subregions that are named according to their organization in the genomic RNA: unique 3′ region (U3), repeat region (R), and unique 5′ region (U5). Panel b shows a schematic representation of processes that modify ERV sequences. (1) Recombination between the two LTRs of a single provirus, resulting in the formation of a solo LTR. (2) Recombination between the 3′ and 5′ LTRs of a given provirus, leading to a tandem duplicated provirus. (3) Adaptation to intracellular retrotransposition, resulting in the loss of the envelope gene. (4) LINE1-mediated retrotransposition, resulting in loss of the 5′ U3 sequence and the 3′ U5 sequence. Variants with larger 5′ truncations may also occur. Poly-A tails at the 3′ end and L1-typical target site duplications flanking the retrotransposed sequence are usually found for these forms.
Figure partly adapted from [80]
Retroviral genera and their endogenous representatives
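| Genus | Type species | Endogenous representative (a) | Reference |
|---|---|---|---|
| Alpharetrovirus | ALV | ALV | [ |
| Betaretrovirus | MMTV | MMTV | [ |
| Gammaretrovirus | MLV | MLV | [ |
| Deltaretrovirus | HTLV-1 | MinERVa | [ |
| Epsilonretrovirus | WDSV | None identified (b) | |
| Lentivirus | SRLV-A | RELiK | [ |
| Spumavirus | SFV | SloEFV | [ |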
ALV avian leukosis virus, MMTV mouse mammary tumour virus, MLV murine leukemia virus, HTLV human T cell leukemia virus, WDSV walleye dermal sarcoma virus, SRLV-A small ruminant lentivirus A, SFV simian foamy virus, MinERVa Miniopterus endogenous deltaretrovirus, RELiK rabbit endogenous lentivirus K, SloEFV sloth endogenous foamy virus
(a) First reported endogenous representative shown, with citation
(b) No ERVs have been identified that group robustly within the Epsilonretrovirus genus. However, distantly related, ‘epsilon-like’ elements have been described, such as the MER65/HERV-Lb elements found in the human genome [6, 76–78]
Fig. 3 Proposed ERV ID structure. The proposed ID consists of three components separated by hyphens. The second component consists of two subcomponents, separated by a period, that identify (1) the group the ERV belongs to, and (2) the unique numeric ID of the locus. The third component identifies the species or species group in which the element(s) being referred to occur
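As a rough illustration of the structure described in the Fig. 3 caption, the sketch below splits an ID into its three hyphen-separated components and the two period-separated subcomponents of the middle one. The function name `parse_erv_id`, the field names and the regular expression are assumptions made for this example and are not part of the proposal. In particular, it assumes that the locus ID is purely numeric and that the first component contains no hyphen.

```python
import re
from typing import NamedTuple


class ErvId(NamedTuple):
    prefix: str  # first component, e.g. "ERV"
    group: str   # group the ERV belongs to, e.g. "K(HML2)"
    locus: str   # numeric locus ID, kept as a string
    host: str    # species or species group, e.g. "Ptr"


# Assumed layout: prefix-group.locus-host
_ERV_ID = re.compile(
    r"(?P<prefix>[^-]+)-(?P<group>.+)\.(?P<locus>\d+)-(?P<host>.+)"
)


def parse_erv_id(text: str) -> ErvId:
    """Split a full three-component ERV ID into its parts."""
    match = _ERV_ID.fullmatch(text)
    if match is None:
        raise ValueError(f"not a full three-component ERV ID: {text!r}")
    return ErvId(**match.groupdict())


chimp_locus = parse_erv_id("ERV-K(HML2).113-Ptr")
# chimp_locus.group is "K(HML2)" and chimp_locus.locus is "113"
```

The sketch deliberately rejects shortened forms that omit the first component; a tolerant parser would need an explicit rule for them.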
Application of the proposed nomenclature to example ERV loci
| Example description | Locus ID |
|---|---|
| ERV-L insertion identified in all eutherian mammals (a) | ERV-L.1- |
| Human copy of ERV-L.1- | ERV-L.1- |
| | ERV-L.1-HomSap* |
| | ERV-L.1-Hsa* |
| | L.1-Hsa** |
| HERV.K (HML2) 113 | ERV-K(HML2).113- |
| Chimpanzee ortholog of HERV.K (HML2) 113 | ERV-K(HML2).113-Ptr |
| All copies of HERV.K (HML2) 113 found in great apes ( | ERV-K(HML2).113- |
| Human copy HERV-K(HML2) 4q35.2 | ERV-K(HML2).4352- |
| Polytropic murine leukemia virus ERV 1 (Pmv-1) in mouse | ERV-Pmv.1-Mus musculus |
| Xenotropic murine leukemia virus ERV 8 (Xmv-8) in mouse | ERV-Xmv.8-Mmu |
| Mouse mammary tumour virus (MMTV) locus 9 (Mtv9) | ERV-MMTV.9-Mmu |
| Xmv-8 in inbred mouse strain C57L | ERV-Xmv.8-Mmu.C57L |
| Copy 2 of rabbit endogenous lentivirus K (RELiK) in rabbit | ERV-RELiK.2- |
| | ERV-RELiK.2-OryCun* |
| Copy 2 of rabbit endogenous lentivirus K (RELiK) in hare | ERV-RELiK.2- |
| | ERV-RELiK.2- |
| | RELiK.2- |
| Macaque copy #183 of an unclassified Betaretrovirus-like virus | ERV-AB.183- |
| Peregrine falcon copy #25 of avian ‘Betaretrovirus-like lineage 3’ | ERV-AB3.25- |
| Use of trailing element to indicate alternative alleles of a polymorphic insertion | ERV-K(HML2).113-Hsa.a (d) |
| | ERV-K(HML2).113-Hsa.b (d) |
| Use of trailing element to indicate alternative genome structures of a polymorphic insertion | ERV-K(HML2).113-Hsa.provirus (d) |
| | ERV-K(HML2).113-Hsa.LTR (d) |
*Alternative versions using an abbreviation to designate the host species component of the ID
**A shorter form of the ID can be used when it is clear from the context—or from the lineage component of the ID—that an ERV is being referred to
(a) For reference, see [35]
(b) We propose that where established numeric IDs are already in use, they should be preserved, as is the case for many representatives of the well-researched HERV-K(HML2) lineage
(c) In this example, an ID is assigned to an ERV locus that has previously been referred to only by its cytogenetic location. The proposed numeric ID therefore preserves a mnemonic link to this cytogenetically based identifier without preserving the location information itself. This follows a principle of our proposal: the numeric ID component of the overall ERV ID can retain mnemonic links to previous IDs, but all auxiliary information associated with ERV loci is obtained from a database via a unique ID, rather than encoded into the ID itself
(d) However, where it aids discussion, such information can be appended to the ERV ID stem (e.g. to distinguish distinct alleles and genome structures)
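The two conventions described in the footnotes, a numeric ID that keeps a mnemonic link to an earlier name and an optional trailing element appended to the ID stem, can be sketched as follows. Both helper names are hypothetical. Stripping non-digit characters is only one way to reproduce the 4q35.2 to 4352 example in the table and is not a rule stated in the proposal.

```python
import re
from typing import Optional


def mnemonic_numeric_id(previous_name: str) -> str:
    """Derive a numeric ID from an earlier identifier by keeping its digits.

    Assumption for illustration: the mnemonic link is made by dropping
    every non-digit character.
    """
    digits = re.sub(r"\D", "", previous_name)
    if not digits:
        raise ValueError(f"no digits to preserve in {previous_name!r}")
    return digits


def build_erv_id(
    group: str,
    locus: str,
    host: str,
    trailing: Optional[str] = None,
    prefix: str = "ERV",
) -> str:
    """Assemble prefix-group.locus-host, plus an optional trailing element."""
    stem = f"{prefix}-{group}.{locus}-{host}"
    return f"{stem}.{trailing}" if trailing else stem


numeric = mnemonic_numeric_id("4q35.2")  # "4352"
solo_ltr = build_erv_id("K(HML2)", "113", "Hsa", trailing="LTR")
# solo_ltr is "ERV-K(HML2).113-Hsa.LTR"
```

Keeping the trailing element optional reflects the principle in the footnotes that auxiliary information normally lives in a database keyed by the ID and is appended to the stem only where it aids discussion.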
Fig. 4 Schematic phylogeny illustrating the basis for a unified ERV and retrovirus taxonomy. The top two brackets indicate taxonomic groupings. The ‘clade’ level reflects three major divergences in orthoretroviral reverse transcriptase genes [71]. The seven officially recognised genera are shown as coloured goblets at phylogeny tips. In addition, three placeholder groups are shown: Spumavirus-related (S), Gammaretrovirus/Epsilonretrovirus-related (GE), and Alpharetrovirus/Betaretrovirus-related (AB). Placeholder groups (indicated by coloured squares) are reserved for ERVs that do not group within the diversity of established genera. Within these broad groups, additional subgroupings representing well-established monophyletic ERV lineages may be recognized. Here, some examples are indicated, shown emerging from each of their parent groups. Ultimately, some of these lineages might be attributed genus status, and would be moved to the appropriate level within this classification scheme