L1Base 2: more retrotransposition-active LINE-1s, more mammalian genomes.

Tobias Penzkofer1, Marten Jäger2, Marek Figlerowicz3, Richard Badge4, Stefan Mundlos2, Peter N Robinson2,5, Tomasz Zemojtel6,3.   

Abstract

LINE-1 (L1) insertions comprise as much as 17% of the human genome sequence, and similar proportions have been recorded for other mammalian species. Given the established role of L1 retrotransposons in shaping mammalian genomes, it becomes an important task to track and annotate the sources of this activity: full-length elements able to encode the cis- and trans-acting components of the retrotransposition machinery. The L1Base database (http://l1base.charite.de) contains annotated full-length sequences of LINE-1 transposons, including putatively active L1s. For the new version of L1Base, a LINE-1 annotation tool, L1Xplorer, has been used to mine potentially active L1 retrotransposons from the reference genome sequences of 17 mammals. The current release of the human genome, GRCh38, contains 146 putatively active L1 elements, or full-length intact L1 elements (FLIs). The newest versions of the mouse (GRCm38) and rat (Rnor_6.0) genomes contain 2811 and 492 FLIs, respectively. Most likely reflecting the current level of completeness of the genome project, the latest reference sequence of the common chimpanzee genome, PT 2.19, contains only 19 FLIs. Of note, the current assemblies of the dog (CF 3.1) and sheep (OA 3.1) genomes contain 264 and 598 FLIs, respectively. Further developments in the new version of L1Base include an updated website built on modern web server technologies, including a more responsive design for an improved user experience, as well as the addition of data sharing capabilities for L1Xplorer annotation.
© The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2016        PMID: 27924012      PMCID: PMC5210629          DOI: 10.1093/nar/gkw925

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Long interspersed elements class 1 (LINE-1s, L1s) are the active autonomous non-LTR retrotransposons in mammalian genomes. L1s are present in a large number of copies and make up about 17% of the human genome (1). Due to the ability of L1s to ‘copy’ and ‘paste’ themselves into multiple genomic locations, they have had a significant impact on genome and organismal evolution (2). In addition to insertional mutagenesis, because of features such as an anti-sense promoter located in the human and mouse L1s, which can drive the transcription of adjacent genes (3,4), L1 elements can interfere with genomic content and functionality in numerous ways: transcriptional disruption (5,6), alternative splicing and exonization (4,7,8), the creation of processed pseudogenes (9), and mobilization of the high copy number Alu SINE sequence family (10,11), whose members themselves contain gene regulatory elements (12–14). While direct disruption of exons by L1 insertion is a well-established cause of human genetic disease (15), it has recently been shown that intronic insertions of nearly full-length L1 elements can result in recessive genetic diseases when inherited from both parental alleles (16). This observation emphasizes the need for current annotation of these structural variants, which are often assumed to be benign. As LINE-1s have been active in mammalian genomes since before the mammalian radiation (17–19), the human genome contains 16 distinct L1 families (L1PA16–L1PA1), which have gradually evolved from the mammalian radiation to the present day (17,18,20). Interestingly, only members of the young Ta, ACA/G (L1PA1) and pre-Ta, ACG/G subfamilies, defined by the shared sequence variants (SSVs) AC[A/G] and G in their 3′ UTRs, display retrotransposition activity in vivo (21–23).
It has been suggested that the typical human genome contains between 80 and 100 retrotransposition-active LINE-1s (24), which is in line with the number of ∼145 putatively active L1s residing in the reference genome (25). More recently, it has also been suggested that highly active or ‘hot’ LINE-1 elements are more frequent in the human population than previously assumed, but are under-ascertained due to presence/absence polymorphism (21). Ongoing and future studies will likely continue to report novel human-specific polymorphic LINE-1s, and database resources like euL1db (26) will aid greatly in collecting these L1 insertions. Such resources will reveal the frequency and distribution of L1 insertions in increasing detail, but due to limitations of the underlying technology most often used (short-read NGS) they lack detailed annotation. To gain more insight into the repertoire of active LINE-1 elements in mammalian genomes, we employed L1Xplorer (25) to catalog and, most importantly, characterize with respect to functionality and phylogeny two types of potentially active L1 insertions: (i) full-length L1s with intact ORF1 and ORF2 (FLI-L1s) and (ii) L1s with intact ORF2 but disrupted ORF1 (ORF2-L1s), which may still be able to drive Alu elements (10). We also annotated full-length non-intact insertions, which constitute evolutionary fossils in mammalian genomes. The current version of L1Base (25), in addition to updated annotations for the human, mouse and rat genomes, also annotates seven other primate genomes, including the common chimpanzee genome, and another seven mammalian genomes (e.g. cow, horse, dog).

DATABASE

Data acquisition

The most recent available (Ensembl Version 84) releases of reference genome assemblies for species already annotated in L1Base (Homo sapiens, Mus musculus, Rattus norvegicus), as well as for newly added species, including Pan troglodytes, Macaca mulatta, Pongo abelii, Gorilla gorilla and Bos taurus (PT 2.19, MMUL 1.0, PPYG2, gorGor3.1, UMD3.1), were downloaded from the Ensembl internet resource (27) along with the MySQL annotation databases containing the current genomic and RepeatMasker (http://www.repeatmasker.org, (28)) annotations for the respective genome versions. A standalone version of L1Xplorer was executed to generate the L1Base annotations for these genomes. As described previously (25), L1Xplorer is a set of Perl applications performing (i) a BLAST search using a known L1 template for a given species (e.g. Homo sapiens L1.2, GenBank: AH005269.2) or predefined coordinates from genomic RepeatMasker annotations, (ii) extraction of the corresponding genomic regions and detection of ORF1 and ORF2 via TFASTX (29) and (iii) annotation of a number of features specific to L1 lineages, family classifications and predicted or experimentally characterized features which have been implicated in L1 biology. The L1Xplorer outputs, along with the previous contents of L1Base, were stored in a web-accessible interface.
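Step (ii) above relies on TFASTX, which aligns protein templates against translated DNA. As a much simpler illustration of what detecting an open reading frame involves, the following Python sketch scans the three forward frames of a sequence for the longest ATG-to-stop ORF. It is not L1Xplorer's method: the function name `longest_orfs`, the `min_codons` parameter and the toy sequence are hypothetical and chosen only for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orfs(seq, min_codons=2):
    """Return the longest ATG..stop ORF per forward frame as (frame, start, end).

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    Frames without a qualifying ORF are omitted. This is a naive scan,
    not the template-based TFASTX search used by L1Xplorer.
    """
    seq = seq.upper()
    best = {}
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                end = pos + 3
                if (end - start) // 3 >= min_codons:
                    prev = best.get(frame)
                    if prev is None or end - start > prev[2] - prev[1]:
                        best[frame] = (frame, start, end)
                start = None  # look for the next ORF in this frame
    return [best[f] for f in sorted(best)]

# Toy sequence (hypothetical) with one short ORF in frame 1 and one in frame 2.
toy = "CCATGAAATTTTAGGGATGCCCGGGAAATGA"
orfs = longest_orfs(toy)
```

A real L1 element would additionally need each ORF to match the expected ORF1 or ORF2 protein, which is what the template alignment provides and this scan does not.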

Database description

L1Base contains three different LINE-1 categories: (i) full-length L1s with intact ORF1 and ORF2 (FLI-L1s), (ii) L1s with intact ORF2 but disrupted ORF1 (ORF2-L1s) and (iii) full-length non-intact L1s (FLnI-L1s). The sequences of FLnI-L1s were extracted from the Ensembl RepeatMasker annotations with the following length thresholds: (i) Mus musculus: 5000 bp, (ii) Rattus norvegicus: 5500 bp and (iii) Homo sapiens, Pan troglodytes, Macaca mulatta, Pongo abelii, Gorilla gorilla, Callithrix jacchus, Papio anubis, Felis catus, Canis familiaris, Bos taurus, Ovis aries, Sus scrofa and Oryctolagus cuniculus: 4500 bp. For Equus caballus and Chlorocebus sabaeus, the 4500 bp threshold applied to the Ensembl RepeatMasker annotations yielded very few FLnIs (Equcab: 0, Chlsab: 101), and we therefore used the L1Xplorer tool to extract FLnI-L1s from these two genomes. Currently, genomes of 17 species are annotated: eight primates, two rodents and seven other mammals (Table 1). As the reference genome sequences mature, the number of annotated FLI-L1s, ORF2-L1s and FLnIs increases, often dramatically (Table 1). Interestingly, the current reference genomes of the dog and the sheep contain as many as 264 and 598 FLIs, respectively, which is suggestive of a high rate of L1 activity in these two species.
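The three categories and the species-specific length thresholds can be summarized in a short Python sketch. The threshold values are taken from the text; everything else is an assumption made for illustration, including the function name `classify_l1`, the boolean ORF flags, the use of "greater than or equal to" for the threshold, and the choice not to apply a length check to elements with intact ORFs. Equus caballus and Chlorocebus sabaeus are not covered, since their FLnI-L1s were extracted with L1Xplorer rather than by threshold.

```python
# Length thresholds (bp) for FLnI-L1 extraction, as stated in the text.
FLNI_MIN_LENGTH = {
    "Mus musculus": 5000,
    "Rattus norvegicus": 5500,
}
DEFAULT_MIN_LENGTH = 4500  # remaining species annotated via RepeatMasker

def classify_l1(species, length, orf1_intact, orf2_intact):
    """Assign a hypothetical L1 record to one of the three L1Base categories.

    Returns "FLI-L1", "ORF2-L1", "FLnI-L1", or None if the element is
    neither ORF2-intact nor long enough to count as full length.
    """
    if orf1_intact and orf2_intact:
        return "FLI-L1"
    if orf2_intact:
        return "ORF2-L1"
    # Assumption: the threshold is inclusive (>=); the text does not say.
    if length >= FLNI_MIN_LENGTH.get(species, DEFAULT_MIN_LENGTH):
        return "FLnI-L1"
    return None
```

For example, under these assumptions `classify_l1("Mus musculus", 4800, False, False)` returns `None`, whereas the same element in Homo sapiens would be classified as `"FLnI-L1"`.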
Table 1.

Number of annotated LINE-1 elements in the 2007 and the current 2016 releases of reference genomes

Species                          | FLI-L1s 2007 | FLI-L1s 2016 | ORF2-L1s 2007 | ORF2-L1s 2016 | FLnI-L1s 2007 | FLnI-L1s 2016
---------------------------------+--------------+--------------+---------------+---------------+---------------+--------------
Primates                         |              |              |               |               |               |
Homo sapiens*                    | 145          | 146          | 103           | 107           | 11 653        | 13 418
Pongo abelii (Orangutan)         | -            | 20           | -             | 22            | -             | 7601
Pan troglodytes (Chimp)          | -            | 19           | -             | 24            | -             | 12 218
Macaca mulatta (Macaque)         | -            | 20           | -             | 21            | -             | 6605
Gorilla gorilla (Gorilla)        | -            | 0            | -             | 1             | -             | 7074
Callithrix jacchus (Marmoset)    | -            | 6            | -             | 4             | -             | 8251
Chlorocebus sabaeus (Vervet)     | -            | 6            | -             | 17841 (ORF2-L1s 2016 and FLnI-L1s 2016 values not separable)
Papio anubis (Baboon)            | -            | 12           | -             | 32            | -             | 22 389
Rodents                          |              |              |               |               |               |
Mus musculus** (Mouse)           | 2382         | 2811         | 466           | 563           | 13 692        | 14 076
Rattus norvegicus*** (Rat)       | 377          | 492          | 183           | 292           | 5236          | 10 073#
Carnivora                        |              |              |               |               |               |
Canis familiaris (Dog)           | -            | 264          | -             | 57            | -             | 9653
Felis catus (Cat)                | -            | 1            | -             | 0             | -             | 8075
Bovidae                          |              |              |               |               |               |
Ovis aries (Sheep)               | -            | 598          | -             | 181           | -             | 3551
Bos taurus (Cow)                 | -            | 16           | -             | 20            | -             | 2989
Others                           |              |              |               |               |               |
Equus caballus (Horse)           | -            | 72           | -             | 37            | -             | 6766
Sus scrofa (Pig)                 | -            | 0            | -             | 0             | -             | 4495
Oryctolagus cuniculus (Rabbit)   | -            | 0            | -             | 0             | -             | 4296

*NCBI36 versus GRCh38, **NCBIm35 versus GRCm38, ***RGSC 3.4 versus Rnor_6.0, # due to updated Repbase consensus sequences for rat, the length threshold for Rnor_6.0 FLnI was lowered from 6000 nt to 5500 nt.

User interface

The entry point into the L1Base website is the pull down menu for species selection. Upon species selection, the user is brought to the query form, where queries involving multiple criteria such as chromosomal localization, integrity of ORFs or conserved motifs can be executed.

Data export/sharing

In order to support the user tracks feature, as implemented by different genome browser sites, L1Base now supports a .BED file export for every database provided, thus replacing the outdated DAS protocol. A link to the respective .BED data source is provided by the main L1Base database list. These .BED annotations can be easily included in custom UCSC genome browser views, to enable integration with other annotation data (https://genome.ucsc.edu/goldenpath/help/customTrack.html) (30). L1Xplorer, the annotation tool provided with L1Base, has been improved to support data sharing between users by implementing a share link feature available at the top of a webpage with specific L1 annotation.
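To illustrate what one exported .BED record might contain, the following Python sketch formats a single L1 locus as a six-column BED line. The exact column layout of the L1Base export is not described here, so the layout, the function name `to_bed_line` and the record name `L1Base_2590` are assumptions. The coordinates are those of the mouse element discussed in Case study 2; they are assumed to be 1-based and inclusive, and their descending order is assumed to indicate the minus strand.

```python
def to_bed_line(chrom, pos_a, pos_b, name, score=0):
    """Format a 1-based, inclusive interval as a 6-column BED line.

    If pos_a > pos_b the element is assumed to lie on the minus strand.
    BED uses 0-based, half-open coordinates, so the start is shifted by one
    while the end is left unchanged.
    """
    strand = "+" if pos_a <= pos_b else "-"
    start, end = sorted((pos_a, pos_b))
    fields = [chrom, str(start - 1), str(end), name, str(score), strand]
    return "\t".join(fields)

# Locus from Case study 2 (GRCm38); the record name is hypothetical.
line = to_bed_line("chr12", 91946103, 91935703, "L1Base_2590")
```

A file of such lines, optionally preceded by a `track` line, is the kind of input the UCSC custom track mechanism accepts.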

DATABASE ACCESS

L1Base has been moved from http://l1base.molgen.mpg.de to http://l1base.charite.de.

Case study 1: Identification of the full length intact L1s (FLI-L1s) in four primate genomes

The current release of the human genome, GRCh38, contains 146 FLI-L1s (Table 1). The L1Xplorer-based family classifications (see Figure 1) revealed that 76 (∼52%) of them belong to the Homo sapiens-specific young Ta (diagnostic nucleotides ACA/G) family (31). Interestingly, a substantial number, 25 (∼17%), of the FLI-L1s found belong to the older L1PA2 (diagnostic nucleotides GAG/A) element class, which amplified during the period of primate radiation (31).
Figure 1.

Annotation of FLI-L1 elements present in four primate genomes. HS: Homo sapiens, PT: Pan troglodytes, MM: Macaca mulatta, PA: Pongo abelii. RepeatMasker annotations: L1PA2, L1Pt, L1PA2,3,5. The first column lists annotations obtained by employing L1Xplorer.

Likely reflecting the current stage of genome sequencing and assembly, we discovered only ∼20 FLIs in each of three other primate genomes: chimpanzee, orangutan and rhesus macaque. L1Xplorer annotations, as stored in L1Base, allow finer phylogenetic subcategorization than RepeatMasker annotations (Figure 1). For example, the human FLI-L1s annotated by RepeatMasker simply as L1HS can be subdivided by L1Xplorer into more than seven subcategories based on the annotated features (Figure 1).

Case study 2: Annotation of active mouse L1spa element

The insertion of a LINE-1 element, L1spa (GenBank: AF016099.1), into intron 6 of Glrb has been associated with the spastic mouse phenotype (32–34). In order to annotate it, we executed L1Xplorer with the advanced option: extend locus by 2000 nt. Since L1Xplorer detects the 5′ UTR monomers (including A-Monomer I, A-Monomer II, A-Monomer III, A-Monomer IV, A-Monomer V, A-Monomer VI, T.F-Monomer, F-Monomer, G.F-Monomer) (32), this analysis revealed that the L1spa element belongs to the T.F family (Figure 2). In detail, compared with the T.F monomer template, the first detected T.F monomer is missing its last 24 nt, the next six T.F monomers are full length and the last one is missing its first 79 nt. The L1spa element has one copy of a 66 bp repeat and two copies of a 42 bp repeat in the length polymorphism region of ORF1 (35,36). The mouse FLI sequence most similar to the L1spa element resides on chr.12: 91946103-91935703 (GRCm38, L1Base ID: 2590), as identified with the Blast search function of L1Base. The L1Base ID: 2590 entry shows 99.95% identity to L1spa, differing by only 4 nt (7498/7502 nt). We might reasonably conclude that this element is the most likely progenitor of the L1spa insertion, highlighting the utility of detailed annotation in exposing LINE-1 biology.
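The identity figure quoted above follows directly from the reported match counts, as the short Python check below shows. The function name `percent_identity` is hypothetical, and the calculation assumes identity is defined as matching positions divided by the compared length, which is the simplest reading of "7498/7502 nt".

```python
def percent_identity(matches, compared_length):
    """Percentage of identical positions over the compared length."""
    return 100.0 * matches / compared_length

# Values reported for L1Base ID 2590 versus L1spa.
identity = percent_identity(7498, 7502)
mismatches = 7502 - 7498
```

Rounded to two decimals, `identity` gives the 99.95% stated in the text, and `mismatches` gives the 4 nt difference.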
Figure 2.

L1Xplorer-based annotation of the L1spa element.


ADDITIONAL FEATURES

In order to implement more modern web technologies and to improve accessibility, the L1Base website code was updated to support responsive design elements. In brief, the static HTML4 pages were replaced by HTML5/JavaScript using state-of-the-art libraries (Bootstrap, jQuery). This ensures seamless adaptation to different device classes (tablet, personal computer) and screen resolutions, and improves the overall user experience.

PERSPECTIVES

With the development of sequencing technologies enabling long-read generation (37), and the concomitant increasing inclusion of long dispersed repeats, like L1 elements, in assemblies of individual genomes, polymorphic L1 transposon insertions will become much better represented in publicly available databases. L1Base will aim to catalog the ongoing expansion of LINE-1 variation, with a focus on the putatively active L1 repertoire in personal genomes, offering the potential for data and annotation resources that more realistically represent the true extent and diversity of active L1 elements segregating in populations.
  37 in total

1.  Antisense promoter of human L1 retrotransposon drives transcription of adjacent cellular genes.

Authors:  M Speek
Journal:  Mol Cell Biol       Date:  2001-03       Impact factor: 4.272

2.  Methylation and deamination of CpGs generate p53-binding sites on a genomic scale.

Authors:  Tomasz Zemojtel; Szymon M Kielbasa; Peter F Arndt; Ho-Ryun Chung; Martin Vingron
Journal:  Trends Genet       Date:  2008-12-26       Impact factor: 11.639

Review 3.  Human L1 retrotransposition: insights and peculiarities learned from a cultured cell retrotransposition assay.

Authors:  J V Moran
Journal:  Genetica       Date:  1999       Impact factor: 1.082

4.  Conservation throughout mammalia and extensive protein-encoding capacity of the highly repeated DNA long interspersed sequence one.

Authors:  F H Burton; D D Loeb; C F Voliva; S L Martin; M H Edgell; C A Hutchison
Journal:  J Mol Biol       Date:  1986-01-20       Impact factor: 5.469

5.  Retrotransposition and mutation events yield Rap1 GTPases with differential signalling capacity.

Authors:  Tomasz Zemojtel; Marlena Duchniewicz; Zhongchun Zhang; Taisa Paluch; Hannes Luz; Tobias Penzkofer; Jürgen S Scheele; Fried J T Zwartkruis
Journal:  BMC Evol Biol       Date:  2010-02-19       Impact factor: 3.260

6.  LINE-mediated retrotransposition of marked Alu sequences.

Authors:  Marie Dewannieux; Cécile Esnault; Thierry Heidmann
Journal:  Nat Genet       Date:  2003-08-03       Impact factor: 38.330

Review 7.  Mobile elements and mammalian genome evolution.

Authors:  Prescott L Deininger; John V Moran; Mark A Batzer; Haig H Kazazian
Journal:  Curr Opin Genet Dev       Date:  2003-12       Impact factor: 5.578

8.  Unit-length line-1 transcripts in human teratocarcinoma cells.

Authors:  J Skowronski; T G Fanning; M F Singer
Journal:  Mol Cell Biol       Date:  1988-04       Impact factor: 4.272

9.  CpG deamination creates transcription factor-binding sites with high efficiency.

Authors:  Tomasz Zemojtel; Szymon M Kielbasa; Peter F Arndt; Sarah Behrens; Guillaume Bourque; Martin Vingron
Journal:  Genome Biol Evol       Date:  2011-10-19       Impact factor: 3.416

10.  The Ensembl gene annotation system.

Authors:  Bronwen L Aken; Sarah Ayling; Daniel Barrell; Laura Clarke; Valery Curwen; Susan Fairley; Julio Fernandez Banet; Konstantinos Billis; Carlos García Girón; Thibaut Hourlier; Kevin Howe; Andreas Kähäri; Felix Kokocinski; Fergal J Martin; Daniel N Murphy; Rishi Nag; Magali Ruffier; Michael Schuster; Y Amy Tang; Jan-Hinnerk Vogel; Simon White; Amonida Zadissa; Paul Flicek; Stephen M J Searle
Journal:  Database (Oxford)       Date:  2016-06-23       Impact factor: 3.451
