Tomasz Zemojtel, Tobias Penzkofer, Jörg Schultz, Thomas Dandekar, Richard Badge, Martin Vingron.
Abstract
BACKGROUND: Long interspersed nuclear elements (LINE-1s, L1s) have recently been implicated in the regulation of mammalian transcriptomes.
Year: 2007 PMID: 17963496 PMCID: PMC2176070 DOI: 10.1186/1471-2164-8-392
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Classification of 2382 potentially active L1 elements residing in the mouse genome sequence (NCBI m35). A. Distribution of L1s among subfamilies. A, TF and GF correspond to the active mouse L1 subfamilies. The small number of L1s that appear related to the inactivated F subfamily are marked with F, and those lacking monomers are marked with N/A. B. Distributions of the lengths of the internal promoter regions among the three active subfamilies. The longest promoter discovered is composed of 28 monomers (here included in the 14+ class) and belongs to a potentially active L1 element of the A subfamily located on chromosome 2 (80663469-80649072).
Figure 2. L1 exonization scenarios (I-V) involving sequences belonging to the active L1 subfamilies A, TF and GF and the related inactivated F subfamily, as identified in this study. Scenarios I-V are supported by 16, 26, 14, 6, and 2 exonization events, respectively (see Fig. 3 for details of cDNA sequences). SA: splice acceptor; SD: splice donor. In blue: L1-derived exons; in purple and gray: exons of transcriptional units; in light purple and light gray: exons that are not included in the transcript because of the L1 insertion.
Figure 3. Multiple splice sites are present in antisense and sense L1 sequences (for annotated cDNA examples see [22], for exemplary cDNAs see below). L1Mda2 sequence M13002 was used as a coordinate reference. A. Diverse exonization patterns as supported by cDNA evidence. The names of the splice sites incorporate the following information: the prefixes "A_" and "F_" designate sites within A- and F-type (F, TF, GF) monomers, respectively; SD: splice donor, SA: splice acceptor; the number indicates the position of the base after which cleavage occurs, relative to the start of L1 ORF1 or, for monomers, relative to the start of the monomer alignments (for the alignments see [2, 40]); the prefix "BG_" designates sites found in the L1 inserted within an intron of the beige gene; the prefix "S" stands for sense splice sites. The blue boxes mark the monomers making up the internal promoter region. Exemplary cDNAs corresponding to the identified splice sites: F_SA+100: AK017011, BC025138; F_SA+142: AI194597, AK079058; F_SA+213: AK081008; F_SA+218: AK015559; A_SA+191: AK028243; SA-154: BC056642, AF487898, BQ442932, AK039191, AK043154, BG144807, AK044020, AK145348, BB614554, BY733866, AK076999, AK015267, AK035725; SA+106: AK080034, AY167972, BG144807, BY733866, AK015267, AK007310; SA+120: AK006905; SA+1930: NM_177142; SA+4117: AK034994; SA+5260: AF529222; SA+5614: AK032656; F_SD+24: AK035725; F_SD+213: AK081008; F_SD+218: AK035725; A_SD+72: AK077067, AK015711; A_SD+122: AK032374, BC017615, AK015277, AK006354; SD+29: BY733866; SD+52: AK080034, AY167972, BG144807, AK006905, AK007235, AK161293, AK132928, AK135585, BB614554, AK016072, AK015559, AK076999, AK015267, AK015548, AK015778, AK015845; SD+106: AK015524; SD+288: AK076828, AK006905, AK015267; SD+350: AK017011; SD+1881: NM_177142; SD+2036: NM_177142; SD+5094: AF529222; BG_SA+4578: insertion in beige gene (for sequence see the online annotation); S_SA+1237: AK040102; BG_SD+4903: AK031201, AK032656; BG_SD+4694: AK134759, AK038418, DV059289, AK015958, AK034994, insertion in beige gene. B. Insertion of an L1 GF element in an intron of the GBP-5 gene introduced an SA site (SA-154) and created a novel exon that codes for the C-terminus and bears a new stop codon (solid vertical line) (cDNA transcripts GBP-5a, b: gi: 24266664, 26326418).
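The splice-site naming scheme described in the Figure 3 caption can be made concrete with a small parser. The following is only an illustrative sketch in Python: the names `SpliceSite` and `parse_site` and the regular expression are assumptions introduced here, not part of the original study, and the pattern covers only the name forms that appear in the caption.

```python
import re
from typing import NamedTuple, Optional


class SpliceSite(NamedTuple):
    """Hypothetical container for one parsed splice-site name."""
    prefix: Optional[str]  # "A", "F", "BG", "S", or None when no prefix is given
    kind: str              # "SA" (splice acceptor) or "SD" (splice donor)
    position: int          # signed base position as written in the name


# Optional prefix and underscore, then SA/SD, then a signed integer.
_SITE_PATTERN = re.compile(r"^(?:(A|F|BG|S)_)?(SA|SD)([+-]\d+)$")


def parse_site(name: str) -> SpliceSite:
    """Split a name such as 'F_SA+100' or 'SA-154' into its parts."""
    match = _SITE_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized splice-site name: {name!r}")
    prefix, kind, position = match.groups()
    return SpliceSite(prefix, kind, int(position))


if __name__ == "__main__":
    for site in ("F_SA+100", "SA-154", "BG_SD+4903", "S_SA+1237"):
        print(site, parse_site(site))
```

Under this reading, "SA-154" is an unprefixed splice acceptor at position -154, and "BG_SD+4903" is a splice donor at position +4903 in the L1 inserted in the beige gene.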
Figure 4. Distribution of transcriptional start sites within a region of the antisense sequence of ORF1. The coordinates are given relative to the start of ORF1 in the L1Mda2 sequence (M13002). The Y axis represents the number of cDNAs supporting each transcriptional start site location. In total, 17 cDNAs support the transcriptional start sites in this region: AK017011, AK076828, AK006905, AK007235, AK015524, AK161293, AK132928, AK135585, AK077067, AK016072, AK015559, AK076999, AK015267, AK015548, AK015778, AK015845, AK015266.
Figure 5. Annotation of antisense splice sites in different subfamilies of putatively active L1s and in antisense intronic full-length (FL) L1 insertions. A. For illustration, the polypyrimidine tracts for the site SA-154 are shown. Here, the splice acceptor AG motif is present in only a small fraction of full-length intact elements belonging to the A subfamily (741), whereas it is intact in the GF subfamily (190). "01" marks the location of the AG splice acceptor motif; "-" designates the position of the polypyrimidine tract. Exemplary Motif1 and Motif2 sequences, containing the functional SA-154 splice site, are evidenced by mapping of cDNAs (AK145348 and BG144807, respectively) to the corresponding genomic locations containing L1s (NCBI m35). B. Conservation of antisense GT/AG splice motifs in potentially active L1s. C. Conservation of antisense GT/AG splice motifs in antisense intronic FL L1 insertions. "n = " indicates the number of annotated L1s/monomers. Legend: cDNA-identified splice sites.
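One simple way to express the motif conservation summarized in panels B and C is the fraction of annotated elements whose dinucleotide at a given site matches the canonical motif (GT for a splice donor, AG for a splice acceptor). The Python sketch below assumes this definition; the caption does not state how the authors computed conservation, the function name `motif_conservation` is hypothetical, and the input values are invented toy data rather than results from the study.

```python
from typing import Iterable

# Canonical dinucleotides for the two site types named in the caption.
CANONICAL_MOTIF = {"SD": "GT", "SA": "AG"}


def motif_conservation(dinucleotides: Iterable[str], kind: str) -> float:
    """Return the fraction of observed dinucleotides that match the
    canonical motif for the given site type ("SD" or "SA")."""
    expected = CANONICAL_MOTIF[kind]
    observed = [d.upper() for d in dinucleotides]
    if not observed:
        raise ValueError("no dinucleotides supplied")
    intact = sum(1 for d in observed if d == expected)
    return intact / len(observed)


if __name__ == "__main__":
    # Toy input: dinucleotides at one acceptor site across four elements.
    toy_sites = ["AG", "ag", "AA", "AG"]
    print(motif_conservation(toy_sites, "SA"))
```

Here the denominator plays the role of the "n = " value in the caption, that is, the number of annotated L1s or monomers examined at the site.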