Noncanonical GA and GG 5' Intron Donor Splice Sites Are Common in the Copepod Eurytemora affinis.

Hugh M Robertson

Abstract

The noncanonical 5' intron donor splice sites GA and GG are exceedingly rare in described eukaryotic genomes; however, they are present in ∼12% of introns in the genome of the copepod Eurytemora affinis. Failure to recognize the high frequency of these donor sites compromised the modeling of genes in this newly sequenced genome, including 10 conserved ionotropic glutamate receptor (GluR) family genes curated herein. These introns appear to have been acquired recently, along with many additional idiosyncratic introns. Their high frequency implies the evolution of modified intron donor splice site recognition in this copepod.
Copyright © 2017 Robertson.

Keywords:  Eurytemora; GA donors; copepod genome; intron evolution; noncanonical intron donor splice sites

Year:  2017        PMID: 29079681      PMCID: PMC5714493          DOI: 10.1534/g3.117.300189

Source DB:  PubMed          Journal:  G3 (Bethesda)        ISSN: 2160-1836            Impact factor:   3.154


The canonical sequence for 5′ intron donor splice sites in eukaryotes has an obligate G to start the intron followed by U, or occasionally C, as part of a larger consensus splice site sequence, with DNA sequence AG/GTAAGT (Mount 1982). Exceptions are rare: among 222,263 introns in the human genome, Parada et al. (2014) found only 184 noncanonical splice sites, of which only 14 were GA and 32 were GG donors (0.006 and 0.014%, respectively). Most of these are sites of alternative splicing, as are the most intensively studied GA donors, which belong to the vertebrate fibroblast growth factor receptor 1–3 genes (Brackenridge et al. 2003). Eyun et al. (2017) examined the evolution of arthropod chemosensory genes, focusing on crustaceans and particularly the genome of the copepod Eurytemora affinis, one of the first copepod genomes to be sequenced. Among others, the authors reported nine Ionotropic Receptor (IR) genes and five members of the ionotropic GluR family, from which the IRs evolved (Benton et al. 2009); however, the amino acid sequences they provided for these proteins are almost all truncated at one or both ends. I have completed the gene models for five conserved members of the IR family and the five GluR family members, and found that they contain an unusually high frequency of noncanonical GA and GG 5′ intron donor splice sites. A sample of 26 other large genes indicates that this unusual phenomenon is likely to be genome-wide.

Materials and Methods

Gene models were built manually in the text editor TEXTWRANGLER using genomic sequences from the assembly published in Eyun et al. (2017) and presented in the i5k Workspace@NAL genome browser (Poelchau et al. 2014; https://i5k.nal.usda.gov/). RNAseq reads spanning each intron were obtained from the i5k genome browser where RNAseq reads were mapped across the intron, and otherwise from the Short Read Archive (SRA) at the National Center for Biotechnology Information (NCBI) using BLASTN searches with flanking exon sequences as queries and default parameters. Exons missing in assembly gaps were recreated from raw genome reads in the SRA, using the RNAseq reads obtained above as queries. Models were compared with the protein sequences reported by Eyun et al. (2017); however, these appear to have been derived from a transcriptome rather than from the genome itself, because their descriptors indicate an origin in CUFFLINKS or another transcriptome assembly (e.g., EaffNMDAR2-1 comp31752_c0_seq1 CUFF.1933.3). They also commonly span both GA and GG donor introns as well as misassembled exons and exons missing from the assembly (see, for example, description of IR25a in Supplemental Material, File S1). Sequence logos for exon–intron junctions were built at http://weblogo.berkeley.edu/logo.cgi.
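As a rough illustration of what an information-content sequence logo plots, the following Python sketch computes bits per alignment column as log2(4) minus the Shannon entropy of the observed base frequencies. It is a minimal sketch, not the WebLogo implementation: it omits any small-sample correction that logo tools may apply, and the function name `information_content` and the four toy donor sequences are hypothetical, not taken from File S1.

```python
import math
from collections import Counter


def information_content(seqs, alphabet="ACGT"):
    """Return per-column information content (bits) for equal-length sequences."""
    if not seqs:
        raise ValueError("need at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")

    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    bits = []
    for col in range(length):
        counts = Counter(s[col].upper() for s in seqs)
        total = sum(counts[b] for b in alphabet)
        if total == 0:
            # Column holds no recognized bases, so it carries no information.
            bits.append(0.0)
            continue
        entropy = 0.0
        for b in alphabet:
            if counts[b]:
                p = counts[b] / total
                entropy -= p * math.log2(p)
        bits.append(max_bits - entropy)
    return bits


# Hypothetical toy junctions: two exon bases followed by six intron bases.
toy_sites = ["AGGTAAGT", "AGGAAAGT", "AGGTAAGT", "AGGGAAGT"]
bits = information_content(toy_sites)
for position, value in enumerate(bits, start=1):
    print(f"column {position}: {value:.2f} bits")
```

A fully conserved column scores 2 bits, and a column with all four bases at equal frequency scores 0, which is why obligate positions dominate a logo.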

Data availability

The sequences of the 10 focus proteins are presented in File S1, as are the nucleotide sequences surrounding the 36 noncanonical intron 5′ splice sites.

Results and Discussion

The five conserved members of the IR family (IR8a, 21a, 25a, 76b, and 93a) and the five GluR members (GluR1 and 2, and NMDAR1, 2-1, and 2-2) were successfully extended to full-length genes by employing RNAseq and genome reads (sequences available in File S1). Full-length status was confirmed by comparison with orthologs from insects (e.g., Benton et al. 2009; Croset et al. 2010; Terrapon; Ioannidis), the crustaceans Daphnia pulex (Croset et al. 2010) and Hyalella azteca (H. M. Robertson, unpublished data), and a tick and mite (Gulia-Nuss et al. 2016; Hoy et al. 2016), as well as in BLASTP searches of the nonredundant protein database at NCBI. Some of the problems with the existing gene models are the result of typical difficulties with draft genome sequences, including exons in separate scaffolds or misassembled contigs, and exons partially or completely missing in intrascaffold sequence gaps in the assembly. However, in many instances, the difficulties result from the presence of noncanonical 5′ intron donor splice sites, specifically GA and GG donor sites. Spliced RNAseq reads are not shown for these introns in the Apollo genome browser at the i5k Workspace@NAL where this genome is displayed (https://i5k.nal.usda.gov/), because their mapping algorithm would not accept such unusual noncanonical splice donors. For similar reasons, compounded by misassembled and missing exons, the MAKER pipeline that the authors employed modeled these genes only partially. These are all long genes (19–123 kb) with large numbers of mostly short exons (19–47), providing a reasonably large sample of 297 introns (Table 1). Of these 297 introns, including an alternative-splicing arrangement in IR25a, 31 have GA donors while five have GG donors (12%). The sequences of these 36 noncanonical donors are provided in File S1, and sequence logos of them and the canonical GT and rare GC donors are shown in Figure 1, along with corresponding logos for the 3′ splice acceptor sites of these introns.
Table 1

Features of 10 conserved IR and related ionotropic GluR genes and proteins in E. affinis

Gene        Scaffold#   Length (bp)          Exons     Models   Amino Acids   GA/GG Donors
IR8aF       68          33,279 (26,505)      33 (23)   3        874 (569)     5/0
IR21a       3           33,150 (16,252)      23 (13)   2        763 (493)     4/1
IR25aF      269         90,005 (90,005)      34 (34)   5        907 (907)     4/2
IR76bF      532         18,715 (27,613)      19 (15)   2        480 (417)     1/0
IR93a       35          18,850 (15,580)      20 (19)   1        942 (935)     0/1
GluR1       77          69,907 (30,397)      25 (13)   2        960 (449)     3/0
GluR2       5           24,096 (14,472)      23 (14)   2        923 (615)     2/0
NMDAR1F     43          63,037 (31,051)      36 (20)   4        1095 (736)    2/0
NMDAR2-1    101         123,322 (102,094)    37 (28)   8        1055 (847)    6/1
NMDAR2-2F   141         86,450 (76,362)      47 (38)   4        1030 (823)    4/0

Lengths are from start to stop codon in large scaffolds. They exclude exons present on short separate scaffolds and exons that were built de novo, both of which presumably belong in sequence gaps in the large scaffolds; the lengths of those gaps are included in these counts. Only coding exons are included (IR76b and NMDAR2-2 have single noncoding 5′ exons). Models are the number of models in the automated gene set available at the i5k Workspace@NAL genome browser (EAFF_v0.5.3), and usually not all exons are modeled. Numbers in parentheses are for the proteins reported in Eyun et al. (2017). Suffix “F” after gene name indicates that the genome assembly had to be repaired for a complete gene model to be built (details of each gene model are provided in File S1).
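The per-gene donor counts in Table 1 can be checked against the totals quoted in the text with a short Python sketch. The GA/GG values below are transcribed from the last column of Table 1, and the figure of 297 introns is taken from the Results text rather than recomputed; the variable names are illustrative only.

```python
# (gene, GA donors, GG donors), transcribed from the GA/GG Donors column of Table 1.
TABLE_1_DONORS = [
    ("IR8a", 5, 0),
    ("IR21a", 4, 1),
    ("IR25a", 4, 2),
    ("IR76b", 1, 0),
    ("IR93a", 0, 1),
    ("GluR1", 3, 0),
    ("GluR2", 2, 0),
    ("NMDAR1", 2, 0),
    ("NMDAR2-1", 6, 1),
    ("NMDAR2-2", 4, 0),
]

# Intron total as stated in the text for these 10 genes (not derived here).
TOTAL_INTRONS = 297

ga_total = sum(ga for _, ga, _ in TABLE_1_DONORS)
gg_total = sum(gg for _, _, gg in TABLE_1_DONORS)
noncanonical = ga_total + gg_total
percent = 100 * noncanonical / TOTAL_INTRONS

print(f"GA donors: {ga_total}")
print(f"GG donors: {gg_total}")
print(f"Noncanonical: {noncanonical} of {TOTAL_INTRONS} ({percent:.1f}%)")
```

The tallies agree with the 31 GA and five GG donors reported in the text, and 36 of 297 rounds to the stated 12%.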

Figure 1

Sequence logos showing information content for the 36 noncanonical GA and GG 5′ intron donor splice sites and the 3′ acceptor sites for these introns, compared with sites for 261 introns with canonical donors in 10 conserved large ionotropic glutamate receptor family genes in the copepod E. affinis. (A) Ten bases of exon and 13 bases of intron sequence are shown for the donors. (B) Sixteen bases of intron and seven bases of exon sequence are shown for the acceptors. Sequence logos for frequencies of nucleotides are shown in Figure S1.

These GA and GG donors differ significantly from canonical donors in having AG as obligate nucleotides at the end of the preceding exon, regardless of the phase of the intron with respect to the reading frame; AG at this position is the consensus for canonical 5′ donor splice sites, but is not obligate for them (Figure 1A). Almost all also have A in the third position of the intron, which again is the consensus for canonical sites but is far from obligate there. The 3′ acceptor sites do not differ as greatly between introns with noncanonical and canonical donors (Figure 1B).
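The donor features described above suggest how a splice-site filter might be relaxed. The following Python sketch is one possible heuristic, not the method used by any mapper or by the MAKER pipeline: in strict mode it accepts only GT and GC (GY) donors, and in relaxed mode it also accepts GA and GG donors when the preceding exon ends in AG. The function names and the example sequences are hypothetical.

```python
def donor_dinucleotide(intron_start):
    """Return the first two intron bases as DNA (U is read as T)."""
    return intron_start[:2].upper().replace("U", "T")


def has_third_position_a(intron_start):
    """True if the third intron base is A, as in almost all GA/GG donors reported."""
    return len(intron_start) >= 3 and intron_start[2].upper() == "A"


def accept_donor(exon_end, intron_start, relaxed=False):
    """Decide whether an exon/intron junction is an acceptable 5' donor site.

    Strict mode accepts only canonical GT and rare GC donors.
    Relaxed mode (an assumption of this sketch) additionally accepts GA and GG
    donors, but only when the exon ends in AG, the feature reported as
    obligate for these donors.
    """
    dinuc = donor_dinucleotide(intron_start)
    if dinuc in ("GT", "GC"):
        return True
    if relaxed and dinuc in ("GA", "GG"):
        return exon_end.upper().endswith("AG")
    return False


# Hypothetical junctions: (last exon bases, first intron bases).
examples = [("CAG", "GTAAGT"), ("CAG", "GAAAGT"), ("CAG", "GGAAGT"), ("CCT", "GAAAGT")]
for exon_end, intron_start in examples:
    print(
        exon_end,
        intron_start,
        "strict:", accept_donor(exon_end, intron_start),
        "relaxed:", accept_donor(exon_end, intron_start, relaxed=True),
        "A at +3:", has_third_position_a(intron_start),
    )
```

Requiring the exonic AG keeps the relaxed mode from accepting every G-initial intron, although whether that constraint is sufficient in practice is not something this sketch can establish.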
With 12% of introns in these 10 genes having noncanonical donor sites, automated modeling of these long genes was compromised: most automated models are partial, with many exons not modeled at all (Table 1). Once the pattern was recognized, namely an absence of spliced RNAseq reads in the genome browser coinciding with a likely GA or GG donor site, many such introns could be seen in a wide variety of large genes with deep RNAseq coverage in the genome browser. Full-length models were built in the Apollo browser for 26 such large genes containing GA and GG donors and encoding a wide variety of proteins (Table S1), indicating that this phenomenon is likely to be genome-wide. These 26 genes, while not a random sample, contain 623 introns interrupting their coding sequences, of which 54 have GA donors and 10 have GG donors (10% noncanonical donors), frequencies comparable with those of the 10 focus genes above. GA and GG donors have not been reported at anything like 10–12% of introns in any other eukaryote genome. Their recognition will greatly improve gene modeling in this copepod genome, but would require acceptance of only G, instead of GY, as the obligate 5′ intron donor splice site nucleotide.
The i5k pilot consortium (https://www.hgsc.bcm.edu/arthropods/i5k) has also sequenced the genome of another copepod, Tigriopus californicus (available in the i5k Workspace@NAL genome browser), and a genome sequence for the related T. kingsejongensis has been published (Kang et al. 2017). Neither genome shows any GA or GG 5′ donor splice sites in homologs of these 10 receptor genes; indeed, these genes are generally far shorter, with far fewer introns, in these copepods, so, for now, this high frequency of GA and GG donors is known only from this Eurytemora copepod. Alignment of the positions and phases of these GA and GG introns for the five ionotropic receptors with those of homologous genes in T. californicus, in two other available crustaceans, H. azteca (also available from the i5k pilot consortium) and D. pulex (Croset et al. 2010), in various insects (e.g., Benton et al. 2009; Terrapon; Ioannidis), and in other arthropods such as a tick and mite (Gulia-Nuss et al. 2016; Hoy et al. 2016), reveals that they are unique to Eurytemora, as are many additional idiosyncratic introns with canonical donors. For example, of the 33 introns in IR25a in E. affinis, nine are shared with other arthropods and the remaining 24, including the six GA or GG donors, are idiosyncratic to it. It appears that this copepod underwent an explosion of intron gains, including those with noncanonical donors.
Alternative splicing of the GA donor sites in the vertebrate fibroblast growth factor receptor 1–3 genes is a complicated process involving a nearby consensus sequence (Brackenridge et al. 2003), but no such sequence was noticed in these copepod GA or GG donor introns, which are also not alternatively spliced but rather required for their genes to encode full-length proteins. The high frequency of these noncanonical donors implies the evolution of modified 5′ intron donor site recognition in this copepod. The only intact U1 snRNA in the genome assembly has the same highly conserved 5′ end, with sequence complementing the 5′ intron donor splice consensus of AG/GTAAGT, that is common to animals; however, recognition of the 5′ donor site is also affected by other components of the snRNPs, so it is unclear how these noncanonical donor sites are recognized.
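To put the frequencies quoted in this article side by side, the following Python sketch recomputes the GA and GG donor percentages for the human genome survey and for the two E. affinis samples. All counts are copied from the text; the dictionary layout and the helper name `percent` are illustrative assumptions.

```python
def percent(count, total):
    """Return count as a percentage of total."""
    return 100 * count / total


# (GA donors, GG donors, total introns), as quoted in the text.
SAMPLES = {
    "human genome": (14, 32, 222_263),
    "E. affinis, 10 focus genes": (31, 5, 297),
    "E. affinis, 26 additional genes": (54, 10, 623),
}

results = {}
for name, (ga, gg, introns) in SAMPLES.items():
    results[name] = {
        "GA": percent(ga, introns),
        "GG": percent(gg, introns),
        "GA+GG": percent(ga + gg, introns),
    }
    row = results[name]
    print(f"{name}: GA {row['GA']:.3f}%, GG {row['GG']:.3f}%, combined {row['GA+GG']:.3f}%")
```

The combined E. affinis values come out near 12% and 10%, against roughly 0.02% in the human survey, which is the contrast the text draws. Because the 26 additional genes were chosen for containing GA and GG donors, their percentage is not an unbiased genome-wide estimate.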

Supplementary Material

Supplemental material is available online at www.g3journal.org/lookup/suppl/doi:10.1534/g3.117.300189/-/DC1.
References

1.  Efficient use of a 'dead-end' GA 5' splice site in the human fibroblast growth factor receptor genes.

Authors:  Simon Brackenridge; Andrew O M Wilkie; Gavin R Screaton
Journal:  EMBO J       Date:  2003-04-01       Impact factor: 11.598

2.  A catalogue of splice junction sequences.

Authors:  S M Mount
Journal:  Nucleic Acids Res       Date:  1982-01-22       Impact factor: 16.971

3.  The i5k Workspace@NAL--enabling genomic data access, visualization and curation of arthropod genomes.

Authors:  Monica Poelchau; Christopher Childers; Gary Moore; Vijaya Tsavatapalli; Jay Evans; Chien-Yueh Lee; Han Lin; Jun-Wei Lin; Kevin Hackett
Journal:  Nucleic Acids Res       Date:  2014-10-20       Impact factor: 16.971

4.  Ancient protostome origin of chemosensory ionotropic glutamate receptors and the evolution of insect taste and olfaction.

Authors:  Vincent Croset; Raphael Rytz; Scott F Cummins; Aidan Budd; David Brawand; Henrik Kaessmann; Toby J Gibson; Richard Benton
Journal:  PLoS Genet       Date:  2010-08-19       Impact factor: 5.917

5.  A comprehensive survey of non-canonical splice sites in the human transcriptome.

Authors:  Guillermo E Parada; Roberto Munita; Cledi A Cerda; Katia Gysling
Journal:  Nucleic Acids Res       Date:  2014-08-14       Impact factor: 16.971

6.  Genomic insights into the Ixodes scapularis tick vector of Lyme disease.

Authors:  Monika Gulia-Nuss; Andrew B Nuss; Jason M Meyer; Daniel E Sonenshine; R Michael Roe; Robert M Waterhouse; David B Sattelle; José de la Fuente; Jose M Ribeiro; Karine Megy; Jyothi Thimmapuram; Jason R Miller; Brian P Walenz; Sergey Koren; Jessica B Hostetler; Mathangi Thiagarajan; Vinita S Joardar; Linda I Hannick; Shelby Bidwell; Martin P Hammond; Sarah Young; Qiandong Zeng; Jenica L Abrudan; Francisca C Almeida; Nieves Ayllón; Ketaki Bhide; Brooke W Bissinger; Elena Bonzon-Kulichenko; Steven D Buckingham; Daniel R Caffrey; Melissa J Caimano; Vincent Croset; Timothy Driscoll; Don Gilbert; Joseph J Gillespie; Gloria I Giraldo-Calderón; Jeffrey M Grabowski; David Jiang; Sayed M S Khalil; Donghun Kim; Katherine M Kocan; Juraj Koči; Richard J Kuhn; Timothy J Kurtti; Kristin Lees; Emma G Lang; Ryan C Kennedy; Hyeogsun Kwon; Rushika Perera; Yumin Qi; Justin D Radolf; Joyce M Sakamoto; Alejandro Sánchez-Gracia; Maiara S Severo; Neal Silverman; Ladislav Šimo; Marta Tojo; Cristian Tornador; Janice P Van Zee; Jesús Vázquez; Filipe G Vieira; Margarita Villar; Adam R Wespiser; Yunlong Yang; Jiwei Zhu; Peter Arensburger; Patricia V Pietrantonio; Stephen C Barker; Renfu Shao; Evgeny M Zdobnov; Frank Hauser; Cornelis J P Grimmelikhuijzen; Yoonseong Park; Julio Rozas; Richard Benton; Joao H F Pedra; David R Nelson; Maria F Unger; Jose M C Tubio; Zhijian Tu; Hugh M Robertson; Martin Shumway; Granger Sutton; Jennifer R Wortman; Daniel Lawson; Stephen K Wikel; Vishvanath M Nene; Claire M Fraser; Frank H Collins; Bruce Birren; Karen E Nelson; Elisabet Caler; Catherine A Hill
Journal:  Nat Commun       Date:  2016-02-09       Impact factor: 14.919

7.  The genome of the Antarctic-endemic copepod, Tigriopus kingsejongensis.

Authors:  Seunghyun Kang; Do-Hwan Ahn; Jun Hyuck Lee; Sung Gu Lee; Seung Chul Shin; Jungeun Lee; Gi-Sik Min; Hyoungseok Lee; Hyun-Woo Kim; Sanghee Kim; Hyun Park
Journal:  Gigascience       Date:  2017-01-01       Impact factor: 6.524

8.  Evolutionary History of Chemosensory-Related Gene Families across the Arthropoda.

Authors:  Seong-Il Eyun; Ho Young Soh; Marijan Posavi; James B Munro; Daniel S T Hughes; Shwetha C Murali; Jiaxin Qu; Shannon Dugan; Sandra L Lee; Hsu Chao; Huyen Dinh; Yi Han; HarshaVardhan Doddapaneni; Kim C Worley; Donna M Muzny; Eun-Ok Park; Joana C Silva; Richard A Gibbs; Stephen Richards; Carol Eunmi Lee
Journal:  Mol Biol Evol       Date:  2017-08-01       Impact factor: 16.240

9.  Genome Sequencing of the Phytoseiid Predatory Mite Metaseiulus occidentalis Reveals Completely Atomized Hox Genes and Superdynamic Intron Evolution.

Authors:  Marjorie A Hoy; Robert M Waterhouse; Ke Wu; Alden S Estep; Panagiotis Ioannidis; William J Palmer; Aaron F Pomerantz; Felipe A Simão; Jainy Thomas; Francis M Jiggins; Terence D Murphy; Ellen J Pritham; Hugh M Robertson; Evgeny M Zdobnov; Richard A Gibbs; Stephen Richards
Journal:  Genome Biol Evol       Date:  2016-06-27       Impact factor: 3.416

10.  Variant ionotropic glutamate receptors as chemosensory receptors in Drosophila.

Authors:  Richard Benton; Kirsten S Vannice; Carolina Gomez-Diaz; Leslie B Vosshall
Journal:  Cell       Date:  2009-01-09       Impact factor: 41.582
