Madeline A Crosby, L Sian Gramates, Gilberto Dos Santos, Beverley B Matthews, Susan E St Pierre, Pinglei Zhou, Andrew J Schroeder, Kathleen Falls, David B Emmert, Susan M Russo, William M Gelbart.
Abstract
In the context of the FlyBase annotated gene models in Drosophila melanogaster, we describe the many exceptional cases we have curated from the literature or identified in the course of FlyBase analysis. These range from atypical but common examples such as dicistronic and polycistronic transcripts, noncanonical splices, trans-spliced transcripts, noncanonical translation starts, and stop-codon readthroughs, to single exceptional cases such as ribosomal frameshifting and HAC1-type intron processing. In FlyBase, exceptional genes and transcripts are flagged with Sequence Ontology terms and/or standardized comments. Because some of the rule-benders create problems for handlers of high-throughput data, we discuss plans for flagging these cases in bulk data downloads.
Keywords: bicistronic; multiphasic exon; non-AUG translation start; shared promoter; stop-codon suppression
Year: 2015 PMID: 26109356 PMCID: PMC4528330 DOI: 10.1534/g3.115.018937
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Gene-associated Sequence Ontology terms
| SO Term | SO ID Number |
|---|---|
| gene_with_dicistronic_mRNA | SO:0000722 |
| gene_with_polycistronic_transcript | SO:0000690 |
| gene_with_trans_spliced_transcript | SO:0000459 |
| gene_with_unconventional_translation_start_codon | SO:0001739 |
| gene_with_translation_start_codon_CUG | SO:0001740 |
| gene_with_stop_codon_redefined_as_selenocysteine | SO:0000710 |
| gene_with_stop_codon_read_through | SO:0000697 |
| gene_with_transcript_with_translational_frameshift | SO:0000712 |
Proposed transcript-associated flags to be included in FASTA files
| Proposed Flag | Type |
|---|---|
| dicistronic_mRNA | Transcript exception |
| polycistronic_transcript | Transcript exception |
| non_canonical_splice_site | Transcript exception |
| endonuclease_spliced_intron | Transcript exception |
| trans_spliced_transcript | Transcript exception |
| non-canonical_start_codon | Translation exception |
| stop_codon_redefined_as_selenocysteine | Translation exception |
| stop_codon_read_through | Translation exception |
| transcript_with_translational_frameshift | Translation exception |
| mitochondrial_genetic_code | Translation exception |
| mitochondrial_incomplete_stop_codon | Translation exception |
| start_codon_not_determined | Translation exception |
| mutation in strain | Sequence alteration |
| genomic sequence error or gap | Sequence alteration |
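The flags above are proposals, and the source does not specify how they would be written into a FASTA header. As a minimal sketch, assume each header carries semicolon-separated `key=value` fields and that the flags appear as a comma-separated list under a hypothetical `exception` key. The record IDs and the key name below are illustrative only, not FlyBase's actual format.

```python
def parse_header(header):
    """Split a '>ID key=value; key=value;' header line into a dict."""
    body = header.lstrip(">").strip()
    seq_id, _, rest = body.partition(" ")
    fields = {"id": seq_id}
    for part in rest.split(";"):
        part = part.strip()
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


def exception_flags(header, key="exception"):
    """Return the set of flags under `key` (a hypothetical field name)."""
    value = parse_header(header).get(key, "")
    return {flag.strip() for flag in value.split(",") if flag.strip()}


def read_records(fasta_text, key="exception"):
    """Yield (record_id, sequence, flags) for each FASTA record."""
    header, chunks = None, []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if header is not None:
                yield parse_header(header)["id"], "".join(chunks), exception_flags(header, key)
            header, chunks = line, []
        elif line.strip():
            chunks.append(line.strip())
    if header is not None:
        yield parse_header(header)["id"], "".join(chunks), exception_flags(header, key)


# Made-up records: one flagged transcript, one ordinary transcript.
example = """>tr1 type=mRNA; exception=stop_codon_read_through;
ATGAAATAGGGATAA
>tr2 type=mRNA;
ATGCCCTAA
"""

# Keep only the records that carry no exception flag.
unflagged = [rec_id for rec_id, _, flags in read_records(example) if not flags]
```

A high-throughput pipeline could use a filter of this kind to set flagged transcripts aside for special handling rather than translating them with default rules.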
Figure 1. A dicistronic transcript isoform for and is produced from a stage- and tissue-specific promoter. A GBrowse view showing (top to bottom): the gene extents and the gene models; cDNAs and ESTs; transcription start site(s); unstranded RNA-Seq coverage data corresponding to a developmental series (early embryos, top, to adults, bottom); and stranded RNA-Seq coverage data (plus strand top, minus strand bottom) corresponding to testis (red), male accessory gland (magenta), ovary from virgin females (orange), and ovaries from mated females (tan). More information on data presented in GBrowse may be found at http://flybase.org/wiki/FlyBase:GBrowse_Tracks#General.
Introns with noncanonical splice sites and/or U12-type 5′ consensus sequence
| Splice Donor-Acceptor Pair | Number in Release 6.04 | Number with RNA-Seq Junction Support | Number with Similar Alternative Splice | Within Coding | Within 5′ UTR | Within lncRNA |
|---|---|---|---|---|---|---|
| AT-AC (U12) | 9 | 9 | 1 | 9 | 0 | 0 |
| AT-AC (U2) | 4 | 4 | 2 | 4 | 0 | 0 |
| GT-TG | 23 | 19 | 22 | 13 | 9 | 1 |
| GT-GG | 6 | 5 | 5 | 3 | 2 | 1 |
| GT-CG | 8 | 8 | 8 | 6 | 2 | 0 |
| GT-AT | 14 | 11 | 14 | 11 | 3 | 0 |
| GT-AA | 3 | 3 | 3 | 2 | 1 | 0 |
| GA-AG | 12 | 12 | 5 | 8 | 3 | 1 |
| GG-AG | 0 | — | — | — | — | — |
| GT-AC | 0 | — | — | — | — | — |
| GT-AG (U12) | 10 | 9 | | 9 | 1 | 0 |
| GC-AG (U12) | 1 | 1 | | 1 | 0 | 0 |
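The donor-acceptor pairs tabulated above are the first two and last two nucleotides of each intron. The sketch below labels an intron by that pair and tallies the noncanonical ones. It assumes that GT-AG and GC-AG count as canonical (the `canonical` parameter can be changed), and it does not distinguish U12-type from U2-type introns, which would need the 5′ consensus sequence. The intron sequences are made up for illustration.

```python
from collections import Counter

# Assumption: pairs treated as canonical; adjust to match your own definition.
DEFAULT_CANONICAL = frozenset({"GT-AG", "GC-AG"})


def splice_pair(intron):
    """Return the donor-acceptor dinucleotide pair, e.g. 'GT-AG'."""
    seq = intron.upper().replace("U", "T")
    if len(seq) < 4:
        raise ValueError("intron must be at least 4 nt long")
    return f"{seq[:2]}-{seq[-2:]}"


def is_noncanonical(intron, canonical=DEFAULT_CANONICAL):
    """True if the intron's donor-acceptor pair is not in `canonical`."""
    return splice_pair(intron) not in canonical


def tally_noncanonical(introns, canonical=DEFAULT_CANONICAL):
    """Count noncanonical donor-acceptor pairs across a set of introns."""
    pairs = (splice_pair(i) for i in introns)
    return Counter(p for p in pairs if p not in canonical)


# Made-up intron sequences, written 5' to 3' on the transcribed strand.
introns = [
    "GTAAGTCTTTTCCAG",  # GT-AG, canonical
    "GTAAGTCTTTTCCTG",  # GT-TG
    "ATATCCTTTTTCCAC",  # AT-AC
    "gtaagtcttttcctg",  # GT-TG again, lowercase input
]
counts = tally_noncanonical(introns)
```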
Figure 2. Noncanonical splices supported by RNA-Seq junction data. (A) Of three alternative splice acceptors for intron 6 of the gene, two are noncanonical TGs, including the splice acceptor used at the highest frequency (first highlighted junction). A GBrowse view showing (top to bottom): nucleotide sequence; region of the gene model showing one intron/exon boundary; EST data; RNA-Seq junction data; and unstranded RNA-Seq coverage data corresponding to a developmental series (early embryos, top, to adults, bottom). More information on data presented in GBrowse may be found at http://flybase.org/wiki/FlyBase:GBrowse_Tracks#General. (B) Report for an RNA-Seq junction that corresponds to a noncanonical splice but is aligned to incorrect noncanonical sites, one of several cases that were slightly misaligned.
Figure 3. Noncanonical terminal extensions of the CDS. (A) CUG start codon in results in a 48-aa N-terminal extension; a GBrowse view showing amino acid sequence and amino ends of annotated polypeptides. Use of this alternative start codon has been confirmed by Western blot, mutagenesis of reporter constructs, and rescue constructs (Beerman and Jongens 2011). (B) For the dan gene model, a stop-codon readthrough annotated for dan-RB is supported by PhyloCSF analysis (conservation of protein signatures). A GBrowse view showing (top to bottom): the gene model; stop codons on the plus strand in each of the three open reading frames; and regions of protein conservation among the Drosophila species (tan extents at the bottom). More information on data presented in GBrowse may be found at http://flybase.org/wiki/FlyBase:GBrowse_Tracks#General.
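To illustrate the stop-codon readthrough case in panel B, the sketch below scans a sequence in frame, finds the first stop codon and the next in-frame stop, and reports how many residues a readthrough would add. It assumes the standard nuclear stop codons (TAA, TAG, TGA) and counts the read-through stop codon itself as one added residue. The sequence is made up and is not taken from any FlyBase gene model.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def in_frame_stops(seq, start=0):
    """Return 0-based positions of in-frame stop codons, scanning from `start`."""
    seq = seq.upper().replace("U", "T")
    return [i for i in range(start, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


def readthrough_extension(seq, start=0):
    """Residues added if the first in-frame stop is read through.

    Counts the read-through stop codon as one residue and stops at the
    next in-frame stop. Returns None if there is no second in-frame stop.
    """
    stops = in_frame_stops(seq, start)
    if len(stops) < 2:
        return None
    first, second = stops[0], stops[1]
    return (second - first) // 3


# Made-up transcript fragment: ATG AAA TAG GGA TAA
fragment = "ATGAAATAGGGATAA"
extension = readthrough_extension(fragment)  # TAG (read through) + GGA
```

A check like this only locates a candidate extension; the annotation described in the caption rests on conservation evidence, which this sketch does not attempt to model.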