Boas Pucker, Samuel F Brockington.
Abstract
BACKGROUND: Most eukaryotic genes comprise exons and introns, which requires the precise removal of introns from pre-mRNAs to enable protein biosynthesis. U2 and U12 spliceosomes catalyze this step by recognizing motifs on the transcript in order to remove the introns. This process depends on the precise definition of exon-intron borders by splice sites, which are consequently highly conserved across species. Only very few combinations of terminal dinucleotides are frequently observed at intron ends, dominated by the canonical GT-AG splice sites on the DNA level.
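The splice site combination of an intron is defined by its two terminal dinucleotides. The following is a minimal Python sketch of that idea, not the authors' pipeline: it assumes intron sequences have already been extracted on the transcribed strand at the DNA level, and the helper names (`splice_site_combination`, `classify`) are hypothetical. The three categories follow the canonical / major non-canonical / minor non-canonical grouping used in the figures.

```python
# Hypothetical helpers. Categories: canonical GT-AG, major non-canonical
# GC-AG and AT-AC, and every other combination counted as "minor non-canonical".
CANONICAL = "GT-AG"
MAJOR_NON_CANONICAL = {"GC-AG", "AT-AC"}


def splice_site_combination(intron_seq: str) -> str:
    """Return the terminal dinucleotides of an intron as 'XX-YY' (DNA level)."""
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron sequence must be at least 4 nt long")
    return f"{seq[:2]}-{seq[-2:]}"


def classify(intron_seq: str) -> str:
    """Assign an intron to one of the three splice site categories."""
    combo = splice_site_combination(intron_seq)
    if combo == CANONICAL:
        return "canonical"
    if combo in MAJOR_NON_CANONICAL:
        return "major non-canonical"
    return "minor non-canonical"


if __name__ == "__main__":
    # Short toy sequences, chosen only to exercise each category.
    for seq in ["GTAAGTTTTCAG", "gcaagtctgcag", "ATATCCTTTAAC", "GGAAGTTTTCAG"]:
        print(splice_site_combination(seq), classify(seq))
```

Real introns are of course much longer; only the first and last two bases matter for this classification.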
Keywords: Annotation; Comparative genomics; Evolution; Gene expression; Gene structure; Natural diversity; Splicing; Transcriptomics
Year: 2018 PMID: 30594132 PMCID: PMC6310983 DOI: 10.1186/s12864-018-5360-z
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Correlation between splice site sequence divergence and frequency. The Spearman correlation coefficient between the divergence of splice site combinations from the canonical GT-AG and their frequency is r = −0.4297 (p-value = 7 × 10⁻¹³).
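The correlation in Fig. 1 can be sketched as follows. This sketch rests on an assumption: the caption does not define the divergence measure, so here divergence is taken to be the number of mismatching nucleotides between a combination and GT-AG. Spearman's coefficient is computed as the Pearson correlation of average ranks, which handles the many ties a 0 to 4 mismatch scale produces. All function names are hypothetical.

```python
from statistics import mean


def divergence(combo: str, reference: str = "GT-AG") -> int:
    """ASSUMED metric: count of mismatching positions between two 'XX-YY' combinations."""
    a, b = combo.replace("-", ""), reference.replace("-", "")
    return sum(x != y for x, y in zip(a, b))


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y) -> float:
    """Spearman's coefficient as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when all values in x or y are tied")
    return cov / (sx * sy)
```

Called with per-combination divergences and counts, a negative value means more divergent combinations tend to be rarer, the direction reported in Fig. 1. For real analyses, `scipy.stats.spearmanr` computes the same tie-corrected coefficient and also returns a p-value.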
Fig. 2 Splice site combination frequency. The frequencies of selected splice site combinations across 121 plant species are displayed. Splice site combinations with high similarity to the canonical GT-AG or to the major non-canonical GC-AG/AT-AC are more frequent than other splice site combinations.
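Fig. 2 relates frequency to similarity with any of the three major combinations rather than with GT-AG alone. A minimal way to express that similarity, assumed here because the caption does not define it, is the smallest number of mismatching nucleotides to GT-AG, GC-AG or AT-AC. The names `mismatches` and `closest_major` are hypothetical.

```python
MAJOR = ("GT-AG", "GC-AG", "AT-AC")


def mismatches(a: str, b: str) -> int:
    """Count differing nucleotides between two 'XX-YY' splice site combinations."""
    return sum(x != y for x, y in zip(a.replace("-", ""), b.replace("-", "")))


def closest_major(combo: str):
    """Return (distance, nearest major combination); ties resolve alphabetically."""
    return min((mismatches(combo, m), m) for m in MAJOR)
```

Grouping minor combinations by this distance gives one way to check whether frequency falls off with dissimilarity to all major combinations.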
Fig. 3 Intron length distribution. The length distributions of introns with canonical (green) and non-canonical (red) splice site combinations are displayed. Values of all species are combined in this plot, resulting in a consensus curve. The most striking differences are (1) at the intron length peak around 200 bp, where non-canonical splice site combinations are less likely, and (2) at very long intron lengths, where introns with non-canonical splice sites are more likely.
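Because canonical introns vastly outnumber non-canonical ones, the two length distributions can only be compared after normalizing each to relative frequencies. The sketch below assumes introns are given as (combination, length) pairs and that lengths come from 1-based inclusive coordinates, as in GFF3 annotations; the function names and the 50 bp bin size are illustrative choices, not taken from the paper.

```python
from collections import Counter


def intron_length(start: int, end: int) -> int:
    """Length of an intron given 1-based, inclusive start and end coordinates."""
    return end - start + 1


def split_by_class(introns):
    """Split (combination, length) pairs into canonical GT-AG and non-canonical lengths."""
    canonical = [length for combo, length in introns if combo == "GT-AG"]
    non_canonical = [length for combo, length in introns if combo != "GT-AG"]
    return canonical, non_canonical


def length_histogram(lengths, bin_size=50):
    """Relative frequency per length bin (keyed by bin start), so groups of different size are comparable."""
    counts = Counter(length // bin_size for length in lengths)
    total = len(lengths)
    return {b * bin_size: n / total for b, n in sorted(counts.items())}
```

Comparing the two histograms bin by bin shows where one class is over- or under-represented, for example near the 200 bp peak or in the long-intron tail.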
Fig. 4 Splice site frequency. Occurrences of the canonical GT-AG, the major non-canonical GC-AG and AT-AC, and the combined occurrences of all minor non-canonical splice sites (others) are displayed. The proportion of GT-AG is about 98.7%. There is some variation, but most species show GC-AG at about 1.2% and AT-AC at about 0.06%. All other combinations together usually account for about 0.09%.
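The per-category proportions in Fig. 4 reduce to counting combinations and dividing by the total. A minimal sketch, assuming the input is the list of splice site combinations of one species (the function name is hypothetical):

```python
from collections import Counter

CATEGORIES = ("GT-AG", "GC-AG", "AT-AC", "others")


def category_proportions(combinations):
    """Percentage of GT-AG, GC-AG, AT-AC and all remaining ('others') combinations."""
    counts = Counter(c if c in CATEGORIES[:3] else "others" for c in combinations)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no splice site combinations given")
    # Counter returns 0 for categories that never occur.
    return {k: 100 * counts[k] / total for k in CATEGORIES}
```

Running it separately for each species exposes the between-species variation mentioned in the caption; pooling all species gives a single overall estimate.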
Fig. 5 Usage of splice sites. The usage of each splice site was calculated from the number of RNA-Seq reads supporting the exon next to the splice site and the number of reads supporting the intron containing the splice site. Usage is substantially higher for 5′ splice sites than for 3′ splice sites. Canonical GT-AG splice site combinations are used more often than major or minor non-canonical splice site combinations. Sample size (n) and median (m) of the usage values are given for all splice sites.
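The caption names the two read counts that enter the usage value but not the exact formula. The sketch below assumes one plausible definition, the fraction of reads at a splice site that support splicing (intron reads divided by intron plus exon reads); the published calculation may differ. It also computes the sample size (n) and median (m) reported per group. Function names are hypothetical.

```python
from statistics import median


def splice_site_usage(intron_reads: int, exon_reads: int) -> float:
    """ASSUMED formula: share of reads at a splice site that support the intron (splicing)."""
    total = intron_reads + exon_reads
    if total == 0:
        raise ValueError("no reads cover this splice site")
    return intron_reads / total


def summarize(usages):
    """Sample size (n) and median (m) of a group of usage values."""
    return {"n": len(usages), "m": median(usages)}
```

Grouping the usage values by 5′ or 3′ position and by splice site category before calling `summarize` mirrors the comparisons shown in the figure.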