Mmatshepho M Phasha, Brenda D Wingfield, Martin P A Coetzee, Quentin C Santana, Gerda Fourie, Emma T Steenkamp.
Abstract
Removal of introns from transcribed RNA represents a crucial step during the production of mRNA in eukaryotes. Available whole-genome sequences and expressed sequence tags (ESTs) have increased our knowledge of this process and revealed various commonalities among eukaryotes. However, certain aspects of intron structure and diversity are taxon-specific, which can reduce the accuracy of in silico gene prediction methods. Using core genes, we evaluated the distribution and architecture of Fusarium circinatum spliceosomal introns, and linked these characteristics to the accuracy of the predicted gene models of the genome of this fungus. We also evaluated intron distribution and architecture in F. verticillioides, F. oxysporum, and F. graminearum, and made comparisons with F. circinatum. Results indicated that F. circinatum and the three other Fusarium species have canonical 5' and 3' splice sites, but with subtle differences that are apparently not shared with those of other fungal genera. The polypyrimidine tract of Fusarium introns was also found to be highly divergent among species and genes. Furthermore, the conserved adenosine nucleoside required during the first step of splicing is contained within unique branch site motifs in certain Fusarium introns. Data generated here show that introns of F. circinatum, as well as F. verticillioides, F. oxysporum, and F. graminearum, are characterized by a number of unique features such as the CTHAH and ACCAT motifs of the branch site. Incorporation of such information into genome annotation software will undoubtedly improve the accuracy of gene prediction methods used for Fusarium species and related fungi.
Keywords: Fusarium; cis-elements; gene prediction; intron splicing; spliceosomal introns
Year: 2017 PMID: 28993438 PMCID: PMC5677156 DOI: 10.1534/g3.117.300344
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Frequency of introns in the coding sequences of 226 core genes in four Fusarium species.
Figure 2. Lengths of all introns within a set of 226 coding sequences of four species of Fusarium.
Figure 3. The relationship between intron position and intron length in 226 core genes of four Fusarium species. The graphs were plotted with three data points (high, low, and mean intron lengths) on the y-axes for the intron positions on the x-axes. The vertical lines represent the high and low intron lengths and the blue triangles represent the mean values. ANOVA and Tukey’s Honestly Significant Difference tests showed that the mean length of first-position introns differed significantly from that of introns in positions 2–7 (P = 0.05).
Figure 4. The distribution of introns within the set of 226 core genes of the four Fusarium species. (A) The positions of introns are shown along relative gene length (x-axis), and the frequencies of these introns are depicted on the y-axis. (B) The genes were divided into three regions: the 5′ region [the first third of the coding sequence (CDS)], the middle region (the second third of the CDS), and the 3′ region (the last third of the CDS). Gene categories: 1, all introns are in the 5′ region; 2, > 50% of the introns are in the 5′ region; 3, 50% of the introns are in the 5′ region and 50% are in the 3′ region; 4, all introns are in the middle region; 5, > 50% of the introns are in the 3′ region; 6, all introns are in the 3′ region; and 7, introns are evenly distributed across the gene (no concentration of introns at a particular region). The numbers in parentheses are the number of CDSs included per gene category.
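The three-region split described for Figure 4B can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `cds_region` and `region_counts`, the use of 0-based intron positions, and the handling of positions that fall exactly on a boundary are assumptions and are not taken from the study.

```python
from collections import Counter


def cds_region(intron_pos: int, cds_length: int) -> str:
    """Return the third of the CDS (5', middle, or 3') containing a 0-based position.

    Assumption: a position exactly on a boundary is assigned to the
    downstream region. Integer arithmetic avoids floating-point edge cases.
    """
    if not 0 <= intron_pos < cds_length:
        raise ValueError("intron position must lie within the CDS")
    if 3 * intron_pos < cds_length:
        return "5'"
    if 3 * intron_pos < 2 * cds_length:
        return "middle"
    return "3'"


def region_counts(intron_positions, cds_length):
    """Count how many introns of one gene fall in each third of its CDS."""
    return Counter(cds_region(p, cds_length) for p in intron_positions)


# Hypothetical gene: a 900-nt CDS with introns at four made-up positions.
counts = region_counts([10, 450, 800, 850], 900)
```

Per-gene counts of this kind are what a category assignment such as "all introns are in the 5′ region" or "> 50% of the introns are in the 3′ region" would be derived from.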
Figure 5. Consensus sequences at the 5′ splice site, the branch site, and the 3′ splice site constructed using WebLogo 3.3. bits, binary digits.
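The "bits" scale of a sequence logo measures how conserved each position is. A minimal sketch of the standard per-position calculation (2 bits minus the Shannon entropy of the four-base distribution) follows. The function name `information_bits` is hypothetical, and the sketch omits any small-sample correction that logo software may apply, so its values need not match the published logos exactly.

```python
import math


def information_bits(freqs):
    """Information content (bits) of one DNA position from base counts or percentages.

    Computed as log2(4) minus the Shannon entropy of the base distribution;
    no small-sample correction is applied (an assumption of this sketch).
    """
    total = sum(freqs.values())
    entropy = 0.0
    for count in freqs.values():
        if count > 0:  # bases that never occur contribute nothing
            p = count / total
            entropy -= p * math.log2(p)
    return math.log2(4) - entropy


# Example input: the base percentages reported for position N4 of the
# 5' splice site (A74/G16/T6/C4).
n4_bits = information_bits({"A": 74, "G": 16, "T": 6, "C": 4})
```

A fully conserved position (for example the invariant G at the 5′ splice site) gives the maximum of 2 bits, while a position where all four bases are equally frequent gives 0 bits.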
The length of introns and a summary of the motifs examined in 2022 introns from 226 core genes in four Fusarium species
| Length | 5′ Splice Site Motif | Polypyrimidine Tract | Branch Site Motif | 3′ Splice Site Motif |
|---|---|---|---|---|
| 42–529 nucleotides | A38A38G53\|G100T99A74A42G93T66 | 83% located between 5′ splice site and the branch site, 17% located between the branch site and the 3′ splice site | CTRAY (91%) | Y93A100G100\|R59 |
| | | | CTHAH (4.99%) | YAG\|R (94.31%) |
| | | | TTRAY (3.96%) | RAG\|R (3.51%) |
| | | | ACCAT (0.05%) | RAG\|Y (2.18%) |
Subscript digits following individual bases indicate the proportion (in percentage) of occurrence of the base in that position.
For comparison with the literature, we included the 5′ splice site consensus sequence in this form. However, full details regarding the frequency of specific bases are as follows: N1N2N3|G100YN4N5N6N7, where N1 = A38/G21/T19/C24; N2 = A38/G20/T21/C21; N3 = A15/G53/T15/C17; N4 = A74/G16/T6/C4; N5 = A42/G4/T22/C32; N6 = A3/G93/T3/C1; N7 = A14/G4/T66/C16; and Y = T99/C1.
The proportion of the introns in which a specific branch site motif was observed is indicated in parentheses. Alternative branch site sequences: CTHAH represents CTTAC, CTCAA, CTAAA, and CTCAT; TTRAY represents TTAAC, TTAAT, TTGAC, and TTGAT*. All the predicted branch site motifs were supported by expressed sequence tag data, except for TTGAT. Within the sequences, R, H, and Y represent standard International Union of Pure and Applied Chemistry codes for degenerate nucleotides, where R represents a nucleotide with either a 2′-deoxyguanosine or 2′-deoxyadenosine base, Y represents either a 2′-deoxycytidine or 2′-deoxythymidine base, and H represents a 2′-deoxyadenosine, 2′-deoxycytidine, or 2′-deoxythymidine base.
The proportion of the introns in which a specific 3′ splice site was observed is indicated in parentheses.
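The degenerate codes defined in the footnotes (R = A or G, Y = C or T, H = A, C, or T) translate directly into regular-expression character classes. The following is a minimal sketch of how a candidate 5-nt branch site could be assigned to one of the four motifs in the table. The names `IUPAC`, `motif_to_regex`, and `classify_branch_site` are hypothetical. Testing the motifs in the order the table lists them is an assumption, and it matters because CTRAY and CTHAH overlap (CTAAC, for instance, fits both).

```python
import re

# IUPAC degenerate codes used in the table, written as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",   # purine
    "Y": "[CT]",   # pyrimidine
    "H": "[ACT]",  # not G
}


def motif_to_regex(motif):
    """Compile an IUPAC motif such as 'CTRAY' into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif))


def classify_branch_site(seq, motifs=("CTRAY", "CTHAH", "TTRAY", "ACCAT")):
    """Return the first motif (in the given order) that matches the whole sequence.

    Returns None if no motif matches. Motif order is an assumption: the
    common CTRAY motif is tried before the rarer alternatives.
    """
    for motif in motifs:
        if motif_to_regex(motif).fullmatch(seq):
            return motif
    return None


# The alternative sequences listed in the footnote fall into their stated classes.
labels = {s: classify_branch_site(s) for s in ("CTAAC", "CTTAC", "TTGAT", "ACCAT")}
```

With this ordering, each alternative sequence named in the footnote (CTTAC, CTCAA, CTAAA, CTCAT) is assigned to CTHAH, because none of them also satisfies CTRAY.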