In silico analysis of 3'-end-processing signals in Aspergillus oryzae using expressed sequence tags and genomic sequencing data.

Mizuki Tanaka, Yoshifumi Sakai, Osamu Yamada, Takahiro Shintani, Katsuya Gomi.

Abstract

To investigate 3'-end-processing signals in Aspergillus oryzae, we created a nucleotide sequence data set of the 3'-untranslated region (3' UTR) plus the 100 nucleotides (nt) downstream of the poly(A) site using A. oryzae expressed sequence tags and genomic sequencing data. This data set comprised 1065 sequences derived from 1043 unique genes. The average 3' UTR length in A. oryzae was 241 nt, which is greater than that in yeast but similar to that in plants. The 3' UTR and the 100 nt sequence downstream of the poly(A) site are notably U-rich, while the region located 15-30 nt upstream of the poly(A) site is markedly A-rich. The most frequently found hexanucleotide in this A-rich region is AAUGAA, although this sequence accounts for only 6% of all transcripts. These data suggested that A. oryzae has no highly conserved sequence element equivalent to AAUAAA, a mammalian polyadenylation signal. We found that the putative 3'-end-processing signals in A. oryzae, while less well conserved than those in mammals, comprise four sequence elements: the furthest upstream U-rich element, A-rich sequence, cleavage site, and downstream U-rich element flanking the cleavage site. Although these putative 3'-end-processing signals are similar to those in yeast and plants, some notable differences exist between them.

Year:  2011        PMID: 21586533      PMCID: PMC3111234          DOI: 10.1093/dnares/dsr011

Source DB:  PubMed          Journal:  DNA Res        ISSN: 1340-2838            Impact factor:   4.458


Introduction

In eukaryotes, most mRNAs have a poly(A) tail at their 3′ end. 3′-end-processing of eukaryotic pre-mRNA involves endonucleolytic cleavage and polyadenylation.[1-3] The 3′-end cleavage and polyadenylation site are regulated by several sequence elements, which have been extensively studied in mammalian, yeast, and plant cells.[4-8] In mammals, three elements are known as primary sequence elements: the polyadenylation signal, cleavage site, and downstream U/GU-rich elements. In addition, two auxiliary sequence elements (upstream U-rich elements and downstream G-rich elements) have also been identified. Among these elements, the polyadenylation signal, which is the hexanucleotide AAUAAA or its variant AUUAAA, located 10–35 nucleotides (nt) upstream of the poly(A) site, is the most highly conserved sequence. In yeast and plants, A-rich sequence elements also exist ∼10–30 nt upstream of the cleavage site, but these elements are less well conserved than mammalian polyadenylation signals. Among the many sequences identified as A-rich sequences, AAUAAA is the best conserved in both yeast and plants. In addition to A-rich sequence elements, further upstream elements (designated efficiency elements in yeast or far upstream elements in plants), the cleavage site, and the downstream U-rich element flanking the cleavage site have been described.

In Japan, the filamentous fungus Aspergillus oryzae has long been used for the production of traditional fermented foods, such as sake, soy sauce, and miso (soybean paste), and its long history of use in the food industry is a testament to its safety.[9] In addition, A. oryzae has the ability to secrete large amounts of protein, and therefore, it has recently gained recognition as a favourable host organism for recombinant protein production.[10,11] However, secretion yields of heterologous proteins from A. oryzae are low compared with those of homologous proteins or proteins from closely related fungal species.[12] Recently, we revealed that the transcript of a heterologous gene containing AT-biased codons was prematurely polyadenylated within the coding region in A. oryzae.[13] This premature polyadenylation was prevented by altering the codons to better suit Aspergillus codon usage. This result suggested that cryptic 3′-end-processing signals are recognized by A. oryzae within the coding region of the heterologous gene and that these signals are eliminated by codon optimization. However, no experimental data exist on 3′-end-processing signals in filamentous fungi, including A. oryzae. To elucidate 3′-end-processing signals in A. oryzae, we created a nucleotide sequence data set of the 3′-untranslated region (3′ UTR) and 100 nt downstream of the poly(A) site using A. oryzae expressed sequence tags (ESTs) and genomic sequencing data. Using this data set, we identified several putative 3′-end-processing signals in A. oryzae. To our knowledge, this is the first report of the identification of 3′-end-processing signals in filamentous fungi.

Materials and methods

Creation of the A. oryzae poly(A) data set

A total of 21 446 EST sequences in the A. oryzae EST database (http://nribf2.nrib.go.jp/EST2/index.html), created by sequencing from the 5′ end of the cDNA insert,[14] were searched for sequences that contained at least eight consecutive A residues, yielding 1647 EST sequence entries. Subsequently, EST sequences containing oligo(A) stretches inherently present in the genome were eliminated by comparison with genomic DNA sequences.[15] In addition, to eliminate mis-annotated genes, only EST sequences in which the poly(A) site was located within 1000 nt downstream of the stop codon were selected. EST sequences with poly(A) sites located within the coding region, probably caused by internal priming, were also eliminated in this manner. Nine pairs of redundant EST sequences with identical poly(A) sites were considered to be derived from a single cDNA and were removed from the data set. Finally, 1065 EST sequences were selected by these processes. Genomic DNA-based sequences within the 3′ UTR and 100 nt sequence downstream of the poly(A) site were extracted for the poly(A) site data set (T residues were converted to U residues). This data set contained 22 pairs of EST sequences derived from the same gene with different poly(A) sites. Therefore, this data set comprised only sequences derived from 1043 unique genes. The poly(A) site was designated as the last nucleotide in the genome sequence preceding the poly(A) tail. When an adenine residue was found at the poly(A) site in the genome sequence, this adenine was termed the poly(A) site nucleotide according to the recent reports of 3′-end-processing signals in plants and Chlamydomonas on the basis of EST sequencing data,[8,16] indicating that the first adenine of a poly(A) tail tended to be transcribed from the genomic DNA.[17-19]
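The selection steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the EstHit record, its field names, and the plus-strand genomic coordinates are hypothetical, and the comparison against genomic DNA that removes genome-encoded oligo(A) stretches is assumed to have been done before the hits are built.

```python
import re
from dataclasses import dataclass

# At least eight consecutive A residues, as in the EST search described above.
OLIGO_A = re.compile(r"A{8,}")


@dataclass(frozen=True)
class EstHit:
    """Hypothetical record for one EST mapped to the genome (plus strand assumed)."""
    est_id: str
    gene_id: str
    stop_codon_end: int  # coordinate of the last nucleotide of the stop codon
    poly_a_site: int     # coordinate of the last nucleotide before the poly(A) tail


def has_oligo_a(est_sequence):
    """Return True if the EST contains a run of eight or more A residues."""
    return OLIGO_A.search(est_sequence.upper()) is not None


def select_poly_a_sites(hits, max_utr=1000):
    """Keep hits whose poly(A) site lies within max_utr nt downstream of the stop codon.

    Sites inside the coding region (negative distance) are dropped, and ESTs
    sharing the same gene and poly(A) site are collapsed to the first one seen.
    """
    selected = {}
    for hit in hits:
        utr_length = hit.poly_a_site - hit.stop_codon_end
        if not 0 <= utr_length <= max_utr:
            continue
        selected.setdefault((hit.gene_id, hit.poly_a_site), hit)
    return list(selected.values())
```

Two ESTs from the same gene with different poly(A) sites are both kept by this sketch, which matches the way the data set retains alternative poly(A) sites.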

DNA microarray analysis

The A. oryzae wild-type strain RIB40, which was used for genome sequencing analysis,[15] was grown in sterilized wheat bran media (3.0 g wheat bran with 1.8 ml distilled water) at 30°C for 33 h. Total RNA extraction, mRNA preparation, and DNA microarray analysis were performed according to the methods of Tamano et al.[20] Purified Cy3- or Cy5-labelled cDNA probes were hybridized using 12 K A. oryzae oligonucleotide microarrays (Fermlab, Tokyo, Japan). After global normalization, the relative fluorescence intensity of each gene was normalized to that of the histone H4 gene, which was used as a reference. We selected 5384 genes whose intensities were found to be reproducible and reliable (P < 0.1) in dye-swap experiments.

Generation of sequence logos

Sequence logos of the region around the poly(A) site were generated using the enoLOGOS web tool.[21]

Analysis of oligonucleotide frequencies

A standard score (Z-score) was used to detect the most over-represented hexanucleotide sequences from −30 to −15 nt (region II), according to the zeroth- and first-order Markov chain models.[22] The Z-score of a hexanucleotide sequence (w = x1 x2 x3 x4 x5 x6, where each xi is a nucleotide) was calculated as follows:

Z(w) = (fobs(w) − fexp(w)) / sqrt(fexp(w) × (1 − fexp(w)) / n)

In this definition, fobs(w) denotes the observed frequency of w, i.e. the number of occurrences of w in s divided by the number of occurrences of sequences having the same length as w in s, where s ranges over all sequences of length 6 located in region II; fexp(w) denotes the expected frequency of w, determined as the value fobs(x1) × fobs(x2) × fobs(x3) × fobs(x4) × fobs(x5) × fobs(x6) in the zeroth-order model or fobs(x1 x2) × fobs(x2 x3) × fobs(x3 x4) × fobs(x4 x5) × fobs(x5 x6)/(fobs(x2) × fobs(x3) × fobs(x4) × fobs(x5)) in the first-order model; and n denotes the number of sequences of length 6 located in region II.
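A minimal Python sketch of this calculation is given below. It assumes the usual binomial form of the standard score, Z(w) = (fobs(w) − fexp(w)) / sqrt(fexp(w) × (1 − fexp(w)) / n), and it estimates every frequency from the region II sequences passed in. The function names and the toy sequences in the usage note are illustrative only.

```python
from collections import Counter
from math import prod, sqrt


def kmer_freqs(seqs, k):
    """Return ({k-mer: observed frequency}, number of k-mer windows) over all seqs."""
    counts = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}, total


def z_score(word, seqs, order=0):
    """Z-score of `word` under a zeroth- or first-order Markov background."""
    f1, _ = kmer_freqs(seqs, 1)
    fw, n = kmer_freqs(seqs, len(word))
    f_obs = fw.get(word, 0.0)
    if order == 0:
        # Product of the single-nucleotide frequencies.
        f_exp = prod(f1.get(x, 0.0) for x in word)
    else:
        # Product of dinucleotide frequencies divided by the product of the
        # frequencies of the internal nucleotides x2..x(k-1).
        f2, _ = kmer_freqs(seqs, 2)
        num = prod(f2.get(word[i:i + 2], 0.0) for i in range(len(word) - 1))
        den = prod(f1.get(x, 0.0) for x in word[1:-1])
        f_exp = num / den if den else 0.0
    if not 0.0 < f_exp < 1.0:
        raise ValueError("expected frequency must lie strictly between 0 and 1")
    return (f_obs - f_exp) / sqrt(f_exp * (1.0 - f_exp) / n)
```

For example, z_score("AAUGAA", ["ACGUAAUGAAACGUCG", "UGCAAAUGAAGCUAGC", "GAUCAAUGAAUCGAUG"], order=1) scores the word against a background that already accounts for dinucleotide composition, which is why first-order scores tend to be lower than zeroth-order scores for words built from common dinucleotides.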

Search for protein factors involved in pre-mRNA 3′-end-processing in A. oryzae

Homologs of protein factors involved in eukaryotic pre-mRNA 3′-end-processing were retrieved by searching the A. oryzae genome database (http://www.bio.nite.go.jp/dogan/project/view/AO, http://nribf2.nrib.go.jp/) using the BlastP program.

Results and discussion

Profile of the A. oryzae poly(A) data set

We obtained 1065 sequences for the A. oryzae poly(A) data set from the EST database, as described in the Materials and methods section. First, 1043 unique genes contained in the A. oryzae poly(A) data set were classified into functional categories known as eukaryotic orthologous groups,[23] according to the gene list in the A. oryzae genome database. Compared with the classification of all genes found in the genome database, the number of genes classified into the Unannotated category was markedly lower in the poly(A) data set [43% in the genome database vs. 30% in the poly(A) data set]. In contrast, the number of genes classified into Information storage and processing and Cellular processes and signalling categories was higher in the poly(A) data set [7 and 12% in the genome database vs. 14 and 20% in the poly(A) data set, respectively]. The number of genes classified into Metabolism and Poorly characterized categories was similar between the genome database and poly(A) data set. These results indicated that the poly(A) data set covers a wide range of genes classified into diverse functional categories despite the poly(A) data set comprising only 1043 unique genes of the 12 074 genes predicted in the A. oryzae genome database. In contrast, because EST sequences were accumulated by single-pass sequencing of the 5′ end of the cDNA insert,[14] the poly(A) data set could cover <10% of the total genes, and thus, the poly(A) data set might show some bias towards highly expressed genes. To assess this possibility, we compared the frequency distributions of EST contigs that corresponded to each of the 1043 genes in the poly(A) data set with those of the total 7589 contigs in the EST database[14] (Fig. 1A). Whereas contigs with frequencies of >6 accounted for ∼10% of the total EST contigs, contigs with corresponding frequencies accounted for ∼25% of the poly(A) data set. However, singletons accounted for 40% of the poly(A) data set. 
In addition, we examined the expression levels of genes in the poly(A) data set by DNA microarray analysis (Fig. 1B). Of the total of 5384 genes selected by microarray analysis, the number of relatively highly expressed genes (expression ratio > 0.1) accounted for approximately 7%, whereas it accounted for 20% of the 618 genes of the poly(A) data set. The remaining genes (80%) were expressed at low levels. These results suggested that the poly(A) data set was somewhat biased towards highly expressed genes, but this fact enabled the identification of 3′-end-processing signals.
Figure 1.

Profile of the A. oryzae poly(A) data set. (A) Frequency distribution of EST contigs based on the EST copy number. The EST copy number of each contig contained in the A. oryzae poly(A) data set was obtained from the A. oryzae EST database (http://nribf2.nrib.go.jp/EST2/index.html). Data on the total EST contigs were obtained from the study by Akao et al.[14] (B) Gene expression levels determined by DNA microarray analysis. The fluorescence intensity of each gene was normalized to that of the histone H4 gene.

In eukaryotes, many genes, including >50% of human and rice genes, have multiple polyadenylation sites.[8,24] This alternative polyadenylation has been recognized as an important mechanism for gene expression regulation.[25] However, no study has investigated alternative polyadenylation in filamentous fungi on the basis of bioinformatics analyses. Although only 22 pairs of EST sequences derived from the same gene with alternative poly(A) sites were included in the poly(A) data set, 14 of these pairs had distant poly(A) sites located at least 30 nt apart (Supplementary Table S1). This result suggested that alternative polyadenylation also generally occurs in filamentous fungi.

Analysis of 3′ UTR length and sequence elements of 3′ UTR-binding proteins

In eukaryotes, the 3′ UTR regulates mRNA stability and translational efficiency through sequence elements for 3′ UTR-binding proteins and microRNAs or through its length.[26-31] In A. nidulans, the stability of transcripts involved in nitrogen metabolism was dependent on their 3′ UTRs.[32,33] Therefore, 3′ UTRs may play an important role in gene expression regulation in filamentous fungi. However, no comprehensive information exists about 3′ UTRs in filamentous fungi. Hence, we analysed the distribution of 3′ UTR lengths in A. oryzae and determined their average and median lengths to compare with those in yeast and plants, which were also determined by analysis of EST sequencing data. In A. oryzae, 3′ UTR lengths were predominantly distributed in the range of 51 to 350 nt (Fig. 2). The average 3′ UTR length in A. oryzae was 241 nt, while the median 3′ UTR length was 203 nt. The average 3′ UTR length in Saccharomyces cerevisiae is 144 nt (median, 121 nt),[5] and the averages in plants are 289 nt (Oryza sativa) and 223 nt (Arabidopsis thaliana).[8] These results suggested that the 3′ UTR in A. oryzae is longer than that in yeast but similar in length to that in plants.
Figure 2.

Distribution of 3′ UTR lengths determined for 1065 unique EST sequences. The average length is 241 nt.

The most well-known sequence elements for 3′ UTR-binding proteins in eukaryotes are AU-rich elements (AREs) and the PUF consensus motif.[34,35] We searched for transcripts containing the yeast putative AREs (UAUUUAUU and UUAUUUAU) and the PUF consensus motif (UGUANAUA) within the A. oryzae 3′ UTR.[36-38] In the poly(A) data set, 12 and 23 genes possessed AREs and the PUF consensus motif within the 3′ UTR, respectively (Supplementary Table S2). One gene (AO090011000041) in particular exhibited overlapping AUUUA sequences (AUUUAUUUA), a typical ARE motif. In addition, we found orthologs of the yeast ARE-binding protein (Pub1) and four of six yeast Puf family proteins (Puf1, Puf3, Puf4, and Puf6) in the A. oryzae genome (Supplementary Table S3). These results suggested the existence of a regulation system for gene expression that utilizes 3′ UTR-binding proteins in filamentous fungi.
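A motif search of this kind can be sketched with regular expressions, as below. The motif strings are those named in the text; treating N as any of the four nucleotides and reporting overlapping matches are assumptions of this sketch, and the function names are illustrative.

```python
import re

# Yeast putative AREs and the PUF consensus motif named in the text.
ARE_MOTIFS = ("UAUUUAUU", "UUAUUUAU")
PUF_MOTIF = "UGUANAUA"

# N is assumed to stand for any of the four nucleotides.
SYMBOLS = {"A": "A", "C": "C", "G": "G", "U": "U", "N": "[ACGU]"}


def motif_regex(motif):
    """Compile a motif into a lookahead pattern so overlapping hits are found."""
    body = "".join(SYMBOLS[ch] for ch in motif)
    return re.compile("(?=(" + body + "))")


def find_motif(utr, motif):
    """Return the 0-based start positions of every occurrence of motif in utr."""
    return [m.start() for m in motif_regex(motif).finditer(utr)]
```

Because the pattern is a zero-width lookahead, the overlapping AUUUA repeats in AUUUAUUUA are reported as two separate hits rather than one.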

Nucleotide profile of the A. oryzae 3′ UTR

To determine 3′-end-processing elements in A. oryzae, we first measured the single nucleotide frequencies for all positions within the 3′ UTR and the 100 nt sequence downstream of the poly(A) site (set at position 0). As shown in Fig. 3A, this region was notably U-rich, and A and U together accounted for 62% of nucleotides in this region (U = 34%; A = 28%). Meanwhile, the AU content of the coding region in A. oryzae was 48% (http://www.kazusa.or.jp/codon/), suggesting that a high AU content is characteristic of this region. The 3′ UTR was markedly U-rich, but an A-rich region was observed upstream of the poly(A) site; in particular, the −29 to −14 nt region had an A content of >30%. In addition, a high U content was also observed in the +1 to +20 nt region immediately downstream of the poly(A) site, but A and U content in the downstream +20 to +100 nt region was almost equal. This AU-rich element (ARE) located in the region immediately downstream of the U-rich region flanking the poly(A) site was also found in yeast and plants, but it has not been defined as a 3′-end-processing element in those organisms.[4,7,8] Moreover, the poly(A) site (position 0) had an extremely high A content (78%), and as described in the Materials and methods section, the first adenine of the poly(A) tail was designated as the poly(A) site nucleotide. High C nucleotide usage was observed at position −1 immediately before the poly(A) site compared with other positions (position −3, 20%; position −2, 21%; position −1, 37%; and position 0, 7%; Fig. 3B). The content of pyrimidine nucleotides (C and U) at position −1 was 68%, suggesting that CA or UA dinucleotides form the optimal cleavage site in A. oryzae, similar to that observed in plants.
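A per-position profile of this kind can be computed as in the following Python sketch. It assumes the sequences have already been aligned on the poly(A) site and trimmed to equal length; the function name and the site_index parameter are hypothetical.

```python
from collections import Counter


def nucleotide_profile(aligned_seqs, site_index):
    """Return {relative position: {nucleotide: frequency}} for equal-length sequences.

    site_index is the 0-based column holding the poly(A) site, so that column
    is reported as position 0, upstream columns as negative positions, and
    downstream columns as positive positions.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    profile = {}
    for column in range(length):
        counts = Counter(s[column] for s in aligned_seqs)
        profile[column - site_index] = {
            nt: counts[nt] / len(aligned_seqs) for nt in "ACGU"
        }
    return profile
```

Reading profile[-1] and profile[0] then gives the kind of figures quoted for the cleavage site, such as the share of C at position −1 and of A at position 0.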
Figure 3.

Single nucleotide frequencies in the 3′ UTR and 100 nt sequence downstream of the poly(A) site. (A) Single nucleotide profile in the 3′ UTR and 100 nt sequence downstream of the poly(A) site. The poly(A) site is at position 0. The upstream sequence of the poly(A) site is designated minus and the downstream sequence is designated plus. (B) Sequence logo generated from the actual frequency of occurrence of each of the four nucleotides around the cleavage site. (C) Six regions of the 3′ UTR and 100 nt sequence downstream of the poly(A) site formed according to the single nucleotide profile. The cleavage and polyadenylation site is located between regions IV and V.

Importantly, the nucleotide distribution profile of the 3′ UTR in A. oryzae was similar to that in plants,[7,8,39] yeast,[4] and mammals,[24] although the U-rich region was expanded towards the coding region of A. oryzae. On the basis of the nucleotide profile observed, the 3′ UTR plus 100 nt sequence downstream of the poly(A) site was divided into six signal element regions, designated regions I–VI, to identify the sequence elements for 3′-end-processing (Fig. 3C).

Search for nucleotide sequence elements for 3′-end-processing

To identify 3′-end-processing elements, we searched for the tetramer–heptamer nucleotide sequences that appeared most frequently in each signal element region (Table 1; the top 50 list is available in Supplementary Table S4). In region II, equivalent to the region containing the polyadenylation signal in mammals, no significantly conserved hexanucleotide sequence was observed, similar to that observed in yeast and plants. The top-ranked hexanucleotide in region II was AAUGAA. The top two pentanucleotides (AAUGA and AUGAA) were partial sequences of AAUGAA, and all of the top three heptanucleotides contained the AAUGAA sequence (Table 1). In addition, a standard score (Z-score), calculated according to the zeroth- and first-order Markov chain models to measure how far the observed frequency of each hexanucleotide deviates from its expected frequency, revealed that AAUGAA was the most over-represented hexanucleotide sequence in region II (Table 2). These results suggested that AAUGAA is the most predominant hexanucleotide sequence in region II, although it accounted for only 6% of all transcripts (64 of 1043). In contrast, according to the order of Z-scores, the AAUAAA sequence was not the major hexanucleotide sequence in region II, although it ranked third in the list of the most frequent hexanucleotides. This was also demonstrated by plotting the distribution of hexanucleotide sequences, including AAUGAA and AAUAAA, in the region ranging from −40 to −1 nt (Fig. 4). The AAUGAA sequence is a single nucleotide variant of AAUAAA, but no study has reported that AAUGAA is the most effective A-rich sequence for the 3′-end-processing element in any eukaryote. Interestingly, point mutation of AAUAAA to AAUGAA results in a significant reduction of polyadenylation efficiency in in vitro 3′-end-processing reactions using nuclear extracts from Xenopus and mammalian cells.[18,40] Thus, the 3′-end-processing machinery in A. oryzae may be somewhat different from that in higher eukaryotes.
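The ranking used for Table 1 counts the number of transcripts that contain a given sequence at least once, rather than the total number of occurrences. A minimal Python sketch of that counting rule follows; the function name and the toy sequences in the usage note are illustrative, and the input is assumed to be the region sequences already extracted for each transcript.

```python
from collections import Counter


def transcripts_with_kmer(region_seqs, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for seq in region_seqs:
        # A set removes repeated occurrences within the same transcript.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


# Share of transcripts carrying AAUGAA in region II, from the figures in the text.
aaugaa_share = round(100 * 64 / 1043, 1)
```

Calling transcripts_with_kmer(["AAUGAAUGAA", "CCAAUGAACC"], 6).most_common(5) would list AAUGAA first with a count of 2, even though the first sequence contains it twice.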
Table 1.

The top five sequences (4–7 nt) that most frequently appear in 3′ ends

Region I (from −149 to −30 nt)
4 nt | Number(a) | 5 nt | Number | 6 nt | Number | 7 nt | Number
UUUG | 629 | UGUUU | 343 | UUCUUU | 172 | UUUCUUU | 99
UGUU | 628 | UGUAU | 341 | UUUCUU | 162 | UUUUCUU | 84
UUGU | 624 | UUUCU | 316 | UGUUUU | 152 | UUCUUUU | 82
GUUU | 619 | UCUUU | 310 | UCUUUU | 149 | UGUAUAU | 61
AUUU | 617 | UUGUU | 301 | UUUUCU | 144 | UUUGUUU | 60
 | | | | UUGUUU | 144 | |

Region II (from −29 to −14 nt)
4 nt | Number | 5 nt | Number | 6 nt | Number | 7 nt | Number
AAUA | 286 | AAUGA | 119 | AAUGAA | 64 | AAAUGAA | 23
AAUG | 257 | AUGAA | 110 | AUGAAU | 48 | AAUGAAA | 22
AAAU | 233 | AAUAU | 99 | AAUAAA | 44 | AAUGAAU | 20
AUGA | 216 | AAUAA | 93 | AAUAUA | 39 | AAAUAAA | 18
UAAU | 215 | AUAAU | 92 | AAAUGA | 37 | AAUAUGA | 17
 | | AAAUA | 92 | | | AAUAAAU | 17

Region III (from −13 to −2 nt)
4 nt | Number | 5 nt | Number | 6 nt | Number | 7 nt | Number
UUUU | 170 | CUUUU | 64 | UCUUUU | 24 | UUUUGUU | 11
AUUU | 158 | UUUUC | 58 | UUCUUU | 23 | UUUCUUU | 11
UUUC | 150 | AUUUU | 56 | UUUUCU | 22 | UUUUCUU | 10
CUUU | 136 | UUUCU | 55 | UUUCUU | 22 | UUCUUUU | 10
UUAU | 129 | UUUAU | 51 | UUUUGU | 19 | UGUUUAU | 10
 | | UCUUU | 51 | | | |

Region IV (from −1 to 0 nt)
2 nt | Number
CA | 328
UA | 269
GA | 235
UG | 36
UC | 32

Region V (from +1 to +20 nt)
4 nt | Number | 5 nt | Number | 6 nt | Number | 7 nt | Number
UUCU | 186 | UUUUC | 76 | UUUUCU | 36 | UUUUUCU | 17
UCUU | 184 | UUUCU | 68 | UUUUUC | 32 | UUUUCUU | 16
UUUC | 169 | CUUUU | 68 | UCUUUU | 31 | UUUUCUC | 14
UUUU | 157 | UUCUU | 67 | UUCUUU | 27 | UUUCUUU | 14
CUUU | 149 | UUUUU | 61 | CUUUUU | 27 | UUUCUCU | 14
 | | | | | | CUUUUUU | 14

Region VI (from +21 to +100 nt)
4 nt | Number | 5 nt | Number | 6 nt | Number | 7 nt | Number
AUAU | 465 | AUGUA | 180 | UAUGUA | 74 | AUAUGUA | 32
UGUA | 453 | UUGUA | 177 | AGAAAA | 74 | AUAUAUA | 30
AAUA | 426 | UAUAU | 177 | AUAUAU | 66 | AAAGAAA | 30
UAUA | 423 | GUAGA | 177 | AUUGUA | 64 | UAUAUAU | 28
UAGA | 412 | UGUAG | 166 | AAAGAA | 64 | AAGAAAA | 28

(a) The number of transcripts with at least one occurrence.

Table 2.

Top 10 hexanucleotide sequences most over-represented in region II

Rank | Word (Markov order = 0) | Z-score | Number of occurrences(a) | Word (Markov order = 1) | Z-score | Number of occurrences(a)
1 | AAUGAA | 16.603 | 67 | AAUGAA | 9.856 | 67
2 | AUGAAU | 13.146 | 48 | AUGAAU | 6.086 | 48
3 | GAAUGA | 11.594 | 31 | GAAUGA | 6.067 | 31
4 | UGAAUG | 10.594 | 25 | GUCAAU | 6.002 | 16
5 | CAAUGC | 10.083 | 17 | GUCGCG | 5.727 | 3
6 | AAUGCA | 9.026 | 25 | CAAUGC | 5.711 | 17
7 | UCAAUG | 8.684 | 21 | UCGCGU | 5.59 | 4
8 | AUGCAA | 8.576 | 24 | AAUACA | 5.196 | 29
9 | AAAUGA | 7.962 | 38 | GGCAGU | 5.027 | 5
10 | GGAAUG | 7.865 | 14 | UCAAUU | 4.994 | 23
70 | AAUAAA | 4.116 | 46 | | |

Z-scores of the most over-represented hexanucleotide sequences in region II, according to the zeroth- and first-order Markov chain models.

(a) The number of hexanucleotide sequences found in region II.

Figure 4.

Representative hexanucleotide signals in the poly(A) signal region (from −40 to −1 nt).

Predominant sequence motifs upstream of the A-rich region (region I), called efficiency elements in yeast and far upstream elements in plants, have been identified. The best sequence for yeast efficiency elements is UAUAUA and its single nucleotide variants (UAUGUA and UACAUA).[4,41,42] In contrast, the best sequence of plant far upstream elements is UGUA.[8,38] In addition, this region in mammalian cells is defined as the auxiliary upstream element, and the UGUAN sequence element may function as a recognition element for 3′-end-processing proteins in the case of A(A/U)UAAA-lacking 3′ UTRs.[43] However, these sequences were not predominant in region I of the A. oryzae poly(A) data set (Table 1). Moreover, no other sequence motif was highly conserved in this region, although the top nucleotide sequences were notably U-rich. Similarly, no conserved sequence motif was observed in the two other U-rich regions (regions III and V), suggesting that these sequences can be defined only as U-rich elements. In region IV, equivalent to the cleavage site, the CA sequence ranked top, and this motif existed in 31% of the sequences (Table 1). This suggested that the CA sequence is the optimal cleavage site in A. oryzae. However, the GA sequence ranked third, and this motif existed in 22% of the sequences, suggesting that CA or UA dinucleotide sequences are not strictly conserved as the cleavage site.
In region VI, no commonality was observed in the high-ranked tetramer–heptamer sequences (Table 1), suggesting that this region cannot be defined as a 3′-end-processing element, similar to that in yeast and plants.
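The percentages quoted for region IV can be checked against the counts in Table 1 with a few lines of Python. The denominator of 1065 sequences is an assumption of this sketch; it is the value for which both the 31% (CA) and 22% (GA) figures in the text are reproduced after rounding.

```python
# Dinucleotide counts at positions -1 and 0 (region IV), taken from Table 1.
REGION_IV_COUNTS = {"CA": 328, "UA": 269, "GA": 235, "UG": 36, "UC": 32}

# Assumed denominator: the 1065 sequences of the poly(A) data set.
TOTAL_SEQUENCES = 1065


def share(dinucleotide):
    """Percentage of sequences whose cleavage site is the given dinucleotide."""
    return 100 * REGION_IV_COUNTS[dinucleotide] / TOTAL_SEQUENCES


for dinucleotide in ("CA", "UA", "GA"):
    print(dinucleotide, round(share(dinucleotide)))
```

Only the five most frequent dinucleotides are listed in Table 1, so these counts do not add up to the full data set.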

Putative 3′-end-processing signals in A. oryzae

Based on the information presented in this study, we proposed putative 3′-end-processing signals in A. oryzae (Fig. 5). The putative 3′-end-processing signals in A. oryzae were similar to those in yeast and plants, but some differences were observed between them. First, A-rich sequences upstream of the poly(A) site were less well conserved in all three species than in mammals, and the predominant hexanucleotide in this region of A. oryzae differed from that of yeast and plants. The canonical hexanucleotide AAUAAA signal in mammals is also the most frequently occurring signal in this region of yeast and plants, although it is found in only ∼13% and 7–10% of yeast and plant genes, respectively. In contrast, the most over-represented hexanucleotide in A. oryzae was AAUGAA, although this sequence accounted for only 6% of all transcripts, a proportion similar to that of AAUAAA in yeast and plants. Second, upstream of the A-rich region, while the most dominant sequence motifs are well defined in yeast and plants (UAUAUA in yeast and UGUA in plants), no conserved sequence motif was observed in A. oryzae, except for the U-rich elements described earlier.
Figure 5.

A schematic representation of the alignment of 3′-end-processing signals in A. oryzae, yeast, and plants. The arrow indicates the cleavage and polyadenylation site.

In a previous study, we showed that a cDNA of the mite Dermatophagoides farinae, known as Der f7, contains AT-biased codons and therefore is prematurely polyadenylated within the coding region in A. oryzae. We also showed that codon optimization circumvents this premature polyadenylation.[13] The GC content of the native Der f7 open reading frame (ORF) was 37.8%, while that of the codon-optimized Der f7 ORF was 52.8%. Thus, A- and U-rich sequences within the coding region of native Der f7 cDNA were eliminated by codon optimization. The putative 3′-end-processing signals in A. oryzae deduced from this study supported the idea that the A- and U-rich sequences present within the coding region of native Der f7 pre-mRNA were involved in incorrect 3′-end-processing. Although two AAUGAA sequences were present in the coding region of native Der f7 pre-mRNA, neither was located within the region 10–30 nt upstream of the premature poly(A) sites.[13] This suggested that the AAUGAA sequence within the coding region of the AT-rich heterologous gene could not function by itself as an efficient 3′-end-processing signal in A. oryzae. The A- and U-rich sequences located upstream of the cleavage site might work co-operatively in 3′-end-processing. In future, whether the elimination of the top-ranked A-rich sequences within the coding region of heterologous genes results in the prevention of aberrant, premature transcription termination must be examined empirically.

Protein factors involved in the pre-mRNA 3′-end-processing machinery of A. oryzae

The recognition mechanism of 3′-end-processing signals has been well studied in yeast and mammals, and a large number of protein factors, e.g. ∼14 proteins in mammals and ∼20 proteins in yeast, are required for 3′-end-processing.[2,3] To examine whether these factors involved in 3′-end-processing are conserved in A. oryzae, we searched for homologous proteins of 20 yeast polyadenylation factors in the A. oryzae genome (Table 3). Most homologs of yeast polyadenylation factors, except for 3 factors (Ref2, Syc1, and Pti1), were found in the A. oryzae genome. These three factors are components of the cleavage and polyadenylation factor in yeast,[44] but no homologous proteins of these three factors are observed in plant and mammal genomes, suggesting that they are specific to yeast. In comparison with polyadenylation factors in human genomes, although no homologs of CFIm68 and CFIm59 were found in yeast, A. oryzae, and plant genomes, the homologue of CFIm25 was present in A. oryzae and plant genomes but not in the yeast genome. In contrast, the homologous protein (AO090001000725) of yeast Hrp1, reported to bind to RNA with specificity for the AU-rich efficiency element in yeast,[45,46] was found in the A. oryzae genome, whereas no Hrp1 homologue with higher similarity was found in plant and mammalian genomes. Furthermore, although homologs of human CstF-50 and CPSF73-II were present in the plant genome, these were not found in yeast and A. oryzae genomes. These observations suggested that the protein factors involved in the 3′-end-processing machinery of filamentous fungi resemble, in part, those of yeast and those of plants. This could be indicative of the evolutionary relationship between filamentous fungi, plants, and yeast. Some protein factors homologous to their counterparts in other organisms show differences in their RNA-binding specificity, positioning, and function. 
For example, while mammalian CPSF160 binds directly to the hexanucleotide AAUAAA signal, its yeast homologue Yhh1 binds near the cleavage site but not the A-rich polyadenylation signal.[47] In this regard, because no sequence motif equivalent to the yeast AU-rich efficiency element or the mammalian UGUAN sequence was observed among the putative 3′-end-processing signals in A. oryzae, the RNA-binding specificity and positioning of the A. oryzae homologs of Hrp1 and CFIm25 in 3′-end-processing must be investigated.
Table 3.

Comparison of protein factors involved in pre-mRNA 3′-end-processing between Aspergillus oryzae, yeast, plants, and human

| Aspergillus oryzae | Saccharomyces cerevisiae | Arabidopsis thaliana | Homo sapiens | BlastP score to yeast homologue | BlastP score to plant homologue | BlastP score to human homologue |
|---|---|---|---|---|---|---|
| | CFIB | | | | | |
| AO090001000725 | Hrp1 | None | None | 3e−52 | | |
| | CFIA | AtCstF | CstF | | | |
| AO090003000655 | Rna14 | AT1G17760 (AtCstF77) | CstF77 | 2e−69 | 6e−40 | 2e−46 |
| AO090011000789 | Rna15 | AT1G71800 (AtCstF64) | CstF64 | 1e−12 | 1e−19 | 2e−35 |
| None | None | AT5G60940 (AtCstF50) | CstF50 | | – | – |
| | | | CFIIm | | | |
| AO090026000698 | Clp1 | AT3G04680 (AtCLPS3) | hClp1 | 9e−34 | 4e−45 | 9e−47 |
| AO090012001002 | Pcf11 | AT4G04885 (AtPCFS4) | hPcf11 | 3e−22 | 2e−15 | 4e−18 |
| | CPF | AtCPSF | CPSF | | | |
| AO090103000017 | Yhh1 | AT5G51660 (AtCPSF160) | CPSF160 | 3e−69 | 3e−83 | e−108 |
| AO090005001277 | Ydh1 | AT5G23880 (AtCPSF100) | CPSF100 | 5e−26 | 6e−24 | 2e−25 |
| AO090005001001 | Ysh1 | AT1G61010 (AtCPSF73-I) | CPSF73 | e−168 | e−140 | 7e−155 |
| AO090005000813 | Yth1 | AT1G30460 (AtCPSF30) | CPSF30 | 2e−40 | 5e−14 | 5e−28 |
| AO080531000089 (a) | Fip1 | AT5G58040 (AtFIPS5) | hFip1 | 4e−10 | 4e−06 | 2e−11 |
| AO090011000862 | Pfs2 | AT5G13480 (AtFY) | hPfs2 (WDR33) | 1e−82 | 3e−89 | 1e−90 |
| AO090103000067 | Pta1 | AT1G27595 (AtSYM2) | Symplekin | 3e−21 | 0.085 | 7e−05 |
| | | AT5G01400 (AtSYM5) | | | | |
| None | None | AT2G01730 (AtCPSF73-II) | CPSF73L | | | |
| AO090005001504 | Ssu72 | AT1G73820 (Ssu72-like) | hSsu72 | 4e−48 | 1e−38 | 3e−41 |
| AO090701000351 | Glc7 | AT2G39840 (AtPP1) | PP1α | e−157 | e−141 | 1e−154 |
| | | | PP1β | | | 1e−151 |
| None | Ref2 | None | None | | | |
| AO090001000739 | Mpe1 | AT5G47430 | RBBP6 | 1e−70 | 2e−34 | 6e−34 |
| None | Syc1 | None | None | | | |
| AO090120000355 | Swd2 | AT5G14530 | WDR82 | 2e−52 | 5e−40 | 3e−54 |
| None | Pti1 | None | None | | | |
| AO090005001182 | Pap1 | AT1G17980 (AtPAPS1) | PAP | e−151 | e−103 | e−114 |
| | | AT2G25850 (AtPAPS2) | | | e−102 | |
| | | AT4G32850 (AtPAPS4) | | | e−101 | |
| | | | CFIm | | | |
| None | None | None | CFIm68 | | | |
| None | None | None | CFIm59 | | | |
| AO090003001316 | None | AT4G25550 (AtCFIS2) | CFIm25 | | 1e−60 | 5e−60 |

Protein factors involved in pre-mRNA 3′-end-processing in yeast, plants, and humans are based on the data described in the studies by Mandel et al.,[2] Millevoi and Vagner,[3] and Hunt et al.[48]

(a) The A. oryzae homologue of Fip1 was retrieved by searching the A. oryzae genome database deposited by the National Research Institute of Brewing, Japan (http://nribf2.nrib.go.jp/genome/blastscope.html).

Conclusions

In this study, we identified putative 3′-end-processing signals in A. oryzae using EST and genomic sequencing data. These signals comprise four elements: the furthest upstream U-rich element; the A-rich sequence element (most frequently AAUGAA); the cleavage site (most frequently CA); and the U-rich element flanking the cleavage site. Although these putative signals are similar to those found in yeast and plants, clear differences were observed in the furthest upstream element and the A-rich sequence element. To our knowledge, this is the first study of 3′-end-processing signals in filamentous fungi, and we believe that the data presented here will contribute to the understanding of pre-mRNA 3′-end-processing in eukaryotes. This study also provides useful information for the codon optimization of heterologous genes, which can help prevent aberrant, premature polyadenylation within the coding region in filamentous fungi.
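As a rough illustration of how these findings might feed into codon optimization, the sketch below scans a DNA sequence for the most frequent A-rich hexanucleotide, AAUGAA (written AATGAA in DNA). The function name `find_motif` and the toy sequence are hypothetical. A hit only marks a candidate position for synonymous recoding; the source does not establish that any single occurrence causes premature polyadenylation.

```python
def find_motif(seq, motif="AATGAA"):
    """Return 0-based start positions of every (possibly overlapping) motif hit.

    Both inputs are upper-cased and U is converted to T, so RNA or DNA
    spellings of the sequence and the motif are treated alike.
    """
    seq = seq.upper().replace("U", "T")
    motif = motif.upper().replace("U", "T")
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        # Advance by one so overlapping occurrences are not skipped.
        start = seq.find(motif, start + 1)
    return hits


# Toy sequence (not a real gene) containing two AATGAA occurrences.
toy = "ATGGCTAATGAAGGTCTGAAATGAATAA"
for pos in find_motif(toy):
    print(f"AATGAA at position {pos}: {toy[pos:pos + 6]}")
```

A fuller screen would presumably also weigh the surrounding U-rich context and the distance to a CA dinucleotide, since the study describes four elements rather than one strongly conserved hexamer.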

Supplementary data

Supplementary data are available at www.dnaresearch.oxfordjournals.org.

Funding

This work was supported, in part, by a Grant-in-Aid for Scientific Research on Priority Areas, Applied Genomics (no. 17019001), from the Ministry of Education, Culture, Sports, Science, and Technology of Japan. M. T. is a JSPS Research Fellow.