Mizuki Tanaka, Yoshifumi Sakai, Osamu Yamada, Takahiro Shintani, Katsuya Gomi.
Abstract
To investigate 3'-end-processing signals in Aspergillus oryzae, we created a nucleotide sequence data set comprising the 3'-untranslated region (3' UTR) plus the 100 nucleotides (nt) downstream of the poly(A) site, using A. oryzae expressed sequence tags and genomic sequencing data. This data set comprised 1065 sequences derived from 1042 unique genes. The average 3' UTR length in A. oryzae was 241 nt, which is greater than that in yeast but similar to that in plants. The 3' UTR and the 100 nt sequence downstream of the poly(A) site are notably U-rich, while the region located 15-30 nt upstream of the poly(A) site is markedly A-rich. The most frequently found hexanucleotide in this A-rich region is AAUGAA, although this sequence is present in only 6% of all transcripts. These data suggest that A. oryzae has no highly conserved sequence element equivalent to AAUAAA, the mammalian polyadenylation signal. We found that the putative 3'-end-processing signals in A. oryzae, while less well conserved than those in mammals, comprise four sequence elements: the furthest upstream U-rich element, an A-rich sequence, the cleavage site, and a downstream U-rich element flanking the cleavage site. Although these putative 3'-end-processing signals are similar to those in yeast and plants, some notable differences exist between them.
Year: 2011 PMID: 21586533 PMCID: PMC3111234 DOI: 10.1093/dnares/dsr011
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Figure 1. Profile of the A. oryzae poly(A) data set. (A) Frequency distribution of EST contigs based on the EST copy number. The EST copy number of each contig contained in the A. oryzae poly(A) data set was obtained from the A. oryzae EST database (http://nribf2.nrib.go.jp/EST2/index.html). Data on the total EST contigs were obtained from the study by Akao et al.[14]. (B) Gene expression levels determined by DNA microarray analysis. The fluorescence intensity of each gene was normalized to that of the histone H4 gene.
Figure 2. Distribution of 3′ UTR lengths determined for 1065 unique EST sequences. The average length is 241 nt.
Figure 3. Single nucleotide frequencies in the 3′ UTR and 100 nt sequence downstream of the poly(A) site. (A) Single nucleotide profile in the 3′ UTR and 100 nt sequence downstream of the poly(A) site. The poly(A) site is at position 0. Positions upstream of the poly(A) site are given negative numbers and positions downstream are given positive numbers. (B) Sequence logo generated from the actual frequency of occurrence of each of the four nucleotides around the cleavage site. (C) Six regions of the 3′ UTR and 100 nt sequence downstream of the poly(A) site, defined according to the single nucleotide profile. The cleavage and polyadenylation site is located between regions IV and V.
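The single nucleotide profile described for Figure 3A can be illustrated with a minimal Python sketch. It assumes every sequence has already been cut to the same window around the poly(A) site; the function name `nucleotide_profile`, the `upstream` parameter, and the toy sequences are hypothetical and are not taken from the study.

```python
from collections import Counter

def nucleotide_profile(windows, upstream):
    """Per-position nucleotide frequencies for equal-length windows.

    windows  : RNA strings that all cover the same span around the poly(A) site
    upstream : number of nucleotides that precede position 0 in each window
    Returns {position: {nucleotide: frequency}} with the poly(A) site at 0,
    upstream positions negative and downstream positions positive.
    """
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    profile = {}
    for i in range(length):
        counts = Counter(w[i] for w in windows)
        total = sum(counts.values())
        profile[i - upstream] = {nt: counts.get(nt, 0) / total for nt in "ACGU"}
    return profile

# Toy windows (hypothetical): 3 nt upstream, poly(A) site at string index 3
toy = ["UUCAUU", "AUUAUC", "UUGAUU"]
profile = nucleotide_profile(toy, upstream=3)
```

Plotting the four frequencies against position would give a curve of the kind the caption describes; ambiguous bases such as N are counted in the denominator but not reported.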
The top five sequences (4–7 nt) that most frequently appear in 3′ ends
| Region I (from −149 to −30 nt) | |||||||
| 4 nt | Number (a) | 5 nt | Number | 6 nt | Number | 7 nt | Number |
| UUUG | 629 | UGUUU | 343 | UUCUUU | 172 | UUUCUUU | 99 |
| UGUU | 628 | UGUAU | 341 | UUUCUU | 162 | UUUUCUU | 84 |
| UUGU | 624 | UUUCU | 316 | UGUUUU | 152 | UUCUUUU | 82 |
| GUUU | 619 | UCUUU | 310 | UCUUUU | 149 | UGUAUAU | 61 |
| AUUU | 617 | UUGUU | 301 | UUUUCU | 144 | UUUGUUU | 60 |
| | | | | UUGUUU | 144 | | |
| Region II (from −29 to −14 nt) | |||||||
| 4 nt | Number | 5 nt | Number | 6 nt | Number | 7 nt | Number |
| AAUA | 286 | AAUGA | 119 | AAUGAA | 64 | AAAUGAA | 23 |
| AAUG | 257 | AUGAA | 110 | AUGAAU | 48 | AAUGAAA | 22 |
| AAAU | 233 | AAUAU | 99 | AAUAAA | 44 | AAUGAAU | 20 |
| AUGA | 216 | AAUAA | 93 | AAUAUA | 39 | AAAUAAA | 18 |
| UAAU | 215 | AUAAU | 92 | AAAUGA | 37 | AAUAUGA | 17 |
| | | AAAUA | 92 | | | AAUAAAU | 17 |
| Region III (from −13 to −2 nt) | |||||||
| 4 nt | Number | 5 nt | Number | 6 nt | Number | 7 nt | Number |
| UUUU | 170 | CUUUU | 64 | UCUUUU | 24 | UUUUGUU | 11 |
| AUUU | 158 | UUUUC | 58 | UUCUUU | 23 | UUUCUUU | 11 |
| UUUC | 150 | AUUUU | 56 | UUUUCU | 22 | UUUUCUU | 10 |
| CUUU | 136 | UUUCU | 55 | UUUCUU | 22 | UUCUUUU | 10 |
| UUAU | 129 | UUUAU | 51 | UUUUGU | 19 | UGUUUAU | 10 |
| | | UCUUU | 51 | | | | |
| Region IV (from −1 to 0 nt) | |||||||
| 2 nt | Number | ||||||
| CA | 328 | ||||||
| UA | 269 | ||||||
| GA | 235 | ||||||
| UG | 36 | ||||||
| UC | 32 | ||||||
| Region V (from +1 to +20 nt) | |||||||
| 4 nt | Number | 5 nt | Number | 6 nt | Number | 7 nt | Number |
| UUCU | 186 | UUUUC | 76 | UUUUCU | 36 | UUUUUCU | 17 |
| UCUU | 184 | UUUCU | 68 | UUUUUC | 32 | UUUUCUU | 16 |
| UUUC | 169 | CUUUU | 68 | UCUUUU | 31 | UUUUCUC | 14 |
| UUUU | 157 | UUCUU | 67 | UUCUUU | 27 | UUUCUUU | 14 |
| CUUU | 149 | UUUUU | 61 | CUUUUU | 27 | UUUCUCU | 14 |
| | | | | | | CUUUUUU | 14 |
| Region VI (from +21 to +100 nt) | |||||||
| 4 nt | Number | 5 nt | Number | 6 nt | Number | 7 nt | Number |
| AUAU | 465 | AUGUA | 180 | UAUGUA | 74 | AUAUGUA | 32 |
| UGUA | 453 | UUGUA | 177 | AGAAAA | 74 | AUAUAUA | 30 |
| AAUA | 426 | UAUAU | 177 | AUAUAU | 66 | AAAGAAA | 30 |
| UAUA | 423 | GUAGA | 177 | AUUGUA | 64 | UAUAUAU | 28 |
| UAGA | 412 | UGUAG | 166 | AAAGAA | 64 | AAGAAAA | 28 |
(a) The number of transcripts with at least one occurrence.
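The counting rule in the footnote (each transcript contributes at most once per sequence) can be sketched as follows. This is an illustrative reading of the table, not the authors' script; `transcripts_with_kmer`, its parameters, and the toy windows are hypothetical, and the toy region (−10 to −3) is shorter than the regions used in the table.

```python
from collections import Counter

def transcripts_with_kmer(windows, upstream, start, end, k):
    """For each k-mer, count how many windows contain it at least once
    entirely within positions start..end (inclusive, poly(A) site = 0)."""
    counts = Counter()
    lo = start + upstream        # string index where the region begins
    hi = end + upstream + 1      # exclusive string index where it ends
    for w in windows:
        region = w[lo:hi]
        # A set keeps each k-mer once per transcript, however often it repeats
        seen = {region[i:i + k] for i in range(len(region) - k + 1)}
        counts.update(seen)
    return counts

# Toy windows (hypothetical): 11 nt upstream, poly(A) site at the last base
toy = ["GAAUGAAUGGCA", "CAAUGAACCCUA"]
counts6 = transcripts_with_kmer(toy, upstream=11, start=-10, end=-3, k=6)
counts3 = transcripts_with_kmer(toy, upstream=11, start=-10, end=-3, k=3)
```

Read this way, the Region II entry for AAUGAA (64 transcripts) out of the 1065 sequences in the data set gives 64 / 1065 ≈ 0.06, which matches the 6% quoted in the abstract.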
Top 10 most over-represented hexanucleotide sequences in region II
| Rank | Word (Markov order = 0) | Z-score | Number of occurrences (a) | Word (Markov order = 1) | Z-score | Number of occurrences (a) |
|---|---|---|---|---|---|---|
| 1 | AAUGAA | 16.603 | 67 | AAUGAA | 9.856 | 67 |
| 2 | AUGAAU | 13.146 | 48 | AUGAAU | 6.086 | 48 |
| 3 | GAAUGA | 11.594 | 31 | GAAUGA | 6.067 | 31 |
| 4 | UGAAUG | 10.594 | 25 | GUCAAU | 6.002 | 16 |
| 5 | CAAUGC | 10.083 | 17 | GUCGCG | 5.727 | 3 |
| 6 | AAUGCA | 9.026 | 25 | CAAUGC | 5.711 | 17 |
| 7 | UCAAUG | 8.684 | 21 | UCGCGU | 5.59 | 4 |
| 8 | AUGCAA | 8.576 | 24 | AAUACA | 5.196 | 29 |
| 9 | AAAUGA | 7.962 | 38 | GGCAGU | 5.027 | 5 |
| 10 | GGAAUG | 7.865 | 14 | UCAAUU | 4.994 | 23 |
| 70 | AAUAAA | 4.116 | 46 | |||
Z-scores of the most over-represented hexanucleotide sequences in region II, according to the zeroth- and first-order Markov chain models.
(a) The number of hexanucleotide sequences found in region II.
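As a rough illustration of how a Z-score against a zeroth-order Markov background can be obtained, the sketch below compares the observed count of a word with the count expected if bases were drawn independently at their overall frequencies. The exact statistic used for the table is not stated in this record, so the binomial approximation here (which ignores overlapping occurrences) is an assumption and will not reproduce the published values; `zscore_order0` and the toy regions are hypothetical.

```python
import math
from collections import Counter

def zscore_order0(regions, word):
    """Z-score of `word` against a zeroth-order (independent bases) model.

    Assumption of this sketch: occurrences are treated as independent
    Bernoulli trials, so overlaps between occurrences are ignored.
    """
    k = len(word)
    base_counts = Counter("".join(regions))
    total = sum(base_counts.values())
    # Probability of the word if each base is drawn independently
    p = math.prod(base_counts[b] / total for b in word)
    # Number of start positions at which the word could occur
    positions = sum(max(len(r) - k + 1, 0) for r in regions)
    observed = sum(
        1 for r in regions for i in range(len(r) - k + 1) if r[i:i + k] == word
    )
    expected = positions * p
    variance = positions * p * (1 - p)
    if variance == 0:
        return 0.0  # word is impossible or certain under the model
    return (observed - expected) / math.sqrt(variance)

# Toy regions (hypothetical, not from the data set)
toy_regions = ["AAUGAAAAUGAA", "CCGGCCGGCCGG", "AAUGAACGCGCG"]
z_enriched = zscore_order0(toy_regions, "AAUGAA")
z_absent = zscore_order0(toy_regions, "GGGGGG")
```

A first-order model would replace the product of single-base frequencies with a product of dinucleotide transition probabilities, which is why the two halves of the table rank words differently.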
Figure 4. Representative hexanucleotide signals in the poly(A) signal region (from −40 to −1 nt).
Figure 5. A schematic representation of the alignment of 3′-end-processing signals in A. oryzae, yeast, and plants. The arrow indicates the cleavage and polyadenylation site.
Comparison of protein factors involved in pre-mRNA 3′-end-processing among Aspergillus oryzae, yeast, plants, and humans
| A. oryzae | Yeast | Plant | Human | BlastP score to yeast homologue | BlastP score to plant homologue | BlastP score to human homologue |
|---|---|---|---|---|---|---|
| AO090001000725 | Hrp1 | None | None | 3e−52 | — | — |
| AO090003000655 | Rna14 | AT1G17760 (AtCstF77) | CstF77 | 2e−69 | 6e−40 | 2e−46 |
| AO090011000789 | Rna15 | AT1G71800 (AtCstF64) | CstF64 | 1e−12 | 1e−19 | 2e−35 |
| None | None | AT5G60940 (AtCstF50) | CstF50 | — | — | — |
| AO090026000698 | Clp1 | AT3G04680 (AtCLPS3) | hClp1 | 9e−34 | 4e−45 | 9e−47 |
| AO090012001002 | Pcf11 | AT4G04885 (AtPCFS4) | hPcf11 | 3e−22 | 2e−15 | 4e−18 |
| AO090103000017 | Yhh1 | AT5G51660 (AtCPSF160) | CPSF160 | 3e−69 | 3e−83 | e−108 |
| AO090005001277 | Ydh1 | AT5G23880 (AtCPSF100) | CPSF100 | 5e−26 | 6e−24 | 2e−25 |
| AO090005001001 | Ysh1 | AT1G61010 (AtCPSF73-I) | CPSF73 | e−168 | e−140 | 7e−155 |
| AO090005000813 | Yth1 | AT1G30460 (AtCPSF30) | CPSF30 | 2e−40 | 5e−14 | 5e−28 |
| AO080531000089 (a) | Fip1 | AT5G58040 (AtFIPS5) | hFip1 | 4e−10 | 4e−06 | 2e−11 |
| AO090011000862 | Pfs2 | AT5G13480 (AtFY) | hPfs2 (WDR33) | 1e−82 | 3e−89 | 1e−90 |
| AO090103000067 | Pta1 | AT1G27595 (AtSYM2) | Symplekin | 3e−21 | 0.085 | 7e−05 |
| | | AT5G01400 (AtSYM5) | | | — | |
| None | None | AT2G01730 (AtCPSF73-II) | CPSF73L | — | — | — |
| AO090005001504 | Ssu72 | AT1G73820 (Ssu72-like) | hSsu72 | 4e−48 | 1e−38 | 3e−41 |
| AO090701000351 | Glc7 | AT2G39840 (AtPP1) | PP1α | e−157 | e−141 | 1e−154 |
| | | | PP1β | | | 1e−151 |
| None | Ref2 | None | None | — | — | — |
| AO090001000739 | Mpe1 | AT5G47430 | RBBP6 | 1e−70 | 2e−34 | 6e−34 |
| None | Syc1 | None | None | — | — | — |
| AO090120000355 | Swd2 | AT5G14530 | WDR82 | 2e−52 | 5e−40 | 3e−54 |
| None | Pti1 | None | None | — | — | — |
| AO090005001182 | Pap1 | AT1G17980 (AtPAPS1) | PAP | e−151 | e−103 | e−114 |
| | | AT2G25850 (AtPAPS2) | | | e−102 | |
| | | AT4G32850 (AtPAPS4) | | | e−101 | |
| None | None | None | CFIm68 | — | — | — |
| None | None | None | CFIm59 | — | — | — |
| AO090003001316 | None | AT4G25550 (AtCFIS2) | CFIm25 | — | 1e−60 | 5e−60 |
Protein factors involved in pre-mRNA 3′-end-processing in yeast, plants, and humans are based on the data described in the studies by Mandel et al.,[2] Millevoi and Vagner,[3] and Hunt et al.[48]
(a) The homologue of A. oryzae Fip1 was retrieved by searching the A. oryzae genome database deposited by the National Research Institute of Brewing, Japan (http://nribf2.nrib.go.jp/genome/blastscope.html).