Takeshi Akao, Motoaki Sano, Osamu Yamada, Terumi Akeno, Kaoru Fujii, Kuniyasu Goto, Sumiko Ohashi-Kunihiro, Kumiko Takase, Makoto Yasukawa-Watanabe, Kanako Yamaguchi, Yoko Kurihara, Jun-ichi Maruyama, Praveen Rao Juvvadi, Akimitsu Tanaka, Yoji Hata, Yasuji Koyama, Shotaro Yamaguchi, Noriyuki Kitamoto, Katsuya Gomi, Keietsu Abe, Michio Takeuchi, Tetsuo Kobayashi, Hiroyuki Horiuchi, Katsuhiko Kitamoto, Yutaka Kashiwagi, Masayuki Machida, Osamu Akita.
Abstract
We performed random sequencing of cDNAs from nine biologically or industrially important cultures of the industrially valuable fungus Aspergillus oryzae to obtain expressed sequence tags (ESTs). Consequently, 21 446 raw ESTs were accumulated and subsequently assembled into 7589 non-redundant consensus sequences (contigs). Among all contigs, 5491 (72.4%) were derived from only one particular culture. These included 4735 (62.4%) singletons, i.e. lone ESTs overlapping with no others. These data showed that using cultures grown under various conditions as cDNA sources enabled efficient collection of ESTs. BLAST searches against the public databases revealed that 2953 (38.9%) of the EST contigs showed significant similarities to deposited sequences with known functions, 793 (10.5%) were similar to hypothetical proteins, and the remaining 3843 (50.6%) showed no significant similarity to sequences in the databases. Culture-specific contigs were extracted on the basis of the EST frequency normalized by the total number for each culture condition. In addition, contig sequences were compared with sequence sets in eukaryotic orthologous groups (KOGs), and classified into the KOG functional categories.
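The normalization mentioned in the abstract can be illustrated with a minimal sketch. The library totals are taken from the summary table of this record; the contig counts, the function names, and the `min_share` rule for calling a contig culture-specific are hypothetical, since the abstract states only that EST frequency was normalized by the total number for each culture condition.

```python
# Total raw ESTs per library (values from the summary table in this record).
LIBRARY_TOTALS = {
    "LR": 2611, "LH": 2049, "LM": 926, "LS": 1940, "LG": 1000,
    "PA": 736, "SW": 7725, "SS": 991, "SR": 3468,
}


def normalized_frequencies(est_counts):
    """Divide a contig's per-library EST counts by each library's total EST count."""
    return {lib: est_counts.get(lib, 0) / total for lib, total in LIBRARY_TOTALS.items()}


def is_culture_specific(est_counts, library, min_share=0.8):
    """Hypothetical rule (not from the paper): a contig counts as specific to
    `library` when that library holds at least `min_share` of the contig's
    summed normalized frequency."""
    freqs = normalized_frequencies(est_counts)
    total = sum(freqs.values())
    return total > 0 and freqs[library] / total >= min_share


# Hypothetical contig with 5 ESTs from the SW library and 5 from the SS library.
example = {"SW": 5, "SS": 5}
print(is_culture_specific(example, "SS"), is_culture_specific(example, "SW"))
```

Although the raw counts are equal, the SS library is much smaller than SW, so after normalization the contig is weighted toward SS. This is the effect the normalization is meant to capture.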
Year: 2007 PMID: 17540709 PMCID: PMC2779895 DOI: 10.1093/dnares/dsm008
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Figure 1. Sequence length distribution of raw ESTs and contigs. Hatched bar, raw ESTs (before assembly); solid bar, contigs (after assembly).
Brief summary of the features of the A. oryzae ESTs from various culture conditions
| Source cDNA libraries (cultures) | Number of raw ESTs | Number of contigs | Average frequency | Singletons: number | Singletons: ratio (%) [c] | Unique contigs [a]: number | Unique contigs [a]: ratio (%) [c] | Contigs with no similarity to database [b]: number | Contigs with no similarity to database [b]: ratio (%) [c] | Research organizations in charge |
|---|---|---|---|---|---|---|---|---|---|---|
| Total ESTs | 21446 | 7589 | 2.83 | 4735 | 62.4 | 5491 [d] | 72.4 | 3843 | 50.6 | |
| ESTs from LCs (including plate culture: PA) | 9262 | 4181 | 2.22 | 2233 | 53.4 | 2742 [e] | 65.6 | 2022 | 48.4 | |
| LR library (nutrient-rich culture) | 2611 | 1518 | 1.72 | 616 | 40.6 | 656 | 43.2 | 561 | 37.0 | AIST, UT, NU |
| LH library (nutrient-rich culture at higher temperature) | 2049 | 1086 | 1.89 | 371 | 34.2 | 427 | 39.3 | 442 | 40.7 | NFRI |
| LM library (maltose-inductive culture) | 926 | 653 | 1.42 | 247 | 37.8 | 278 | 42.6 | 262 | 40.1 | NU |
| LS library (carbon-starved culture) | 1940 | 1217 | 1.59 | 422 | 34.7 | 471 | 38.7 | 434 | 35.7 | AIST |
| LG library (germinated conidia and conidia) | 1000 | 701 | 1.43 | 376 | 53.6 | 389 | 55.5 | 428 | 61.1 | TUAT |
| PA library (alkaline pH agar plate culture) | 736 | 519 | 1.42 | 201 | 38.7 | 225 | 43.4 | 233 | 44.9 | UT |
| ESTs from solid-state cultures | 12184 | 4847 | 2.51 | 2502 | 51.6 | 3408 [e] | 70.3 | 2236 | 46.1 | |
| SW library (wheat bran culture) | 7725 | 3707 | 2.08 | 1731 | 46.7 | 2177 | 58.7 | 1637 | 44.2 | NRIB, THU |
| SS library (soybean culture) | 991 | 486 | 2.04 | 184 | 37.9 | 204 | 42.0 | 194 | 39.9 | AIST |
| SR library (rice culture) | 3468 | 1701 | 2.04 | 587 | 34.5 | 664 | 39.0 | 654 | 38.4 | NRIB |
AIST, National Institute of Advanced Industrial Science and Technology; NFRI, National Food Research Institute; NRIB, National Research Institute of Brewing; NU, Nagoya University; THU, Tohoku University; TUAT, Tokyo University of Agriculture and Technology; UT, University of Tokyo.
[a] Contigs composed of ESTs from only LC, SC, or a particular library (including singletons).
[b] Contigs of which the E-values from the BLAST search against the most similar amino acid sequence were not less than 1E-9.
[c] Ratio to ‘Number of contigs’ in each line.
[d] Sum of unique contigs from each library. The contigs unique to LC or SC were not included.
[e] Contigs obtained only from LC or SC. Redundancies among the libraries were not removed.
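The derived columns of the table can be recomputed from the raw counts. The following is a minimal sketch, assuming that "Average frequency" is the number of raw ESTs divided by the number of contigs (an inference that matches the reported values) and that ratios are taken against the number of contigs in each line, as the footnote states. The function and variable names are illustrative only.

```python
# (library, raw ESTs, contigs, singletons, unique contigs), copied from the table.
ROWS = [
    ("LR", 2611, 1518, 616, 656),
    ("LH", 2049, 1086, 371, 427),
    ("LM", 926, 653, 247, 278),
    ("LS", 1940, 1217, 422, 471),
    ("LG", 1000, 701, 376, 389),
    ("PA", 736, 519, 201, 225),
    ("SW", 7725, 3707, 1731, 2177),
    ("SS", 991, 486, 184, 204),
    ("SR", 3468, 1701, 587, 664),
]
TOTAL_CONTIGS = 7589  # contigs after assembling ESTs from all libraries together


def average_frequency(raw_ests, contigs):
    """Mean number of raw ESTs per contig, rounded as in the table."""
    return round(raw_ests / contigs, 2)


def ratio_percent(count, contigs):
    """Percentage of `count` relative to the number of contigs in the same line."""
    return round(100 * count / contigs, 1)


total_raw = sum(row[1] for row in ROWS)
total_singletons = sum(row[3] for row in ROWS)
total_unique = sum(row[4] for row in ROWS)

print(total_raw, average_frequency(total_raw, TOTAL_CONTIGS))
print(total_singletons, ratio_percent(total_singletons, TOTAL_CONTIGS))
print(total_unique, ratio_percent(total_unique, TOTAL_CONTIGS))
```

Note that the per-library contig counts are not summed: a contig can contain ESTs from several libraries, so the total of 7589 comes from the combined assembly rather than from adding the rows.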
Figure 2. Frequency distribution of the contigs. Contigs were generated by assembling raw ESTs, so they no longer overlapped with each other. The general tendency of frequency, i.e. the redundancy of raw ESTs within a contig, was analyzed. (A) Total EST contigs. (B) EST contigs of known sequence. (C) EST contigs with no significant similarity.
Results of similarity search against the public non-redundant protein database
| Similarity | Number of contigs | (%) |
|---|---|---|
| Function-predicted genes | 2953 | (38.9) |
| Hypothetical protein [a] | 793 | (10.5) |
| No significant similarity | 3843 | (50.6) |
| Total contigs | 7589 | (100.0) |
[a] Similarity to deduced amino acid sequences with no definitive functions.
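The three-way split in this table can be sketched as a simple decision rule. The E-value cutoff follows the footnote of the summary table (an E-value not less than 1E-9 counts as no similarity). Detecting hypothetical proteins by a keyword in the hit description is an assumption made for illustration, and the function name and arguments are hypothetical; the record does not describe how the authors made that distinction.

```python
E_VALUE_CUTOFF = 1e-9  # E-values not less than this count as "no significant similarity"


def classify_contig(best_hit_evalue, best_hit_description=""):
    """Assign a contig to one of the three similarity classes of the table.

    `best_hit_evalue` is None when the search returned no hit at all.
    The keyword test for hypothetical proteins is an illustrative assumption.
    """
    if best_hit_evalue is None or best_hit_evalue >= E_VALUE_CUTOFF:
        return "No significant similarity"
    if "hypothetical" in best_hit_description.lower():
        return "Hypothetical protein"
    return "Function-predicted gene"


# Hypothetical best hits for three contigs.
print(classify_contig(1e-50, "alpha-amylase"))
print(classify_contig(1e-30, "hypothetical protein"))
print(classify_contig(0.002, "alpha-amylase"))
```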
Figure 3. Dendrogram of culture conditions used for collecting ESTs. Culture conditions were classified based on the normalized frequency values of contigs by hierarchical clustering using the nearest neighbor-joining method.
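To illustrate how culture conditions can be grouped from normalized frequency profiles, here is a minimal sketch of agglomerative clustering. It assumes single-linkage (nearest-neighbor) merging with Euclidean distance; the caption does not specify the distance measure, and the authors' actual procedure may differ. The profile values are hypothetical.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length frequency profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def linkage_distance(profiles, cluster_a, cluster_b):
    """Single linkage: the smallest distance between any two members."""
    return min(euclidean(profiles[a], profiles[b]) for a in cluster_a for b in cluster_b)


def single_linkage(profiles):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples,
    in the order the merges happen (closest pair first)."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage_distance(profiles, pair[0], pair[1]),
        )
        merges.append((a, b, linkage_distance(profiles, a, b)))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical normalized-frequency profiles (one value per contig).
profiles = {
    "LR": [0.10, 0.00, 0.02],
    "LH": [0.11, 0.01, 0.02],
    "SW": [0.00, 0.20, 0.05],
}
merges = single_linkage(profiles)
for a, b, dist in merges:
    print(a, b, round(dist, 4))
```

The merge history is enough to draw a dendrogram: each tuple is one internal node, and its distance gives the height of that node.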
Figure 4. Functional classification of contigs to major KOG categories. Contigs were compared with the KOG sequence set using BLASTX, and then classified into the major KOG categories of the most similar sequence (E-value < 10E-9). Categorical redundancy of contigs was not removed when they belonged to multiple KOG categories. (A) Total contigs, contigs from LCs, and contigs from SCs. (B) Contigs from each library.
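The counting rule in this caption, where a contig belonging to several KOG categories is counted once in each, can be sketched as follows. The contig identifiers, E-values, and category letters are hypothetical, and the cutoff is read here as 1e-9 to match the table footnote, which is an assumption about the caption's notation.

```python
from collections import Counter

E_VALUE_CUTOFF = 1e-9  # assumed reading of the threshold given in the caption


def count_kog_categories(contig_hits):
    """Tally KOG categories over contigs whose best hit passes the cutoff.

    `contig_hits` maps a contig id to (best-hit E-value, category letters).
    A contig assigned to several categories adds one to each of them, so
    the tallies can sum to more than the number of contigs.
    """
    counts = Counter()
    for evalue, categories in contig_hits.values():
        if evalue < E_VALUE_CUTOFF:
            counts.update(categories)  # iterates over the individual letters
    return counts


# Hypothetical best hits against the KOG sequence set.
hits = {
    "contig_1": (1e-40, "J"),
    "contig_2": (1e-15, "KO"),  # belongs to two categories
    "contig_3": (0.5, "C"),     # fails the cutoff and is not counted
    "contig_4": (1e-20, "O"),
}
counts = count_kog_categories(hits)
print(dict(counts))
```

In this example three contigs pass the cutoff but the tallies sum to four, which is why the category totals in such a figure should not be read as contig counts.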