Chisato Yamasaki, Hiroaki Kawashima, Fusano Todokoro, Yasuhiro Imamizu, Makoto Ogawa, Motohiko Tanino, Takeshi Itoh, Takashi Gojobori, Tadashi Imanishi.
Abstract
Transcriptome Auto-annotation Conducting Tool (TACT) is a newly developed web-based automated tool for conducting functional annotation of transcripts by the integration of sequence similarity searches and functional motif predictions. We developed the TACT system by integrating two kinds of similarity searches, FASTY and BLASTX, against protein sequence databases, UniProtKB (Swiss-Prot/TrEMBL) and RefSeq, and a unified motif prediction program, InterProScan, into the ORF-prediction pipeline originally designed for the 'H-Invitational' human transcriptome annotation project. This system successively applies these constituent programs to an mRNA sequence in order to predict the most plausible ORF and the function of the protein encoded. In this study, we applied the TACT system to 19 574 non-redundant human transcripts registered in H-InvDB and evaluated its predictive power by the degree of agreement with human-curated functional annotation in H-InvDB. As a result, the TACT system could assign functional description to 12 559 transcripts (64.2%), the remainder being hypothetical proteins. Furthermore, the overall agreement of functional annotation with H-InvDB, including those transcripts annotated as hypothetical proteins, was 83.9% (16 432/19 574). These results show that the TACT system is useful for functional annotation and that the prediction of ORFs and protein functions is highly accurate and close to the results of human curation. TACT is freely available at http://www.jbirc.aist.go.jp/tact/.
Year: 2006 PMID: 16845023 PMCID: PMC1538819 DOI: 10.1093/nar/gkl283
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The TACT annotation pipeline. The flowchart illustrates the TACT computational analysis and web server interfaces. The white arrows indicate the input sequence data to TACT and output annotation data from TACT to users. The thick solid arrows indicate the data flow within the TACT server during analysis.
The degree of agreement of TACT annotation and human curation
| Similarity category | Description | No. of H-Inv proteins examined | No. of correctly predicted proteins by TACT | The agreement of TACT annotation and human curation (%) |
|---|---|---|---|---|
| I | Identical to a known human protein. (Identity ≥98% and coverage =100%) | 5313 | 4735 | 89.1 |
| II | Similar to a known protein of any species. (Identity ≥50%) | 5859 | 3469 | 59.2 |
| III | InterPro domain-containing protein | 1387 | 1320 | 95.2 |
| IV | Conserved hypothetical protein | 1309 | 1265 | 96.6 |
| V | Hypothetical protein | 5706 | 5643 | 98.9 |
| Total | | 19 574 | 16 432 | 83.9 |
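The agreement figures can be recomputed directly from the counts in the table. The following is a minimal sketch, not part of TACT itself: the names `COUNTS` and `agreement_percent` are illustrative, and rounding to one decimal place is an assumption about how the published percentages were produced.

```python
# Counts copied from the table: category -> (proteins examined, correctly predicted).
COUNTS = {
    "I": (5313, 4735),
    "II": (5859, 3469),
    "III": (1387, 1320),
    "IV": (1309, 1265),
    "V": (5706, 5643),
}


def agreement_percent(examined, correct):
    """Return correct/examined as a percentage, rounded to one decimal place."""
    return round(100.0 * correct / examined, 1)


# Per-category agreement between TACT annotation and human curation.
per_category = {cat: agreement_percent(n, k) for cat, (n, k) in COUNTS.items()}

# Column totals and overall agreement.
total_examined = sum(n for n, _ in COUNTS.values())
total_correct = sum(k for _, k in COUNTS.values())
overall = agreement_percent(total_examined, total_correct)

# Categories I to III carry a functional description; IV and V are hypothetical proteins.
described = sum(COUNTS[cat][0] for cat in ("I", "II", "III"))
described_share = agreement_percent(total_examined, described)

if __name__ == "__main__":
    for cat, pct in per_category.items():
        print(f"Category {cat}: {pct}%")
    print(f"Total: {total_correct}/{total_examined} = {overall}%")
    print(f"Categories I-III: {described}/{total_examined} = {described_share}%")
```

The examined counts for categories I to III sum to the same number of transcripts that the abstract reports as receiving a functional description; treating the two as the same set is an assumption of this sketch.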
Figure 2. TACT web-based interfaces. Sample views of TACT top (A), data submission (B) and annotation view (C) for HIT000017619 (AK092752) are shown. The annotation view (C) shows detailed annotation information and has links to external databases as indicated. The blue arrows indicate the flows of views during the TACT analysis and black lines indicate the links to appropriate reference data in H-InvDB or external public databases.