Meghana Chitale, Ishita K Khan, Daisuke Kihara.
Abstract
BACKGROUND: Many Automatic Function Prediction (AFP) methods have been developed to cope with the rapidly growing number of gene sequences available from high-throughput sequencing experiments. To support the development of AFP methods, it is essential to have community-wide experiments that evaluate the performance of existing AFP methods. The Critical Assessment of Function Annotation (CAFA) is one such community experiment. The CAFA meeting was held as a Special Interest Group (SIG) meeting at the Intelligent Systems for Molecular Biology (ISMB) conference in 2011. Here, we perform a detailed analysis of two sequence-based function prediction methods developed in our lab, PFP and ESG, using the predictions submitted to CAFA.
Year: 2013 PMID: 23514353 PMCID: PMC3584938 DOI: 10.1186/1471-2105-14-S3-S2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Performance of PFP ranked by confidence score (PFP), PFP predictions sorted by raw score (PFP_RAW), ESG, PRIOR, BLAST, and GOtcha. A, precision-recall plot for the BP domain; B, ROC curve for the BP domain; C, precision-recall plot for the MF domain; D, ROC curve for the MF domain.
Figure 2. Performance of PFP and ESG with enriched priors (PFP/ESG+PRIOR), PFP, ESG, PRIOR, BLAST, and GOtcha. The top N method was used for this evaluation. A, precision-recall plot for the BP domain; B, ROC curve for the BP domain; C, precision-recall plot for the MF domain; D, ROC curve for the MF domain.
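The caption names the top N method without defining it in this record. As a hedged illustration, assuming it means keeping each target's N highest-scoring GO terms and pooling true and false positives across all targets (micro-averaging), one precision-recall point per value of N could be computed as below. The data layout and the function name `top_n_precision_recall` are hypothetical, not taken from the paper.

```python
from typing import Dict, Set, Tuple


def top_n_precision_recall(
    predictions: Dict[str, Dict[str, float]],
    truth: Dict[str, Set[str]],
    n: int,
) -> Tuple[float, float]:
    """Micro-averaged precision and recall when only the top-n scored GO terms
    of each target are kept (hypothetical layout: target -> {GO term: score})."""
    tp = fp = fn = 0
    for target, true_terms in truth.items():
        scored = predictions.get(target, {})
        # Assumption: rank this target's terms by descending score, keep the first n.
        top = set(sorted(scored, key=scored.get, reverse=True)[:n])
        tp += len(top & true_terms)   # predicted and annotated
        fp += len(top - true_terms)   # predicted but not annotated
        fn += len(true_terms - top)   # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Sweeping `n` from 1 upward traces a precision-recall curve; pooling counts over targets is one design choice, and averaging per-target precision and recall (macro-averaging) is a common alternative.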
Figure 3. Performance of PFP and ESG compared with PRIOR, BLAST, and GOtcha using the semantic similarity method. A, semantic similarity relative to the score threshold for predictions in the BP domain; B, semantic precision vs. semantic recall for the BP domain; C, semantic similarity relative to the score threshold in the MF domain; D, semantic precision vs. semantic recall for the MF domain.
Figure 4. Prediction accuracy evaluated for each functional category. Each row represents a GO term category and each column represents a prediction method. Count is the number of target proteins annotated with the given GO term. The F1 measure was used for evaluation. The color ranges from white (minimum score) to red (maximum score). A, the BP domain: results are shown for a sample of 20 of the 77 BP terms that each annotate 25 or more targets. B, the MF domain: results are shown for 11 MF terms that each annotate 25 or more targets.
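Figure 4 scores each GO term separately with the F1 measure, the harmonic mean of precision P and recall R, F1 = 2PR/(P + R). A minimal sketch, assuming each method's predictions have already been reduced to a set of GO terms per target (for example by a score cutoff, which the caption does not specify); `term_count` and `per_term_f1` are hypothetical helpers.

```python
from typing import Dict, Set


def term_count(term: str, truth: Dict[str, Set[str]]) -> int:
    """Number of targets annotated with `term` (the Count column)."""
    return sum(1 for terms in truth.values() if term in terms)


def per_term_f1(term: str, predicted: Dict[str, Set[str]], truth: Dict[str, Set[str]]) -> float:
    """F1 for a single GO term, treating each target as one binary decision."""
    tp = fp = fn = 0
    for target in set(predicted) | set(truth):
        is_predicted = term in predicted.get(target, set())
        is_annotated = term in truth.get(target, set())
        if is_predicted and is_annotated:
            tp += 1
        elif is_predicted:
            fp += 1
        elif is_annotated:
            fn += 1
    if tp == 0:
        return 0.0  # no correct predictions: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Computing F1 per term rather than per target shows which functional categories a method handles well, which is the view the heat map in Figure 4 gives.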
CAFA target prediction examples for PFP, ESG, and BLAST
| CAFA target | GO term | GO term name | Score |
|---|---|---|---|
| T06450 | GO:0008152 | metabolic process | |
| | GO:0044260 | cellular protein metabolic process | 0.99 |
| | GO:0000746 | conjugation | 0.61 |
| T06299 | GO:0019740 | nitrogen utilization | |
| | GO:0006139 | nucleobase, nucleoside, nucleotide and nucleic acid metabolic process | 1 |
| | GO:0055114 | oxidation-reduction process | 1 |
| T05345 | GO:0008152 | metabolic process | |
| | GO:0007165 | signal transduction | 1 |
| | GO:0006464 | protein modification | 0.99 |
| | GO:0000160 | two-component signal transduction system (phosphorelay) | 1 |
| | GO:0000160 | two-component signal transduction system (phosphorelay) | 0.39 |
| T18799 | GO:0006139 | nucleobase, nucleoside, nucleotide and nucleic acid metabolic process | |
| | GO:0050789 | regulation of biological process | 1 |
| | GO:0006412 | translation | 0.02 |
| | GO:0006139 | nucleobase, nucleoside, nucleotide and nucleic acid metabolic process | 0.4 |
This table shows a partial list of GO annotations and their ancestors for four targets, as predicted by PFP, ESG, and BLAST. The first column lists the target IDs along with the maximum F1 score achieved by each method over all score cutoffs, computed using the predicted terms and their ancestors. For each method, we list the predicted GO annotations and a partial list of their ancestors, together with their scores.
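The maximum F1 described above combines two steps: propagating predicted terms to their ancestors in the GO graph, and sweeping the score cutoff. A minimal sketch under stated assumptions: the GO graph is a hypothetical child-to-parents map, each score applies only to its own term, ancestors are added without scores, and no root terms are excluded (real evaluations often drop the roots). The names `with_ancestors` and `max_f1` are illustrative, not the authors' code.

```python
from typing import Dict, Iterable, Set


def with_ancestors(terms: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the given terms plus all their ancestors in a GO-like DAG."""
    closed: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, ()))  # walk up to every parent
    return closed


def max_f1(scores: Dict[str, float], true_terms: Set[str], parents: Dict[str, Set[str]]) -> float:
    """Maximum F1 of one target's predictions over all score cutoffs."""
    truth = with_ancestors(true_terms, parents)
    best = 0.0
    # Each distinct predicted score is tried as a cutoff, highest first.
    for cutoff in sorted(set(scores.values()), reverse=True):
        kept = (term for term, score in scores.items() if score >= cutoff)
        predicted = with_ancestors(kept, parents)
        tp = len(predicted & truth)
        if tp == 0:
            continue
        precision = tp / len(predicted)
        recall = tp / len(truth)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

For example, with the toy graph A ← B ← C and A ← D, a true annotation of C, and predictions B (0.9) and D (0.5), the 0.9 cutoff yields {B, A} against the truth {C, B, A}: precision 1 and recall 2/3, so F1 is 0.8, which beats the lower cutoff.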