| Literature DB >> 27604469 |
Yuxiang Jiang1, Tal Ronnen Oron2, Wyatt T Clark3, Asma R Bankapur4, Daniel D'Andrea5, Rosalba Lepore5, Christopher S Funk6, Indika Kahanda7, Karin M Verspoor8,9, Asa Ben-Hur7, Da Chen Emily Koo10, Duncan Penfold-Brown11,12, Dennis Shasha13, Noah Youngs12,13,14, Richard Bonneau13,14,15, Alexandra Lin16, Sayed M E Sahraeian17, Pier Luigi Martelli18, Giuseppe Profiti18, Rita Casadio18, Renzhi Cao19, Zhaolong Zhong19, Jianlin Cheng19, Adrian Altenhoff20,21, Nives Skunca20,21, Christophe Dessimoz22,23,24, Tunca Dogan25, Kai Hakala26,27, Suwisa Kaewphan26,27,28, Farrokh Mehryary26,27, Tapio Salakoski26,28, Filip Ginter26, Hai Fang29, Ben Smithers29, Matt Oates29, Julian Gough29, Petri Törönen30, Patrik Koskinen30, Liisa Holm30,31, Ching-Tai Chen32, Wen-Lian Hsu32, Kevin Bryson22, Domenico Cozzetto22, Federico Minneci22, David T Jones22, Samuel Chapman33, Dukka Bkc33, Ishita K Khan34, Daisuke Kihara34,35, Dan Ofer36, Nadav Rappoport36,37, Amos Stern36,37, Elena Cibrian-Uhalte25, Paul Denny38, Rebecca E Foulger38, Reija Hieta25, Duncan Legge25, Ruth C Lovering38, Michele Magrane25, Anna N Melidoni38, Prudence Mutowo-Meullenet25, Klemens Pichler25, Aleksandra Shypitsyna25, Biao Li2, Pooya Zakeri39,40, Sarah ElShal39,40, Léon-Charles Tranchevent41,42,43, Sayoni Das44, Natalie L Dawson44, David Lee44, Jonathan G Lees44, Ian Sillitoe44, Prajwal Bhat45, Tamás Nepusz46, Alfonso E Romero47, Rajkumar Sasidharan48, Haixuan Yang49, Alberto Paccanaro47, Jesse Gillis50, Adriana E Sedeño-Cortés51, Paul Pavlidis52, Shou Feng1, Juan M Cejuela53, Tatyana Goldberg53, Tobias Hamp53, Lothar Richter53, Asaf Salamov54, Toni Gabaldon55,56,57, Marina Marcet-Houben55,56, Fran Supek56,58,59, Qingtian Gong60,61, Wei Ning60,61, Yuanpeng Zhou60,61, Weidong Tian60,61, Marco Falda62, Paolo Fontana63, Enrico Lavezzo62, Stefano Toppo62, Carlo Ferrari64, Manuel Giollo64,65, Damiano Piovesan64, Silvio C E Tosatto64, Angela Del Pozo66, José M Fernández67, Paolo Maietta68, Alfonso Valencia68, Michael L Tress68,
Alfredo Benso69, Stefano Di Carlo69, Gianfranco Politano69, Alessandro Savino69, Hafeez Ur Rehman70, Matteo Re71, Marco Mesiti71, Giorgio Valentini71, Joachim W Bargsten72, Aalt D J van Dijk72,73, Branislava Gemovic74, Sanja Glisic74, Vladimir Perovic74, Veljko Veljkovic74, Nevena Veljkovic74, Danillo C Almeida-E-Silva75, Ricardo Z N Vencio75, Malvika Sharan76, Jörg Vogel76, Lakesh Kansakar77, Shanshan Zhang77, Slobodan Vucetic77, Zheng Wang78, Michael J E Sternberg79, Mark N Wass80, Rachael P Huntley25, Maria J Martin25, Claire O'Donovan25, Peter N Robinson81, Yves Moreau82, Anna Tramontano5, Patricia C Babbitt83, Steven E Brenner17, Michal Linial84, Christine A Orengo44, Burkhard Rost53, Casey S Greene85, Sean D Mooney86, Iddo Friedberg87,88, Predrag Radivojac89.
Abstract
BACKGROUND: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging.
Entities:
Keywords: Disease gene prioritization; Protein function prediction
Mesh:
Substances:
Year: 2016 PMID: 27604469 PMCID: PMC5015320 DOI: 10.1186/s13059-016-1037-6
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Timeline for the CAFA2 experiment
Fig. 2 CAFA2 benchmark breakdown. a The benchmark size for each of the four ontologies. b Breakdown of benchmarks of both types over 11 species (those with no fewer than 15 proteins), sorted by the total number of benchmark proteins. In both panels, dark colors (blue, red, and yellow) correspond to no-knowledge (NK) types, while their lighter counterparts correspond to limited-knowledge (LK) types. The distributions of information content for the benchmark sets are shown in Additional file 1. The sizes of the CAFA1 benchmarks are shown in gray. BPO Biological Process Ontology, CCO Cellular Component Ontology, HPO Human Phenotype Ontology, LK limited-knowledge, MFO Molecular Function Ontology, NK no-knowledge
Fig. 3 CAFA1 versus CAFA2 (top methods). A comparison in Fmax between the top-five CAFA1 models and the top-five CAFA2 models. Colored boxes encode the results such that (1) the colors indicate the margin of a CAFA2 method over a CAFA1 method in Fmax and (2) the numbers in each box indicate the percentage of wins. Results are shown for both the Molecular Function Ontology (a) and the Biological Process Ontology (b): A CAFA1 top-five models (rows, from top to bottom) against CAFA2 top-five models (columns, from left to right); B comparison of Naïve baselines trained on SwissProt2011 and SwissProt2014, respectively; C comparison of BLAST baselines trained on SwissProt2011 and SwissProt2014
Fig. 4 Overall evaluation using the maximum F-measure, Fmax. Evaluation was carried out on no-knowledge benchmark sequences in the full mode. The coverage of each method is shown within its performance bar. A perfect predictor would be characterized by Fmax = 1. Confidence intervals (95 %) were determined using bootstrapping with 10,000 iterations over the set of benchmark sequences. For cases in which a principal investigator participated in multiple teams, the results of only the best-scoring method are presented. Details for all methods are provided in Additional file 1
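For readers wanting a concrete reference point, the protein-centric Fmax can be sketched as below. This is a minimal illustration in the spirit of the CAFA evaluation, not the official assessment code; the dictionary layout for `predictions` and `truth` is our own assumption, and precision is averaged only over proteins for which the method makes at least one prediction at the given threshold (the method's coverage).

```python
def fmax(predictions, truth, thresholds=None):
    """Maximum protein-centric F-measure over all score thresholds.

    predictions: {protein: {term: score in (0, 1]}}
    truth:       {protein: non-empty set of true terms}
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein, terms in truth.items():
            predicted = {term for term, s in predictions.get(protein, {}).items() if s >= t}
            if predicted:  # precision is averaged only over covered proteins
                precisions.append(len(predicted & terms) / len(predicted))
            recalls.append(len(predicted & terms) / len(terms))
        if precisions:
            pr = sum(precisions) / len(precisions)
            rc = sum(recalls) / len(recalls)
            if pr + rc > 0:
                best = max(best, 2 * pr * rc / (pr + rc))  # harmonic mean
    return best
```

Sweeping the threshold traces out a precision-recall curve; Fmax is simply the best harmonic mean attained along it.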
Fig. 5 Precision–recall curves for the top-performing methods. Evaluation was carried out on no-knowledge benchmark sequences in the full mode. A perfect predictor would be characterized by Fmax = 1, which corresponds to the point (1, 1) in the precision–recall plane. For cases in which a principal investigator participated in multiple teams, the results of only the best-scoring method are presented
Fig. 6 Overall evaluation using the minimum semantic distance, Smin. Evaluation was carried out on no-knowledge benchmark sequences in the full mode. The coverage of each method is shown within its performance bar. A perfect predictor would be characterized by Smin = 0. Confidence intervals (95 %) were determined using bootstrapping with 10,000 iterations over the set of benchmark sequences. For cases in which a principal investigator participated in multiple teams, the results of only the best-scoring method are presented. Details for all methods are provided in Additional file 1
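Smin combines two information-theoretic quantities, remaining uncertainty (ru: true terms the method misses) and misinformation (mi: terms the method wrongly predicts), each weighted by term information content, and takes the minimum Euclidean combination over thresholds. A minimal sketch, not the official assessment code; it assumes precomputed per-term information contents (`ic`) and a simple dictionary layout of our own choosing:

```python
import math

def smin(predictions, truth, ic, thresholds=None):
    """Minimum semantic distance over score thresholds.

    predictions: {protein: {term: score}}
    truth:       {protein: set of true terms}
    ic:          {term: information content}
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(0, 101)]
    best = float("inf")
    n = len(truth)
    for t in thresholds:
        ru = mi = 0.0
        for protein, terms in truth.items():
            predicted = {term for term, s in predictions.get(protein, {}).items() if s >= t}
            ru += sum(ic[x] for x in terms - predicted)   # remaining uncertainty
            mi += sum(ic[x] for x in predicted - terms)   # misinformation
        best = min(best, math.sqrt((ru / n) ** 2 + (mi / n) ** 2))
    return best
```

Unlike Fmax, this measure down-weights shallow, uninformative terms, since their information content is low.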
Fig. 7 Overall evaluation using the AUC averaged over terms with no fewer than ten positive annotations. The evaluation was carried out on no-knowledge benchmark sequences in the full mode. Error bars indicate the standard error of the AUC averaged over terms for each method. For cases in which a principal investigator participated in multiple teams, the results of only the best-scoring method are presented. Details for all methods are provided in Additional file 1. AUC area under the receiver operating characteristic (ROC) curve
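The term-centric measure averages a per-term AUC over sufficiently annotated terms. A stdlib-only sketch using the pairwise (Mann-Whitney) formulation of AUC; only the cutoff of ten positive annotations comes from the caption, while the data layout and function names are our own assumptions:

```python
def term_auc(pos_scores, neg_scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def averaged_auc(per_term_scores, min_positives=10):
    """Average term_auc over terms with at least `min_positives` positives.

    per_term_scores: {term: (pos_scores, neg_scores)}
    """
    aucs = [term_auc(p, n)
            for p, n in per_term_scores.values()
            if len(p) >= min_positives]
    return sum(aucs) / len(aucs)
```

The pairwise formulation is quadratic in the number of proteins per term; a rank-based implementation would be used at scale, but the value computed is the same.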
Fig. 8 Averaged AUC per term for the Human Phenotype Ontology. a Terms are sorted by AUC. The dashed red line indicates the performance of the Naïve method. b The ten most accurately predicted terms without overlapping ancestors (except for the root). AUC area under the receiver operating characteristic (ROC) curve
Fig. 9 Performance evaluation using the maximum F-measure, Fmax, on eukaryotic (left) versus prokaryotic (right) benchmark sequences. The evaluation was carried out on no-knowledge benchmark sequences in the full mode. The coverage of each method is shown within its performance bar. Confidence intervals (95 %) were determined using bootstrapping with 10,000 iterations over the set of benchmark sequences. For cases in which a principal investigator participated in multiple teams, the results of only the best-scoring method are presented. Details for all methods are provided in Additional file 1
Fig. 10 Similarity network of participating methods for BPO. Similarities were computed as Pearson's correlation coefficient between methods, with a 0.75 cutoff applied for illustration purposes. A unique color is assigned to all methods submitted under the same principal investigator. Non-evaluated (organizers') methods are shown as triangles, while the baseline methods (Naïve and BLAST) are shown as squares. The top-ten methods are highlighted with enlarged nodes and circled in red. Edge width indicates the strength of similarity. Nodes are labeled with the method name, followed by "-team(model)" where multiple teams/models were submitted
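The construction behind such a network can be sketched as follows. This is a toy illustration, not the authors' pipeline; it assumes each method is represented by a vector of prediction scores over a common set of (protein, term) pairs, and that no method's score vector is constant:

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes non-constant vectors (sx, sy > 0)

def similarity_edges(method_scores, cutoff=0.75):
    """Edges (m1, m2, r) for method pairs whose Pearson's r meets the cutoff.

    method_scores: {method: list of scores over a shared (protein, term) grid}
    """
    names = sorted(method_scores)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(method_scores[a], method_scores[b])
            if r >= cutoff:
                edges.append((a, b, r))
    return edges
```

The resulting edge list is what a network layout tool would then render, with edge width proportional to r.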
Fig. 11 Case study on the human ADAM-TS12 gene. Biological Process terms associated with the ADAM-TS12 gene in the union of the three databases as of September 2014. The entire functional annotation of ADAM-TS12 consists of 89 terms, 28 of which are shown. Twelve terms, marked in green, are leaf terms. This directed acyclic graph was treated as the ground truth in the CAFA2 assessment. Solid black lines indicate direct "is a" or "part of" relationships between terms, while gray lines mark indirect relationships (that is, some intermediate terms are not drawn in this figure). Predicted terms of the top-five methods and the two baseline methods were taken at their optimal Fmax threshold. Over-predicted terms are not shown