Patrik Koskinen1, Petri Törönen1, Jussi Nokso-Koivisto1, Liisa Holm2. 1. Department of Biosciences, University of Helsinki, 00014 Helsinki, Finland and Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland. 2. Department of Biosciences, University of Helsinki, 00014 Helsinki, Finland and Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland.
Abstract
MOTIVATION: The last decade has seen remarkable growth in protein databases. This growth comes at a price: a growing number of submitted protein sequences lack functional annotation. Approximately 32% of sequences submitted to UniProtKB, the most comprehensive protein database, are labelled 'Unknown protein' or similar. Moreover, the functionally annotated portion is reported to contain 30-40% errors. Here, we introduce a high-throughput tool for more reliable functional annotation called Protein ANNotation with Z-score (PANNZER). PANNZER predicts Gene Ontology (GO) classes and free-text descriptions of protein function. PANNZER uses weighted k-nearest neighbour methods with statistical testing to maximize the reliability of a functional annotation.
RESULTS: Our results in free-text description line prediction show that PANNZER outperformed all competing methods by a clear margin. In GO prediction, we show a clear improvement over our older method, which performed well in the CAFA 2011 challenge.