Graph sharpening plus graph integration: a synergy that improves protein functional classification.

Hyunjung Shin, Andreas Martin Lisewski, Olivier Lichtarge.

Abstract

MOTIVATION: Predicting protein function is a central problem in bioinformatics, and many approaches use partially or fully automated methods based on various combinations of sequence, structure and other information on proteins or genes. Such information establishes relationships between proteins that can be modelled most naturally as edges in graphs. A priori, however, it is often unclear which edges from which graph may contribute most to accurate predictions. For that reason, one established strategy is to integrate all available sources, or graphs as in graph integration, in the hope that the positive signals will add to each other. However, in the problem of functional prediction, noise, i.e. the presence of inaccurate or false edges, can still be large enough that integration alone has little effect on prediction accuracy. In order to reduce noise levels and to improve integration efficiency, we present here a recent method in graph-based learning, graph sharpening, which provides a theoretically firm yet intuitive and practical approach for disconnecting undesirable edges from protein similarity graphs. This approach has several attractive features: it is quick, scalable in the number of proteins, robust with respect to errors and tolerant of very diverse types of protein similarity measures.
RESULTS: We tested the classification accuracy in a test set of 599 proteins with remote sequence homology spread over 20 Gene Ontology (GO) functional classes. When compared to integration alone, graph sharpening plus integration of four vastly different molecular similarity measures improved the overall classification by nearly 30% [0.17 average increase in the area under the ROC curve (AUC)]. Moreover, and partially through the increased sparsity of the graphs induced by sharpening, this gain in accuracy came at negligible computational cost: sharpening and integration took on average 4.66 (±4.44) CPU seconds.
AVAILABILITY: Software and Supplementary data will be available on http://mammoth.bcm.tmc.edu/
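The abstract names the two steps (sharpening each similarity graph, then integrating the graphs) without spelling out their rules. The following is a minimal sketch of how such a pipeline could look, not the authors' implementation. It assumes that sharpening means cutting every edge that carries influence into an already labeled protein, that integration is a fixed weighted sum of the sharpened graphs, and that class scores are obtained by a simple label propagation. All function names, the toy matrices and the parameter `alpha` are hypothetical.

```python
import numpy as np

def sharpen(W, labels):
    """Assumed sharpening rule: labeled nodes receive no influence.

    W[i, j] is the weight with which node j influences node i.
    labels holds +1 / -1 for labeled nodes and 0 for unlabeled ones.
    """
    S = np.asarray(W, dtype=float).copy()
    S[labels != 0, :] = 0.0  # drop all edges pointing into labeled nodes
    return S

def integrate(graphs, weights):
    """Assumed integration rule: fixed weighted sum of the graphs."""
    return sum(w * G for w, G in zip(weights, graphs))

def propagate(S, labels, alpha=0.9):
    """Label propagation: solve f = alpha * P f + (1 - alpha) * y."""
    n = S.shape[0]
    d = S.sum(axis=1, keepdims=True)
    # Row-normalize, leaving all-zero rows (labeled nodes) at zero.
    P = np.divide(S, d, out=np.zeros_like(S), where=d > 0)
    y = labels.astype(float)
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * P, y)

# Toy example: four proteins, the first and last are labeled.
labels = np.array([1, 0, 0, -1])
W_seq = np.array([[0, 1, 0, 0],      # hypothetical sequence similarity
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]])
W_struct = np.array([[0, 2, 0, 0],   # hypothetical structure similarity
                     [2, 0, 0, 0],
                     [0, 0, 0, 2],
                     [0, 0, 2, 0]])

sharpened = [sharpen(W, labels) for W in (W_seq, W_struct)]
combined = integrate(sharpened, weights=[0.5, 0.5])
scores = propagate(combined, labels)
```

In this sketch the sign of `scores[i]` is the predicted class of protein `i`. Because sharpening only zeroes entries, the combined graph is at least as sparse as the inputs, which matches the abstract's remark that sparsity helps keep the computational cost low.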

Year:  2007        PMID: 17977886     DOI: 10.1093/bioinformatics/btm511

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  20 in total

1.  Incorporating inter-relationships between different levels of genomic data into cancer clinical outcome prediction.

Authors:  Dokyoon Kim; Hyunjung Shin; Kyung-Ah Sohn; Anurag Verma; Marylyn D Ritchie; Ju Han Kim
Journal:  Methods       Date:  2014-02-18       Impact factor: 3.608

Review 2.  Methods of integrating data to uncover genotype-phenotype interactions.

Authors:  Marylyn D Ritchie; Emily R Holzinger; Ruowang Li; Sarah A Pendergrass; Dokyoon Kim
Journal:  Nat Rev Genet       Date:  2015-01-13       Impact factor: 53.242

Review 3.  Protein function prediction: towards integration of similarity metrics.

Authors:  Serkan Erdin; Andreas Martin Lisewski; Olivier Lichtarge
Journal:  Curr Opin Struct Biol       Date:  2011-02-24       Impact factor: 6.809

4.  A Graph-Based Integration of Multimodal Brain Imaging Data for the Detection of Early Mild Cognitive Impairment (E-MCI).

Authors:  Dokyoon Kim; Sungeun Kim; Shannon L Risacher; Li Shen; Marylyn D Ritchie; Michael W Weiner; Andrew J Saykin; Kwangsik Nho
Journal:  Multimodal Brain Image Anal (2013)       Date:  2013

Review 5.  Machine learning: its challenges and opportunities in plant system biology.

Authors:  Mohsen Hesami; Milad Alizadeh; Andrew Maxwell Phineas Jones; Davoud Torkamaneh
Journal:  Appl Microbiol Biotechnol       Date:  2022-05-16       Impact factor: 4.813

6.  A Machine Learning-Based Approach Using Multi-omics Data to Predict Metabolic Pathways.

Authors:  Aakaanksha Kaul; Maryanne Varghese; Vidya Niranjan; Akshay Uttarkar
Journal:  Methods Mol Biol       Date:  2023

7.  Breast cancer survivability prediction using labeled, unlabeled, and pseudo-labeled patient data.

Authors:  Juhyeon Kim; Hyunjung Shin
Journal:  J Am Med Inform Assoc       Date:  2013-03-06       Impact factor: 4.497

8.  Accurate protein structure annotation through competitive diffusion of enzymatic functions over a network of local evolutionary similarities.

Authors:  Eric Venner; Andreas Martin Lisewski; Serkan Erdin; R Matthew Ward; Shivas R Amin; Olivier Lichtarge
Journal:  PLoS One       Date:  2010-12-13       Impact factor: 3.240

9.  A general calculus of fitness landscapes finds genes under selection in cancers.

Authors:  Teng-Kuei Hsu; Jennifer Asmussen; Amanda Koire; Byung-Kwon Choi; Mayur A Gadhikar; Eunna Huh; Chih-Hsu Lin; Daniel M Konecki; Young Won Kim; Curtis R Pickering; Marek Kimmel; Lawrence A Donehower; Mitchell J Frederick; Jeffrey N Myers; Panagiotis Katsonis; Olivier Lichtarge
Journal:  Genome Res       Date:  2022-03-17       Impact factor: 9.438

10.  Function prediction from networks of local evolutionary similarity in protein structure.

Authors:  Serkan Erdin; Eric Venner; Andreas Martin Lisewski; Olivier Lichtarge
Journal:  BMC Bioinformatics       Date:  2013-02-28       Impact factor: 3.169
