Techniques to cope with missing data in host-pathogen protein interaction prediction.

Meghana Kshirsagar, Jaime Carbonell, Judith Klein-Seetharaman.

Abstract

MOTIVATION: Approaches that use supervised machine learning techniques for protein-protein interaction (PPI) prediction typically use features obtained by integrating several sources of data. Often certain attributes of the data are not available, resulting in missing values. In particular, 58-85% of the values in our host-pathogen PPI datasets are missing, which makes it challenging to apply machine learning algorithms.
RESULTS: We show that specialized techniques for missing-value imputation can significantly improve model performance. We use cross-species information in combination with machine learning techniques such as group lasso with ℓ1/ℓ2 regularization. We demonstrate the benefits of our approach on two PPI prediction problems. In our first example, Salmonella-human PPI prediction, we obtain high prediction accuracy, with 77.6% precision and 84% recall. Comparison with various other techniques shows an improvement of 9 in F1 score over the next best technique. We also successfully apply our method to Yersinia-human PPI prediction, demonstrating the generality of our approach.
AVAILABILITY: Predicted interactions, datasets and features are available at: http://www.cs.cmu.edu/~mkshirsa/eccb2012_paper46.html.
CONTACT: judithks@cs.cmu.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
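To make the missing-value problem described in the abstract concrete, here is a minimal sketch. It counts the fraction of missing entries in a feature matrix and fills them by per-column mean imputation. Mean imputation is a generic baseline swapped in for illustration; it is NOT the cross-species imputation technique of this paper. The function names, the toy matrix and the use of `None` to mark a missing value are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: one row per host-pathogen protein pair,
# one column per feature; None marks a missing feature value.
Matrix = List[List[Optional[float]]]


def missing_fraction(X: Matrix) -> float:
    """Return the fraction of all entries in X that are missing (None)."""
    total = sum(len(row) for row in X)
    missing = sum(1 for row in X for v in row if v is None)
    return missing / total if total else 0.0


def mean_impute(X: Matrix) -> Tuple[List[List[float]], List[List[int]]]:
    """Fill each missing entry with its column mean (0.0 if the column is entirely missing).

    Also returns a 0/1 mask marking which entries were missing; some pipelines
    append this mask as extra features. Generic baseline only, not the
    cross-species method described in the abstract. Assumes a rectangular X.
    """
    n_cols = len(X[0]) if X else 0
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in X if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    filled = [[means[j] if v is None else v for j, v in enumerate(row)] for row in X]
    mask = [[1 if v is None else 0 for v in row] for row in X]
    return filled, mask


if __name__ == "__main__":
    X = [
        [1.0, None, 3.0],
        [None, None, 5.0],
        [2.0, None, None],
    ]
    print(missing_fraction(X))  # 5 of 9 entries are missing
    filled, mask = mean_impute(X)
    print(filled)  # [[1.0, 0.0, 3.0], [1.5, 0.0, 5.0], [2.0, 0.0, 4.0]]
    print(mask)    # [[0, 1, 0], [1, 1, 0], [0, 1, 1]]
```

With 58-85% of values missing, a baseline like this replaces most of the matrix with column averages, which illustrates why the authors argue for more specialized imputation.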
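The ℓ1/ℓ2 (group lasso) regularizer mentioned in the abstract sums the Euclidean norms of the weight sub-vectors belonging to predefined feature groups, and its proximal step can set an entire group to zero at once. The sketch below computes that penalty and one block soft-thresholding step; it is NOT the authors' training code, and the grouping (for example, one group per data source), the group names, the weights and λ are hypothetical. It also checks the reported figures: with precision P = 0.776 and recall R = 0.84, F1 = 2PR/(P + R) ≈ 0.807.

```python
import math
from typing import Dict, List

# Hypothetical grouping: each group lists the indices of the features it contains.
Groups = Dict[str, List[int]]


def group_lasso_penalty(w: List[float], groups: Groups) -> float:
    """l1/l2 penalty: the sum (l1) over groups of each group's Euclidean (l2) norm."""
    return sum(math.sqrt(sum(w[i] ** 2 for i in idx)) for idx in groups.values())


def group_soft_threshold(w: List[float], groups: Groups, lam: float) -> List[float]:
    """Proximal operator of lam * group_lasso_penalty (groups assumed disjoint).

    Each group is rescaled by max(0, 1 - lam / ||w_g||_2), so a group whose
    norm is at most lam is set to zero as a whole. Features that belong to
    no group are left unchanged.
    """
    out = list(w)
    for idx in groups.values():
        norm = math.sqrt(sum(w[i] ** 2 for i in idx))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = scale * w[i]
    return out


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    w = [3.0, 4.0, 0.3, 0.4]
    groups = {"source_a": [0, 1], "source_b": [2, 3]}
    print(group_lasso_penalty(w, groups))        # about 5.5 (= 5.0 + 0.5)
    print(group_soft_threshold(w, groups, 1.0))  # about [2.4, 3.2, 0.0, 0.0]
    print(round(f1_score(0.776, 0.84), 3))       # 0.807
```

Because whole groups are shrunk or zeroed together, this penalty keeps or discards feature groups as units rather than individual features.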

Year:  2012        PMID: 22962468      PMCID: PMC3436802          DOI: 10.1093/bioinformatics/bts375

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


References: 21 in total

1.  Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors:  M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330

2.  iPfam: visualization of protein-protein interactions in PDB at domain and amino acid resolutions.

Authors:  Robert D Finn; Mhairi Marshall; Alex Bateman
Journal:  Bioinformatics       Date:  2004-09-07       Impact factor: 6.937

3.  Random forest similarity for protein-protein interaction prediction from multiple sources.

Authors:  Yanjun Qi; Judith Klein-Seetharaman; Ziv Bar-Joseph
Journal:  Pac Symp Biocomput       Date:  2005

4.  Struct2net: integrating structure into protein-protein interaction prediction.

Authors:  Rohit Singh; Jinbo Xu; Bonnie Berger
Journal:  Pac Symp Biocomput       Date:  2006

5.  Predicting protein-protein interactions based only on sequences information.

Authors:  Juwen Shen; Jian Zhang; Xiaomin Luo; Weiliang Zhu; Kunqian Yu; Kaixian Chen; Yixue Li; Hualiang Jiang
Journal:  Proc Natl Acad Sci U S A       Date:  2007-03-05       Impact factor: 11.205

6.  The current Salmonella-host interactome. [Review]

Authors:  Sylvia Schleker; Jingchun Sun; Balachandran Raghavan; Matthew Srnec; Nicole Müller; Mary Koepfinger; Leelavati Murthy; Zhongming Zhao; Judith Klein-Seetharaman
Journal:  Proteomics Clin Appl       Date:  2011-12-27       Impact factor: 3.494

7.  The Pfam protein families database.

Authors:  Robert D Finn; Jaina Mistry; John Tate; Penny Coggill; Andreas Heger; Joanne E Pollington; O Luke Gavin; Prasad Gunasekaran; Goran Ceric; Kristoffer Forslund; Liisa Holm; Erik L L Sonnhammer; Sean R Eddy; Alex Bateman
Journal:  Nucleic Acids Res       Date:  2009-11-17       Impact factor: 16.971

8.  Prediction of interactions between HIV-1 and human proteins by information integration.

Authors:  Oznur Tastan; Yanjun Qi; Jaime G Carbonell; Judith Klein-Seetharaman
Journal:  Pac Symp Biocomput       Date:  2009

9.  Bias in error estimation when using cross-validation for model selection.

Authors:  Sudhir Varma; Richard Simon
Journal:  BMC Bioinformatics       Date:  2006-02-23       Impact factor: 3.169

10.  A mixture of feature experts approach for protein-protein interaction prediction.

Authors:  Yanjun Qi; Judith Klein-Seetharaman; Ziv Bar-Joseph
Journal:  BMC Bioinformatics       Date:  2007       Impact factor: 3.169

Cited by: 18 in total

1.  A review on host-pathogen interactions: classification and prediction. [Review]

Authors:  R Sen; L Nayak; R K De
Journal:  Eur J Clin Microbiol Infect Dis       Date:  2016-07-29       Impact factor: 3.267

2.  Techniques for transferring host-pathogen protein interactions knowledge to new tasks.

Authors:  Meghana Kshirsagar; Sylvia Schleker; Jaime Carbonell; Judith Klein-Seetharaman
Journal:  Front Microbiol       Date:  2015-02-02       Impact factor: 5.640

3.  Computational approaches for prediction of pathogen-host protein-protein interactions. [Review]

Authors:  Esmaeil Nourani; Farshad Khunjush; Saliha Durmuş
Journal:  Front Microbiol       Date:  2015-02-24       Impact factor: 5.640

4.  Comparing human-Salmonella with plant-Salmonella protein-protein interaction predictions.

Authors:  Sylvia Schleker; Meghana Kshirsagar; Judith Klein-Seetharaman
Journal:  Front Microbiol       Date:  2015-01-28       Impact factor: 5.640

5.  AdaBoost based multi-instance transfer learning for predicting proteome-wide interactions between Salmonella and human proteins.

Authors:  Suyu Mei; Hao Zhu
Journal:  PLoS One       Date:  2014-10-17       Impact factor: 3.240

6.  Computational discovery of Epstein-Barr virus targeted human genes and signalling pathways.

Authors:  Suyu Mei; Kun Zhang
Journal:  Sci Rep       Date:  2016-07-29       Impact factor: 4.379

7.  Multitask learning for host-pathogen protein interactions.

Authors:  Meghana Kshirsagar; Jaime Carbonell; Judith Klein-Seetharaman
Journal:  Bioinformatics       Date:  2013-07-01       Impact factor: 6.937

8.  Accurate prediction of nuclear receptors with conjoint triad feature.

Authors:  Hongchu Wang; Xuehai Hu
Journal:  BMC Bioinformatics       Date:  2015-12-03       Impact factor: 3.169

9.  Computational reconstruction of proteome-wide protein interaction networks between HTLV retroviruses and Homo sapiens.

Authors:  Suyu Mei; Hao Zhu
Journal:  BMC Bioinformatics       Date:  2014-07-18       Impact factor: 3.169

10.  HVint: A Strategy for Identifying Novel Protein-Protein Interactions in Herpes Simplex Virus Type 1.

Authors:  Paul Ashford; Anna Hernandez; Todd Michael Greco; Anna Buch; Beate Sodeik; Ileana Mihaela Cristea; Kay Grünewald; Adrian Shepherd; Maya Topf
Journal:  Mol Cell Proteomics       Date:  2016-07-06       Impact factor: 5.911
