
A Support Vector Machine based method to distinguish proteobacterial proteins from eukaryotic plant proteins.

Ruchi Verma, Ulrich Melcher.

Abstract

BACKGROUND: Members of the phylum Proteobacteria are the most prominent of the bacteria that cause plant diseases, which reduce the quantity and quality of food produced by agriculture. To reduce these losses, infections need to be identified at early stages. Recent developments in next-generation nucleic acid sequencing and mass spectrometry open the door to screening plants by the sequences of their macromolecules. Such an approach requires the ability to recognize the organismal origin of unknown DNA or peptide fragments. There are many ways to approach this problem, but none has emerged as the best protocol. Here we attempt a systematic way to determine the organismal origins of peptides using a machine learning algorithm, the Support Vector Machine (SVM).
RESULTS: The amino acid compositions of proteobacterial proteins were found to differ from those of plant proteins. We developed SVM models based on amino acid and dipeptide compositions to distinguish proteobacterial proteins from plant proteins. The amino acid composition (AAC) based SVM model had an accuracy of 92.44% with a Matthews correlation coefficient (MCC) of 0.85, while the dipeptide composition (DC) based SVM model had a maximum accuracy of 94.67% and an MCC of 0.89. We also developed SVM models based on a hybrid approach (AAC and DC), which gave a maximum accuracy of 94.86% and an MCC of 0.90. The models were tested on unseen datasets, not used in training, to assess their validity.
CONCLUSION: The results indicate that the SVM based on the AAC and DC hybrid approach can be used to distinguish proteobacterial from plant protein sequences.
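
The features and the quality measure named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`aac`, `dc`, `hybrid`, `mcc`) are hypothetical, and it assumes compositions are expressed as fractions over the 20 standard amino acids, with dipeptides containing any other symbol skipped. The abstract does not state these details.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(seq):
    """Amino acid composition: fraction of each standard residue (20 values)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]  # assumption: ignore other symbols
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dc(seq):
    """Dipeptide composition: fraction of each adjacent residue pair (400 values)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # assumption: skip pairs with non-standard symbols
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def hybrid(seq):
    """Hybrid feature vector: AAC followed by DC (20 + 400 = 420 values)."""
    return aac(seq) + dc(seq)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Vectors produced by `hybrid` could be passed, together with class labels (proteobacterial or plant), to any SVM implementation; `mcc` would then be computed from the true/false positive and negative counts on a held-out test set.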

Year: 2012   PMID: 23046503   PMCID: PMC3439722   DOI: 10.1186/1471-2105-13-S15-S9

Source DB: PubMed   Journal: BMC Bioinformatics   ISSN: 1471-2105   Impact factor: 3.169


