Identification of functionally diverse lipocalin proteins from sequence information using support vector machine.

Ganesan Pugalenthi, Krishna Kumar Kandaswamy, P N Suganthan, G Archunan, R Sowdhamini.

Abstract

Lipocalins are functionally diverse proteins composed of 120-180 amino acid residues. Members of this family have several important biological functions, including ligand transport, cryptic coloration, sensory transduction, endonuclease activity, stress response activity in plants, odorant binding, prostaglandin biosynthesis, cellular homeostasis regulation, immunity and immunotherapy. Identifying lipocalins from protein sequence is challenging because sequence identity within the family is poor and often falls below the twilight zone. So far, no specific method has been reported to identify lipocalins from primary sequence. In this paper, we report LipoPred, a support vector machine (SVM) approach that predicts lipocalins from protein sequence using sequence-derived properties. LipoPred was trained using a dataset consisting of 325 lipocalin proteins and 325 non-lipocalin proteins, and evaluated on an independent set of 140 lipocalin proteins and 21,447 non-lipocalin proteins. LipoPred achieved 88.61% accuracy with 89.26% sensitivity, 85.27% specificity and a Matthews correlation coefficient (MCC) of 0.74. When applied to the test dataset, LipoPred achieved 84.25% accuracy with 88.57% sensitivity, 84.22% specificity and an MCC of 0.16. LipoPred performed better than the PSI-BLAST, HMM and SVM-Prot methods. Out of 218 lipocalins, LipoPred correctly predicted 194 proteins, including 39 lipocalins that are non-homologous to any protein in the SWISSPROT database. This result shows that LipoPred is potentially useful for predicting lipocalin proteins that have no sequence homologs in the sequence databases. Further, the successful prediction of nine hypothetical lipocalin proteins and five new members of the lipocalin family shows that LipoPred can be used efficiently to identify and annotate new lipocalin proteins from sequence databases. The LipoPred software and dataset are available at http://www3.ntu.edu.sg/home/EPNSugan/index_files/lipopred.htm.
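
The test-set figures above combine high sensitivity and specificity with a low MCC, which is what a heavily imbalanced test set tends to produce. The sketch below is not part of the published method: it rebuilds an approximate confusion matrix from the reported test-set size and rates (the integer counts are an assumption obtained by rounding) and recomputes accuracy and MCC from it. The function names are hypothetical and chosen only for illustration.

```python
import math


def confusion_from_rates(n_pos, n_neg, sensitivity, specificity):
    """Approximate TP, FN, TN, FP by rounding reported rates (an assumption)."""
    tp = round(sensitivity * n_pos)
    tn = round(specificity * n_neg)
    return tp, n_pos - tp, tn, n_neg - tn


def accuracy(tp, fn, tn, fp):
    """Fraction of all proteins that were classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient; returns 0.0 if a marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Independent test set from the abstract: 140 lipocalins, 21,447 non-lipocalins,
# 88.57% sensitivity and 84.22% specificity.
tp, fn, tn, fp = confusion_from_rates(140, 21447, 0.8857, 0.8422)

print(f"TP={tp} FN={fn} TN={tn} FP={fp}")
print(f"accuracy = {accuracy(tp, fn, tn, fp):.2%}")  # 84.25%
print(f"MCC      = {mcc(tp, fn, tn, fp):.2f}")       # 0.16
```

Under these reconstructed counts, false positives outnumber true positives by roughly 27 to 1 even at 84.22% specificity, because non-lipocalins make up more than 99% of the test set. That imbalance, rather than a drop in per-class performance, accounts for the MCC falling from 0.74 to 0.16.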

Year:  2010        PMID: 20186553     DOI: 10.1007/s00726-010-0520-8

Source DB:  PubMed          Journal:  Amino Acids        ISSN: 0939-4451            Impact factor:   3.520


  3 in total

1.  Fuzzy clustering of physicochemical and biochemical properties of amino acids.

Authors:  Indrajit Saha; Ujjwal Maulik; Sanghamitra Bandyopadhyay; Dariusz Plewczynski
Journal:  Amino Acids       Date:  2011-10-13       Impact factor: 3.520

2.  Probing an optimal class distribution for enhancing prediction and feature characterization of plant virus-encoded RNA-silencing suppressors.

Authors:  Abhigyan Nath; Karthikeyan Subbiah
Journal:  3 Biotech       Date:  2016-03-21       Impact factor: 2.406

3.  DOR - a Database of Olfactory Receptors - Integrated Repository for Sequence and Secondary Structural Information of Olfactory Receptors in Selected Eukaryotic Genomes.

Authors:  Balasubramanian Nagarathnam; Snehal D Karpe; Krishnan Harini; Kannan Sankar; Mohammed Iftekhar; Durairaj Rajesh; Sadasivam Giji; Govidaraju Archunan; Veluchamy Balakrishnan; M Michael Gromiha; Wataru Nemoto; Kazhuhiko Fukui; Ramanathan Sowdhamini
Journal:  Bioinform Biol Insights       Date:  2014-06-12