Analysis and prediction of cancerlectins using evolutionary and domain information.

Ravi Kumar, Bharat Panwar, Jagat S Chauhan, Gajendra Ps Raghava.

Abstract

BACKGROUND: Predicting protein function is one of the major challenges of the post-genomic era, in which protein sequences of unknown function are accumulating rapidly. Lectins are proteins that specifically recognize and bind carbohydrate moieties present on proteins or lipids. Cancerlectins are lectins that play important roles in tumor cell differentiation and metastasis. Although the two classes of proteins are closely related, no computational method is yet available to distinguish cancerlectins from the large pool of non-cancerlectins. A method that can discriminate between cancerlectins and non-cancerlectins is therefore needed.
RESULTS: All models developed in this study are based on a non-redundant dataset of 178 cancerlectins and 226 non-cancerlectins in which no two sequences share more than 50% sequence similarity. A similarity-search approach using BLAST achieved a maximum accuracy of 43.25%. Amino acid composition analysis showed that certain residues (e.g. leucine, proline) were preferred in cancerlectins, whereas others (e.g. aspartic acid, asparagine) were preferred in non-cancerlectins. The PROSITE domain "Crystalline beta gamma" was abundant in cancerlectins, whereas domains such as "SUEL-type lectin domain" were found mainly in non-cancerlectins. An SVM-based model using amino acid composition achieved a maximum Matthews correlation coefficient (MCC) of 0.32 with an accuracy of 64.84%. A model based on dipeptide composition achieved an MCC of 0.30 with an accuracy of 64.84%. Models based on split composition (2 and 4 parts) achieved MCC values of 0.31 and 0.32 with accuracies of 65.10% and 66.09%, respectively. An SVM model based on the Position Specific Scoring Matrix (PSSM) generated by PSI-BLAST achieved an MCC of 0.36 with an accuracy of 68.34%. Finally, integrating PROSITE domain information with the PSSM yielded an SVM model with an MCC of 0.38 and an accuracy of 69.09%.
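The composition-based features named in the RESULTS can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, and splitting a sequence into equal-length consecutive segments is an assumption, since the abstract does not say how the 2-part and 4-part splits were defined.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(seq):
    """Percentage of each standard residue in the sequence (20 features)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)  # non-standard letters are ignored
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * counts[a] / total for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Percentage of each ordered pair of adjacent residues (400 features)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DIPEPTIDES)
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * counts[d] / total for d in DIPEPTIDES]


def split_composition(seq, parts):
    """Concatenated amino acid composition of `parts` consecutive segments.

    Assumption: segments are of (nearly) equal length; the study may have
    defined its splits differently.
    """
    n = len(seq)
    bounds = [i * n // parts for i in range(parts + 1)]
    features = []
    for start, end in zip(bounds, bounds[1:]):
        features.extend(amino_acid_composition(seq[start:end]))
    return features
```

Each function returns a fixed-length vector (20, 400, or 20 × parts values), which is the kind of input an SVM classifier expects regardless of sequence length.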
CONCLUSION: BLAST was found to be ineffective at distinguishing cancerlectins from non-cancerlectins. We analyzed the protein sequences of cancerlectins and non-cancerlectins and identified differences in residue and domain preferences. In particular, we identified PROSITE domains that are preferred in each class, which provides insight into the two types of proteins. The method developed in this study will be useful for researchers studying cancerlectins, lectins and cancer biology. A web server based on this study is available at http://www.imtech.res.in/raghava/cancer_pred/
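For readers unfamiliar with the two metrics quoted in the abstract, the sketch below shows how accuracy and the Matthews correlation coefficient are conventionally computed from a binary confusion matrix. The function names and the toy labels are illustrative assumptions and are not data from the study.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = cancerlectin, 0 = non-cancerlectin)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Percentage of correctly classified sequences."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, 1]; 0 means no better than chance."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Toy labels, made up purely for illustration.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(accuracy(*counts), round(mcc(*counts), 3))
```

Because MCC uses all four cells of the confusion matrix, it is less flattering than accuracy on imbalanced data, which is one reason both figures are usually reported together for datasets such as the 178 versus 226 split described above.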

Year:  2011        PMID: 21774797      PMCID: PMC3161874          DOI: 10.1186/1756-0500-4-237

Source DB:  PubMed          Journal:  BMC Res Notes        ISSN: 1756-0500


