
Enhancement of conformational B-cell epitope prediction using CluSMOTE.

Binti Solihah, Azhari Azhari, Aina Musdholifah.

Abstract

BACKGROUND: A conformational B-cell epitope is one of the main components of vaccine design. It consists of segments that are separate in the sequence but spatially close in the folded antigen chain. The availability of Ag-Ab complex data in the Protein Data Bank allows for the development of predictive methods. Several epitope prediction models have been developed, including learning-based methods. However, the performance of these models is still not optimal. The main problem in learning-based prediction models is class imbalance.
METHODS: This study proposes CluSMOTE, a combination of a cluster-based undersampling method and the Synthetic Minority Oversampling Technique (SMOTE). The approach generates additional samples so that the conformational epitope dataset is balanced. The Hierarchical DBSCAN algorithm is used to identify clusters in the majority class. Randomly selected data are taken from each cluster, taking the oversampling degree into account, and combined with the minority class data. The balanced data are used as the training dataset for a conformational epitope prediction model. Two binary classification methods, Support Vector Machine and Decision Tree, are used separately to build the prediction model and to evaluate the performance of CluSMOTE in predicting conformational B-cell epitopes. The experiment focuses on determining the parameters that give the best CluSMOTE performance. Two independent datasets are used to compare the proposed prediction model with state-of-the-art methods. The first and second datasets represent general protein antigens and glycoprotein antigens, respectively.
RESULTS: The experimental results show that the CluSMOTE Decision Tree outperformed the Support Vector Machine in terms of AUC and G-mean. The mean AUCs of the CluSMOTE Decision Tree on the Kringelum and SEPPA 3 test sets are 0.83 and 0.766, respectively. This shows that the CluSMOTE Decision Tree is better than other methods on general protein antigens and comparable with SEPPA 3 on glycoprotein antigens.
©2020 Solihah et al.
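The hybrid sampling described in METHODS can be sketched roughly as follows. This is a minimal standard-library illustration under stated assumptions, not the authors' implementation. The cluster labels are assumed to come from a Hierarchical DBSCAN run performed elsewhere. The function names (`undersample_by_cluster`, `smote`, `clusmote`), the `oversample_ratio` parameter, the proportional per-cluster sampling rule, and the toy data are all hypothetical.

```python
import math
import random
from collections import defaultdict


def undersample_by_cluster(majority, labels, n_keep, rng):
    """Keep roughly n_keep majority points, drawn at random from every cluster."""
    clusters = defaultdict(list)
    for point, label in zip(majority, labels):
        clusters[label].append(point)
    kept = []
    for members in clusters.values():
        # Assumption: each cluster contributes in proportion to its size,
        # with at least one point per cluster.
        share = max(1, round(n_keep * len(members) / len(majority)))
        kept.extend(rng.sample(members, min(share, len(members))))
    return kept


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen point (itself excluded).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def clusmote(majority, labels, minority, oversample_ratio=2.0, k=5, seed=0):
    """Oversample the minority class, then undersample the majority to match."""
    rng = random.Random(seed)
    # Assumption: the "oversampling degree" is a multiplier on the minority size.
    target = int(round(len(minority) * oversample_ratio))
    new_minority = minority + smote(minority, target - len(minority), k, rng)
    new_majority = undersample_by_cluster(majority, labels, target, rng)
    X = new_majority + new_minority
    y = [0] * len(new_majority) + [1] * len(new_minority)
    return X, y


# Toy data: 40 majority points in two groups, 5 minority points.
data_rng = random.Random(1)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(20)]
majority += [(data_rng.gauss(5, 1), data_rng.gauss(5, 1)) for _ in range(20)]
labels = [0] * 20 + [1] * 20  # stand-in for Hierarchical DBSCAN cluster labels
minority = [(data_rng.gauss(2.5, 0.5), data_rng.gauss(2.5, 0.5)) for _ in range(5)]

X, y = clusmote(majority, labels, minority, oversample_ratio=2.0, k=3)
print(y.count(0), y.count(1))  # 10 10
```

The balanced `X, y` would then serve as the training set for a classifier such as a Decision Tree or Support Vector Machine. How noise points flagged by the clustering step are treated is not stated in the abstract; in this sketch they would simply form one more group.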


Keywords:  Class imbalance; Cluster-based undersampling; Hierarchical DBSCAN; Hybrid sampling; SMOTE; Vaccine design

Year:  2020        PMID: 33816926      PMCID: PMC7924438          DOI: 10.7717/peerj-cs.275

Source DB:  PubMed          Journal:  PeerJ Comput Sci        ISSN: 2376-5992


