Ye Wang1, Changqing Mei1, Yuming Zhou1, Yan Wang1, Chunhou Zheng2, Xiao Zhen3, Yan Xiong4, Peng Chen5, Jun Zhang6, Bing Wang7,8.
Abstract
BACKGROUND: The recognition of protein interaction sites is of great significance in many biological processes, signaling pathways and drug design. However, most sites on protein sequences cannot be labeled as interface or non-interface sites because only a small fraction of protein interactions have been identified, which limits the prediction accuracy and generalization ability of protein interaction site predictors. It is therefore necessary to improve the prediction of protein interaction sites by effectively using large amounts of unlabeled data together with small amounts of labeled data and background knowledge.
Keywords: Conservative feature; Protein interaction site; Semi-supervised support vector machine; Unlabeled information
Year: 2019 PMID: 31874616 PMCID: PMC6929468 DOI: 10.1186/s12859-019-3274-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Classification performance evaluation of three semi-supervised methods on datasets
Fig. 2 Prediction performance measures in 5 repetitions of cross-validation
Fig. 3 Comparison of experimental results between SVM and S4VM
Fig. 4 Comparison with the evaluation performance of previous methods
Fig. 5 Experimental visualization results. (a) represents protein chain 1A4Y_A; (a) and (b) show its spherical representation. (c) is the 1A4Y protein chain after extraction of surface residues. (d), (e) and (f) show the predictions of Means3vm-mkl, Means3vm-iter and S4VM, respectively; green, red, yellow and blue balls mark TP, TN, FP and FN residues, respectively.
Number of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) predicted by each method
| Method | Samples | TP | TN | FP | FN |
|---|---|---|---|---|---|
| Means3vm-iter | 218 | 57 | 82 | 28 | 51 |
| Means3vm-mkl | 218 | 60 | 84 | 26 | 48 |
| S4VM | 218 | 68 | 87 | 23 | 40 |
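The counts in the table above are enough to derive the usual threshold-based measures. Each row sums to 218 samples, of which 108 are interface residues (TP + FN) and 110 are non-interface residues (TN + FP). The plain-Python sketch below computes accuracy, precision, recall (sensitivity), specificity, F1 and the Matthews correlation coefficient (MCC) from these counts. The function name `metrics` and the choice of measures are assumptions for illustration; the paper's own evaluation (Figs. 1 to 4) may define or average its measures differently.

```python
import math

# Confusion counts copied from the table above (218 samples per method).
COUNTS = {
    "Means3vm-iter": {"TP": 57, "TN": 82, "FP": 28, "FN": 51},
    "Means3vm-mkl": {"TP": 60, "TN": 84, "FP": 26, "FN": 48},
    "S4VM": {"TP": 68, "TN": 87, "FP": 23, "FN": 40},
}


def metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard binary-classification measures from one confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # sensitivity: share of interface residues found
    specificity = tn / (tn + fp)  # share of non-interface residues rejected
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1, "mcc": mcc}


for name, c in COUNTS.items():
    m = metrics(c["TP"], c["TN"], c["FP"], c["FN"])
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Because S4VM has the most TP and TN and the fewest FP and FN in this table, it scores highest on every measure computed here; this comparison uses only the counts above.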
The protein chains used in this work
| 1AY7_A | 1B6C_A | 1B7Y_B | 1AZS_B | 1B7Y_A | 1AVG_H | 1AZS_C | 1B6C_B |
|---|---|---|---|---|---|---|---|
| 1UDI_E | 1UGH_E | 1ZBD_A | 1UEA_A | 1UUZ_A | 1TCO_A | 3TGI_I | 1WQ1_G |
| 1HLU_P | 1IRA_Y | 1KKL_A | 1HWH_B | 1JSU_C | 1HLU_A | 1IRA_X | 1ITB_A |
| 1BDJ_B | 1BMQ_A | 1BRB_I | 1BGX_T | 1BP3_A | 1BDJ_A | 1BI7_A | 1BMQ_B |
| 1QBK_B | 1SMP_A | 7CEI_A | 1QBK_C | 1STF_E | 1PYT_B | 1SGP_E | 1SMP_I |
| 1FLT_Y | 1GLA_F | 1HJA_C | 1GFW_A | 4SGB_I | 1FLT_V | 1GFW_B | 1GLA_G |
| 1ABR_A | 1AHW_C | 1ATN_D | 1ABR_B | 1AK4_D | 1A4Y_A | 1ACB_I | 1AK4_A |
| 1BVK_A | 1CA0_B | 1D4V_B | 1BVK_C | 1D4V_A | 1BRS_A | 1BVN_P | 1CXZ_B |
| 2KAI_B | 2SIC_I | 3SGB_I | 2PCC_A | 2TEC_E | 1ZBD_B | 2PCC_B | 2SNI_I |
| 1DAN_U | 1E9H_B | 1FAP_B | 1DFJ_E | 1ETH_A | 1DAN_L | 1E96_A | 1EFU_B |
| 1L0Y_A | 1NOC_B | 1PYT_A | 1L0Y_B | 1PDK_B | 1KKL_H | 1MAH_A | 1PDK_A |
| 1GUA_B | 1STF_I | 1UEA_B |
Fig. 6 Illustration of the different classification boundaries of SVM, which considers only labeled data, and S3VM, which considers both labeled and unlabeled data. Green balls denote positives, red balls denote negatives, and blue balls denote unlabeled samples.
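A common way to formalize the idea in Fig. 6 is the S3VM objective: the usual soft-margin SVM terms on labeled data plus a symmetric hinge loss max(0, 1 - |f(x)|) on unlabeled points, which is large when the boundary passes close to unlabeled samples. The minimal sketch below assumes a linear decision function f(x) = w·x + b and hypothetical trade-off weights `c1` and `c2`. It only evaluates this objective for two hand-picked boundaries on toy data; it does not train a model, and it is not the S4VM or Means3vm algorithms evaluated in this work, which are more elaborate.

```python
from typing import Sequence, Tuple

Vector = Sequence[float]
Labeled = Sequence[Tuple[Vector, int]]  # (features, label in {-1, +1})


def decision(w: Vector, b: float, x: Vector) -> float:
    """Linear decision function f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def svm_objective(w: Vector, b: float, labeled: Labeled, c1: float = 1.0) -> float:
    """Soft-margin SVM primal: 0.5*||w||^2 + c1 * sum of hinge(y * f(x))."""
    reg = 0.5 * sum(wi * wi for wi in w)
    hinge = sum(max(0.0, 1.0 - y * decision(w, b, x)) for x, y in labeled)
    return reg + c1 * hinge


def s3vm_objective(w: Vector, b: float, labeled: Labeled,
                   unlabeled: Sequence[Vector],
                   c1: float = 1.0, c2: float = 1.0) -> float:
    """SVM terms plus a symmetric hinge on unlabeled points.

    max(0, 1 - |f(x)|) is the hinge loss under the label sign(f(x)), so it
    penalizes boundaries that pass through regions dense with unlabeled data.
    """
    sym_hinge = sum(max(0.0, 1.0 - abs(decision(w, b, x))) for x in unlabeled)
    return svm_objective(w, b, labeled, c1) + c2 * sym_hinge


# Hypothetical toy data: one positive, one negative, four unlabeled points.
labeled = [((2.0, 2.0), 1), ((-2.0, -2.0), -1)]
unlabeled = [(0.0, 3.0), (0.0, -3.0), (0.5, 2.5), (-0.5, -2.5)]
candidates = {
    "vertical": ((1.0, 0.0), 0.0),    # boundary x = 0 cuts through unlabeled points
    "horizontal": ((0.0, 1.0), 0.0),  # boundary y = 0 avoids them
}

for name, (w, b) in candidates.items():
    print(name, svm_objective(w, b, labeled),
          s3vm_objective(w, b, labeled, unlabeled))
# vertical 0.5 3.5
# horizontal 0.5 0.5
```

Both boundaries separate the two labeled points with the same margin, so the labeled-only SVM objective cannot tell them apart. The unlabeled term penalizes the vertical boundary for cutting through the unlabeled points, which mirrors how the S3VM boundary in Fig. 6 shifts away from unlabeled samples.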