Junjie Chen, Ren Long, Xiao-Long Wang, Bin Liu, Kuo-Chen Chou.
Abstract
Protein remote homology detection is an important task in computational proteomics. Several computational methods have been proposed that detect remote homologs using different features and algorithms. As noted in previous studies, their predictions are complementary to each other. It is therefore worth exploring whether these methods can be combined into one package to improve both predictive performance and ease of use. To this end, we introduced a protein representation called the profile-based pseudo protein sequence, which extracts evolutionary information from the relevant profiles. Based on the concept of pseudo proteins, a new predictor called "dRHP-PseRA" was developed by combining four state-of-the-art predictors (PSI-BLAST, HHblits, Hmmer, and Coma) via the rank aggregation approach. Cross-validation tests on a SCOP benchmark dataset demonstrated that the new predictor outperformed all existing methods for the same purpose in terms of ROC50 scores. Accordingly, dRHP-PseRA is expected to become a useful high-throughput tool for detecting remote homology proteins. For the convenience of experimental scientists, a web server for dRHP-PseRA has been established at http://bioinformatics.hitsz.edu.cn/dRHP-PseRA/.
Year: 2016 PMID: 27581095 PMCID: PMC5007510 DOI: 10.1038/srep32333
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
The performance of various predictors on the benchmark dataset.
| Methods | ROC1 | ROC50 |
|---|---|---|
| PSI-BLAST | 0.7506 | 0.8008 |
| HHblits | 0.8409 | 0.8827 |
| Hmmer | 0.7894 | 0.7915 |
| Coma | 0.6989 | 0.7785 |
| PsePro-PSI-BLAST (a) | 0.7851 | 0.8361 |
| PsePro-HHblits (b) | 0.8238 | 0.8781 |
| PsePro-Hmmer (c) | 0.8137 | 0.8302 |
| PsePro-Coma (d) | 0.7345 | 0.8152 |
| dRHP-PseRA (e) | 0.8314 | 0.8924 |
(a) Represents the PSI-BLAST predictor combined with pseudo proteins.
(b) Represents the HHblits predictor combined with pseudo proteins.
(c) Represents the Hmmer predictor combined with pseudo proteins.
(d) Represents the Coma predictor combined with pseudo proteins.
(e) Represents the dRHP-PseRA method combining three predictors (PsePro-PSI-BLAST, PsePro-Hmmer, and HHblits) via a linear weighting rank aggregation approach.
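The ROC1 and ROC50 columns report ROC-n scores, which measure the area under the ROC curve up to the first n false positives. The following is a minimal sketch of one common way to compute such a score for a single query, not necessarily the exact procedure used in the study. The function name `roc_n`, the boolean-list input, and the handling of lists with fewer than n false positives are assumptions made for illustration.

```python
def roc_n(labels, n):
    """Compute a ROC-n score for one ranked result list.

    labels: booleans ordered from best-ranked to worst-ranked hit;
            True marks a true homolog, False marks a false positive.
    n:      number of false positives at which the curve is cut off.

    Assumption: if the list contains fewer than n false positives, the
    missing columns are credited with all true positives found.
    """
    total_pos = sum(labels)
    if total_pos == 0 or n <= 0:
        return 0.0

    true_pos = 0   # true positives seen so far
    false_pos = 0  # false positives seen so far
    area = 0       # sum of true positives ranked above each false positive

    for is_homolog in labels:
        if is_homolog:
            true_pos += 1
        else:
            false_pos += 1
            area += true_pos
            if false_pos == n:
                break

    # Credit any columns that were never reached (assumption, see docstring).
    area += (n - false_pos) * true_pos
    return area / (n * total_pos)
```

With n = 1 the score only rewards true homologs ranked above the very first false positive, while n = 50 tolerates more errors before the curve is cut off, which is why ROC1 values are never higher than the corresponding ROC50 values under this definition.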
Figure 1. Pairwise comparison results of the four methods.
The coordinates of each point in the plot represent the ROC1 scores obtained by the two methods labeled on the axes.
Figure 2. The correlation between weight values and performance of different methods.
Figure 3. Comparisons of various methods.
The graph plots the percentage of sequences for which the method exceeds a given performance. A higher curve indicates a better-performing method.
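A curve of this kind can be derived from per-query scores by counting, for each performance threshold, how many queries reach it. The sketch below assumes that "exceeds" means greater than or equal to the threshold and that scores are per-query ROC-n values; the function name `fraction_exceeding` is hypothetical.

```python
def fraction_exceeding(scores, thresholds):
    """Return, for each threshold, the fraction of queries whose score
    is at least that threshold (assumption: ties count as exceeding)."""
    if not scores:
        raise ValueError("scores must contain at least one value")
    count = len(scores)
    return [sum(s >= t for s in scores) / count for t in thresholds]
```

Multiplying each returned fraction by 100 gives the percentage plotted on the vertical axis.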
Figure 4. A schematic drawing to show the dataset for protein remote homology detection.
For a query protein P in a given family, the aim is to find the proteins in the same superfamily (gray circles).
Figure 5. The flowchart of dRHP-PseRA.
Proteins are replaced by their corresponding pseudo proteins, and then fed into predictors for protein remote homology detection. Finally, the ranking lists generated by these predictors are combined via a linear weighting rank aggregation approach.
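The final combination step can be illustrated with a minimal sketch of linear weighted rank aggregation. This is an illustration under stated assumptions rather than the authors' implementation: the function name `aggregate_ranks`, the predictor labels, the weights, and the rule that a protein missing from a list receives the rank just past the end of that list are all hypothetical.

```python
def aggregate_ranks(ranked_lists, weights):
    """Combine several ranked lists into one by a weighted sum of ranks.

    ranked_lists: dict mapping predictor name -> list of protein IDs,
                  best hit first.
    weights:      dict mapping predictor name -> non-negative weight.

    Assumption: a protein absent from a predictor's list is assigned
    rank len(list) + 1 for that predictor.
    """
    # Precompute 1-based rank positions for each predictor.
    positions = {
        name: {protein: rank for rank, protein in enumerate(hits, start=1)}
        for name, hits in ranked_lists.items()
    }

    candidates = set()
    for hits in ranked_lists.values():
        candidates.update(hits)

    scores = {}
    for protein in candidates:
        total = 0.0
        for name, hits in ranked_lists.items():
            rank = positions[name].get(protein, len(hits) + 1)
            total += weights[name] * rank
        scores[protein] = total

    # Lower aggregated score means a better final rank; ties break by ID.
    return sorted(candidates, key=lambda p: (scores[p], p))


# Toy example with made-up protein IDs and weights.
example_lists = {
    "predictor_A": ["p1", "p2", "p3"],
    "predictor_B": ["p2", "p1"],
}
example_weights = {"predictor_A": 0.3, "predictor_B": 0.7}
final_ranking = aggregate_ranks(example_lists, example_weights)
```

In the toy example, `p2` ends up first because the more heavily weighted list ranks it highest, which shows how the weights let a stronger predictor dominate the combined ranking.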