Noah M Daniels, Raghavendra Hosur, Bonnie Berger, Lenore J Cowen.
Abstract
MOTIVATION: One of the most successful methods to date for recognizing protein sequences that are evolutionarily related has been profile hidden Markov models (HMMs). However, these models do not capture pairwise statistical preferences of residues that are hydrogen bonded in beta sheets. These dependencies have been partially captured in the HMM setting by simulated evolution in the training phase and can be fully captured by Markov random fields (MRFs). However, the MRFs can be computationally prohibitive when beta strands are interleaved in complex topologies. We introduce SMURFLite, a method that combines both simplified MRFs and simulated evolution to substantially improve remote homology detection for beta structures. Unlike previous MRF-based methods, SMURFLite is computationally feasible on any beta-structural motif.
Year: 2012 PMID: 22408192 PMCID: PMC3338012 DOI: 10.1093/bioinformatics/bts110
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. A closed beta barrel (PDB ID 1bw3, a Barwin domain) from the superfamily ‘Barwin-like endoglucanases’, illustrating the interleaving of strand pairs. Beta strands a and b, which close the barrel, have interleave 4, whereas strands c and d, which are adjacent in sequence, have interleave 1. Strands b and c have interleave 2.
AUC on beta-propeller folds. Best AUC for each structure is marked in bold
| | HMMER | RAPTOR | HHPred | SMURFLite 1 | SMURFLite 1, SimEv | SMURFLite 2 | SMURFLite 2, SimEv | SMURFLite 3 | SMURFLite 3, SimEv |
|---|---|---|---|---|---|---|---|---|---|
| 5-bladed | – | – | – | 0.75 | 0.73 | 0.73 | |||
| 6-bladed | 0.82 | 0.82 | 0.88 | 0.92 | 0.93 | 0.95 | |||
| 7-bladed | 0.89 | 0.95 | 0.92 | 0.91 | 0.93 | 0.91 | 0.93 | 0.91 | |
| 8-bladed | – | 0.64 |
Note: for SMURFLite, the number (1, 2, 3) indicates the interleave threshold, and SimEv indicates simulated evolution. A dash (‘–’) in a result entry indicates that the method failed on these structures, i.e. an AUC of <0.6.
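As a minimal sketch of how an entry in these tables could be produced, the snippet below computes AUC as the probability that a randomly chosen true family member outscores a randomly chosen non-member (ties count half), then applies the reporting rule from the note above. The function names `auc` and `format_auc` and the example scores are hypothetical illustrations, not the authors' evaluation code.

```python
FAILURE_CUTOFF = 0.6  # from the table note: an AUC below 0.6 is shown as a dash


def auc(pos_scores, neg_scores):
    """Pairwise (rank-based) AUC: fraction of positive/negative pairs
    in which the positive scores higher, counting ties as half."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def format_auc(value, cutoff=FAILURE_CUTOFF):
    """Render an AUC the way the tables do: a dash for failures, else two decimals."""
    return "–" if value < cutoff else f"{value:.2f}"


# Hypothetical scores, for illustration only (not data from the paper).
members = [0.9, 0.8, 0.4]
non_members = [0.7, 0.3, 0.2]
print(format_auc(auc(members, non_members)))
```

The quadratic pairwise loop is chosen for clarity; a rank-sum formulation gives the same value more efficiently on large test sets.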
AUC on beta-barrel superfamilies
| | HMMER | RAPTOR | HHPred | SMURFLite 1 | SMURFLite 1, SimEv | SMURFLite 2 | SMURFLite 2, SimEv |
|---|---|---|---|---|---|---|---|
| SMURFLite performs best | |||||||
| Translation proteins | – | – | 0.66 | 0.92 | |||
| Barwin-like endoglucanases | – | – | 0.75 | – | – | 0.63 | |
| Cyclophilin-like | 0.67 | 0.61 | 0.7 | 0.82 | 0.82 | 0.83 | |
| Sm-like ribonucleoproteins | 0.73 | 0.71 | 0.77 | 0.76 | 0.71 | 0.76 | |
| Prokaryotic SH3-related domain | 0.81 | – | – | 0.82 | |||
| Tudor/PWWP/MBT | 0.78 | 0.74 | 0.67 | 0.83 | 0.77 | 0.79 | |
| Nucleic acid-binding proteins | 0.75 | – | 0.67 | 0.76 | 0.89 | 0.76 | 0.92 |
| HHPred performs best | |||||||
| Translation proteins SH3-like | 0.83 | 0.81 | 0.62 | – | 0.62 | – | |
| RAPTOR performs best | |||||||
| PDZ domain-like | 0.96 | 0.99 | 0.97 | 0.97 | 0.97 | 0.97 | |
| FMN-binding split barrel | 0.62 | 0.61 | – | – | – | – | |
| HMMER performs best | |||||||
| Electron Transport accessory proteins | – | 0.77 | 0.63 | – | 0.63 | 0.66 |
Note: for SMURFLite, the number (1, 2) indicates the interleave threshold, and SimEv indicates simulated evolution. A dash (‘–’) in a result entry indicates that the method failed on these structures, i.e. an AUC of <0.6.
Fig. 2. Performance of SMURFLite compared with other methods on the ‘Barwin-like endoglucanases’ beta-barrel superfamily according to the AUC measure. For SMURFLite, the number (1, 2, 4) is the interleave threshold, which determines which strand pairs in the barrel participate in the MRF (interleave 3 is omitted since it is identical to interleave 2 for this fold), and SimEv indicates that simulated evolution was also performed on the beta strands in the training data. As the interleave threshold increases and the MRF becomes more powerful, performance tends to improve. Including simulated evolution also improves performance.
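The role of the interleave threshold can be sketched as a simple filter over strand pairs. The example below uses only the three interleave values quoted in the Fig. 1 caption. It assumes that a pair participates in the MRF when its interleave is at most the threshold; the inclusive comparison and the function name `pairs_in_mrf` are assumptions for illustration, not the authors' implementation.

```python
# Interleave numbers quoted in the Fig. 1 caption for the Barwin domain (1bw3).
# The real barrel may contain further strand pairs not listed in the caption.
FIG1_INTERLEAVE = {("a", "b"): 4, ("b", "c"): 2, ("c", "d"): 1}


def pairs_in_mrf(interleave_by_pair, threshold):
    """Return the strand pairs kept in the MRF at a given threshold.

    Assumption: a pair is kept when its interleave is <= threshold.
    """
    return sorted(pair for pair, k in interleave_by_pair.items() if k <= threshold)


for t in (1, 2, 3, 4):
    print(t, pairs_in_mrf(FIG1_INTERLEAVE, t))
```

Under this assumption, thresholds 2 and 3 select the same pairs for the captioned strands, which is consistent with the figure omitting interleave 3, and only threshold 4 brings in the pair a and b that closes the barrel.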