Jonathan G Lees, Robert W Janes.
Abstract
BACKGROUND: A number of sequence-based methods exist for protein secondary structure prediction. Protein secondary structures can also be determined experimentally from circular dichroism and infrared spectroscopic data using empirical analysis methods. It has been proposed that comparable accuracy can be obtained from sequence-based predictions as from these biophysical measurements. Here we have compared the secondary structure determination accuracies of sequence prediction methods with the empirically determined values from the spectroscopic data, using datasets of proteins for which both crystal structures and spectroscopic data are available.
Year: 2008 PMID: 18197968 PMCID: PMC2253515 DOI: 10.1186/1471-2105-9-24
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
GSEQ dataset 3-fold cross-validation.
| 89.7 | 89.2 | 81.7 | |
| 0.770 | 0.676 | 0.631 | |
| 0.939 | 0.901 | 0.791 | |
| 0.047 | 0.047 | 0.059 | |
| δ | 0.073 | 0.069 | 0.083 |
Performance parameters from the 3-fold cross-validation of the GSEQ dataset using the neural network sequence-based prediction.
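As a rough illustration of how performance parameters of this kind can be computed, the sketch below splits a dataset into k folds and scores predicted secondary structure fractions against reference fractions. It assumes, since the table labels did not survive extraction, that the two parameters are a Pearson correlation coefficient and a root-mean-square deviation (taken here to be δ). The function names `k_fold_splits`, `pearson_r`, and `rmsd` are hypothetical and are not taken from the paper.

```python
import math


def k_fold_splits(n_items, k=3):
    """Yield (train_indices, test_indices) for a simple interleaved k-fold split."""
    indices = list(range(n_items))
    for fold in range(k):
        test = indices[fold::k]  # every k-th item, offset by the fold number
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test


def pearson_r(predicted, reference):
    """Pearson correlation between predicted and reference fractions."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    return cov / math.sqrt(var_p * var_r)


def rmsd(predicted, reference):
    """Root-mean-square deviation between predicted and reference fractions."""
    n = len(predicted)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)
```

In a cross-validation run, a model would be fitted on each `train` subset and the two scores computed on the pooled predictions for the held-out `test` proteins, once per secondary structure type.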
RASP46 dataset cross-validation.
| 0.944 | 0.074 | 0.916 | 0.067 | 0.819 | 0.077 | |
| 0.940 | 0.076 | 0.900 | 0.076 | 0.740 | 0.070 | |
| 0.955 | 0.066 | 0.938 | 0.060 | 0.858 | 0.068 | |
| 0.969 | 0.058 | 0.917 | 0.072 | 0.860 | 0.079 | |
| 0.970 | 0.055 | 0.956 | 0.056 | |||
| 0.974 | 0.053 | 0.939 | 0.062 | 0.846 | 0.072 | |
| 0.893 | 0.060 | |||||
Cross-validation prediction accuracy of the secondary structure content prediction of proteins in the RASP46 dataset. The performance parameters from CD, FTIR, sequence prediction (SEQ), and consensus methods are shown. The best performance parameters for each secondary structural type are shown in bold.
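The caption does not say how the consensus values are formed. One simple possibility, shown purely as an assumption, is an unweighted mean of the secondary structure fractions reported by each method. The function name `consensus` and the numbers in the usage example are illustrative and are not data from the paper.

```python
def consensus(predictions):
    """Average per-method secondary structure fractions (unweighted mean).

    `predictions` maps a method name (e.g. "CD", "FTIR", "SEQ") to a dict of
    fractions keyed by structure type. Equal weighting is an assumption here,
    not necessarily the scheme used in the paper.
    """
    methods = list(predictions.values())
    structure_types = methods[0].keys()
    return {
        s: sum(m[s] for m in methods) / len(methods)
        for s in structure_types
    }


# Illustrative (made-up) fractions for a single protein.
example = {
    "CD": {"helix": 0.40, "sheet": 0.20, "other": 0.40},
    "SEQ": {"helix": 0.60, "sheet": 0.20, "other": 0.20},
}
combined = consensus(example)
```

A weighted mean, with weights reflecting each method's accuracy for a given structure type, would be a natural variation on the same idea.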
SP175 dataset cross-validation.
| 0.970 | 0.053 | 0.919 | 0.063 | 0.787 | 0.065 | |
| 0.972 | 0.052 | 0.918 | 0.068 | 0.864 | 0.070 | |
Cross-validation prediction accuracy of the secondary structure content prediction of proteins in the SP175 dataset. The performance parameters from CD, sequence based (SEQ), and consensus methods are shown. The best performance parameters for each secondary structural type are shown in bold.
Figure 1. Composite FTIR and CD spectra. The CD spectra are in Δε units (y-axis) with wavelengths in nm (x-axis). The FTIR spectra are in arbitrary intensity units scaled as described in the methods (y-axis), with the wavenumbers in cm⁻¹ (x-axis). A pair of hashed lines indicates the discontinuity point for the axes, and the black circle indicates the join between the CD and FTIR spectra.