Nicolas Sapay, Yann Guermeur, Gilbert Deléage.
Abstract
BACKGROUND: Membrane proteins are estimated to represent about 25% of open reading frames in fully sequenced genomes. However, the experimental study of proteins remains difficult. Considerable efforts have thus been made to develop prediction methods. Most of these were conceived to detect transmembrane helices in polytopic proteins. Alternatively, a membrane protein can be monotopic and anchored via an amphipathic helix inserted parallel to the membrane interface, a so-called in-plane membrane (IPM) anchor. This type of membrane anchor is still poorly understood and no suitable prediction method is currently available.
Year: 2006 PMID: 16704727 PMCID: PMC1564421 DOI: 10.1186/1471-2105-7-255
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Amino acid composition bias of IPM anchors, solvent-accessible helices from globular proteins and TM anchors. Amino acid frequencies were normalized to UniProt amino acid composition (dashed line). The composition of IPM anchors is shown in black, of TM helices in grey and of solvent-accessible helices from globular proteins in white. IPM anchors are extracted from our final data set. Solvent-accessible helices are extracted from globular soluble proteins present in the PDB (sequence similarity lower than 25%, accessibility computed by DSSP [53] lower than or equal to 60). TM helices are extracted from the 3D_helix set of the MPtopo database [5].
Sequence-to-topology SVM performance using the LRG and PHAT matrices
| Substitution Matrix | LRG (a) | PHAT (b) | LRG (c) | PHAT (d) | MLP (e) |
|---|---|---|---|---|---|
| Accuracy | 94.0 | 93.6 | 93.9 | 94.3 | 90.6 |
| Sensitivity | 18.3 | 9.9 | 28.4 | 27.2 | 35.3 |
| Specificity | 99.8 | 100.0 | 98.9 | 99.4 | 94.8 |
| Pnon-IPM | 94.1 | 93.6 | 94.8 | 94.7 | 95.1 |
| PIPM | 87.1 | 94.4 | 67.0 | 76.3 | 34.2 |
| CPM | 0.38 | 0.30 | 0.41 | 0.44 | 0.30 |
(a) C = 5.0, 1/(2σ²) = 0.03, window size = 21 residues
(b) C = 5.0, 1/(2σ²) = 0.01, window size = 21 residues
(c) C = 25.0, 1/(2σ²) = 0.40, window size = 21 residues
(d) C = 5.0, 1/(2σ²) = 0.10, window size = 21 residues
(e) hidden layer size = 16, window size = 15 residues
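The footnotes above give a window size and a 1/(2σ²) value for each SVM, which suggests a per-residue sliding window fed to a Gaussian (RBF) kernel. The following is a minimal sketch under that assumption, not the authors' implementation. The function names, the `X` padding character at the sequence ends, and the use of plain numeric vectors are illustrative choices. The substitution-matrix encoding and the positional weighting are not reproduced here.

```python
import math


def sliding_windows(sequence, size=21, pad="X"):
    """Return one window of `size` residues centred on each position.

    Padding the termini with `pad` is an assumption made for this sketch.
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd so it has a central residue")
    half = size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + size] for i in range(len(sequence))]


def gaussian_kernel(x, y, gamma):
    """Gaussian kernel K(x, y) = exp(-gamma * ||x - y||^2).

    `gamma` plays the role of 1/(2*sigma^2) in the table footnotes.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


if __name__ == "__main__":
    # Hypothetical toy sequence; each residue gets its own 21-residue window.
    windows = sliding_windows("MKTAYIAKQRQISFVKSHFSRQ", size=21)
    print(len(windows), windows[0])
    # Two hypothetical encoded windows compared with gamma = 0.03.
    print(gaussian_kernel([1.0, 0.0, 2.0], [0.0, 0.0, 2.0], gamma=0.03))
```

With this layout each residue yields exactly one window, so the classifier output can be read as a per-residue IPM or non-IPM call.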
Figure 2. Positional weighting profiles associated with the LRG (dashed line) and PHAT (solid line) matrices.
Topology-to-topology SVM training and test performance, using the output of the sequence-to-topology SVM as input. "With" and "Without Structure II" indicate whether the predicted secondary structure of the sequence is also included in the input. LRG and PHAT columns correspond to the substitution matrices used by the sequence-to-topology SVM.
| Substitution Matrix | LRG (a) | PHAT (b) | LRG (c) | PHAT (d) |
|---|---|---|---|---|
| Accuracy | 91.7 | 93.6 | 92.3 | 94.3 |
| Sensitivity | 42.9 | 64.3 | 41.1 | 44.1 |
| Specificity | 95.5 | 95.9 | 96.4 | 98.3 |
| Pnon-IPM | 95.4 | 97.1 | 95.3 | 95.6 |
| PIPM | 44.1 | 55.7 | 47.5 | 67.1 |
| CPM | 0.39 | 0.64 | 0.40 | 0.52 |
(a) C = 10.0, 1/(2σ²) = 0.1, predictors = segment of 21 residues
(b) C = 10.0, 1/(2σ²) = 0.1, predictors = segment of 21 residues
(c) C = 15.0, 1/(2σ²) = 0.1, predictors = segment of 21 residues + corresponding predicted secondary structure
(d) C = 5.0, 1/(2σ²) = 0.05, predictors = segment of 21 residues + corresponding predicted secondary structure
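The tables report accuracy, sensitivity and specificity alongside the Pnon-IPM, PIPM and CPM rows. As a hedged illustration, the sketch below computes the standard per-residue measures from a binary confusion matrix. It assumes the usual textbook definitions, that the two P rows are predictive values for each class, and that CPM is a Matthews-type correlation coefficient; none of this is confirmed by the text shown here. The counts in the example are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard measures for a two-class (IPM vs non-IPM) prediction.

    tp/fn: IPM residues predicted as IPM / as non-IPM.
    tn/fp: non-IPM residues predicted as non-IPM / as IPM.
    """
    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "predictive_value_ipm": ratio(tp, tp + fp),
        "predictive_value_non_ipm": ratio(tn, tn + fn),
        "correlation": ratio(tp * tn - fp * fn, mcc_den),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in binary_metrics(tp=30, fp=10, tn=950, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Because IPM residues are a small minority of all residues, accuracy and specificity stay high even when sensitivity is low, which is why a correlation-type measure is useful next to them.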
Quality of the predictions involving multiple alignments. The weights assigned to the aligned sequences are calculated using a BLOSUM weight scheme at a fractional identity of 0.80. LRG and PHAT columns correspond to the substitution matrices used by the SVM.
| Substitution Matrix | LRG | PHAT |
|---|---|---|
| Accuracy | 95.0 | 95.0 |
| Sensitivity | 31.3 | 31.3 |
| Specificity | 99.8 | 99.8 |
| Pnon-IPM | 92.3 | 93.8 |
| PIPM | 95.0 | 95.0 |
| CPM | 0.52 | 0.53 |
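The caption of this table mentions a BLOSUM weight scheme at a fractional identity of 0.80. One common reading of such a scheme, assumed here, is single-linkage clustering of the aligned sequences at the identity threshold, with each cluster's unit weight shared equally among its members. The sketch below follows that reading; the function names, the gap character `-`, and the pairwise identity definition are illustrative assumptions, and the authors' exact procedure may differ.

```python
def fractional_identity(a, b):
    """Fraction of identical residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def blosum_style_weights(alignment, threshold=0.80):
    """Weight each aligned sequence by 1 / (size of its single-linkage cluster)."""
    n = len(alignment)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of sequences at or above the identity threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if fractional_identity(alignment[i], alignment[j]) >= threshold:
                parent[find(i)] = find(j)

    roots = [find(i) for i in range(n)]
    sizes = {r: roots.count(r) for r in set(roots)}
    return [1.0 / sizes[r] for r in roots]


if __name__ == "__main__":
    # Hypothetical toy alignment: the first two rows are 90% identical.
    toy = ["ACDEFGHIKL", "ACDEFGHIKV", "WWWWWYYYYY"]
    print(blosum_style_weights(toy))
```

Down-weighting near-identical sequences in this way keeps a family of close homologues from dominating the averaged prediction.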
Classification performance for 3 sets of soluble or transmembrane proteins naively tested. "Observed as" corresponds to the number of residues observed at a TM or a non-TM position. "Predicted as" corresponds to the number of residues predicted at an IPM or non-IPM position. "Proteins with TM α-helix" is a set of 101 proteins with 1 or more TM α-helices. "Proteins with TM β-barrel" is a set of 21 TM β-barrel proteins. TM proteins are extracted from the MPtopo database (3D_helix and 3D_other subsets, respectively). "Soluble proteins" is a set of 65 soluble proteins extracted from the PDB (sequence similarity < 25%). These 3 sets were submitted to the sequence-to-topology SVM, using PHAT and a positional weighting (Table 1). An average prediction was then computed for each sequence of the sets following the procedure described above (Table 3).
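| Predicted as | Proteins with TM α-helix: TM | Proteins with TM α-helix: non-TM | Proteins with TM β-barrel: TM | Proteins with TM β-barrel: non-TM | Soluble proteins: TM | Soluble proteins: non-TM |
|---|---|---|---|---|---|---|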
| IPM | 181 | 152 | 16 | 5 | - | 57 |
| non-IPM | 11057 | 14423 | 3540 | 4138 | - | 30310 |
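The residue counts in this table can be turned into a per-set rate of IPM calls. The sketch below does that arithmetic. It assumes the column order follows the order of the sets in the caption (TM α-helix, TM β-barrel, soluble). If one further assumes that these three sets contain no genuine IPM anchors, which is an assumption of this sketch and is not stated outright in the caption, the rate can be read as a false-positive rate.

```python
# Residue counts copied from the table; TM and non-TM columns are summed per set.
COUNTS = {
    "TM alpha-helix proteins": {"IPM": 181 + 152, "non-IPM": 11057 + 14423},
    "TM beta-barrel proteins": {"IPM": 16 + 5, "non-IPM": 3540 + 4138},
    "Soluble proteins": {"IPM": 57, "non-IPM": 30310},
}


def ipm_call_rate(counts):
    """Fraction of residues in a set that were predicted as IPM."""
    total = counts["IPM"] + counts["non-IPM"]
    return counts["IPM"] / total


RATES = {name: ipm_call_rate(c) for name, c in COUNTS.items()}

if __name__ == "__main__":
    for name, rate in RATES.items():
        print(f"{name}: {100 * rate:.2f}% of residues predicted as IPM")
```

Under these assumptions, the α-helical TM set has the highest share of residues called IPM and the soluble set the lowest.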
Figure 3. Schematic representation to scale of an IPM anchor. The amphipathic α-helix of the IPM anchor is depicted as a black and white cylinder, for the hydrophobic and hydrophilic sides, respectively. The non-membrane part of the protein is represented by a dotted line. The membrane hydrophobic core, including acyl chains, is dark grey and the membrane interface, including glycerol and above atoms, is light grey.
Figure 4. Flowchart of the data set enrichment process.