Tamara Vasylenko, Yi-Fan Liou, Hong-An Chen, Phasit Charoenkwan, Hui-Ling Huang, Shinn-Ying Ho.
Abstract
BACKGROUND: Photosynthetic proteins (PSPs) differ greatly in structure and function because they are involved in numerous subprocesses that take place inside the chloroplast. Few studies have attempted to predict PSPs from sequence, owing to the high diversity of their sequences and structures. This work aims to predict and characterize PSPs by establishing datasets of PSP and non-PSP sequences and developing prediction methods.
Year: 2015 PMID: 25708243 PMCID: PMC4331707 DOI: 10.1186/1471-2105-16-S1-S8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The flowchart of the system design.
Summary of the three datasets consisting of training and test data.
| Dataset | Sequence identity (%) | Total | PSP | Non-PSP |
|---|---|---|---|---|
| PSPGO-TRN | 25 | 1038 | 519 | 519 |
| PSPGO-TST | 25 | 260 | 130 | 130 |
| ORI-TRN | 50 | 1980 | 990 | 990 |
| ORI-TST | 50 | 492 | 246 | 246 |
| ORIRW-TRN | 25 | 1000 | 500 | 500 |
| ORIRW-TST | 25 | 466 | 233 | 233 |
Figure 2. Heat map of the PSP propensity scores of dipeptides.
The propensity scores and composition (%) of amino acids.
| Amino acid | PSP propensity score (rank) | PSP composition A (%) | Non-PSP composition B (%) | Difference A-B (%) |
|---|---|---|---|---|
| A-Ala | 522.90 (1) | 9.97 | 8.03 | 1.94 |
| F-Phe | 516.70 (2) | 5.12 | 3.68 | 1.43 |
| Y-Tyr | 498.90 (3) | 3.34 | 2.68 | 0.66 |
| I-Ile | 495.80 (4) | 6.36 | 5.41 | 0.95 |
| L-Leu | 484.90 (5) | 11.28 | 10.20 | 1.08 |
| G-Gly | 482.70 (6) | 7.70 | 6.95 | 0.75 |
| V-Val | 448.60 (7) | 7.45 | 6.73 | 0.72 |
| M-Met | 447.70 (8) | 2.77 | 2.26 | 0.51 |
| P-Pro | 429.30 (9) | 4.72 | 5.09 | -0.36 |
| W-Trp | 417.40 (10) | 1.29 | 1.14 | 0.15 |
| T-Thr | 413.60 (11) | 5.07 | 5.21 | -0.14 |
| S-Ser | 383.60 (12) | 6.79 | 7.40 | -0.61 |
| N-Asn | 376.10 (13) | 3.40 | 3.85 | -0.45 |
| H-His | 373.30 (14) | 1.75 | 2.33 | -0.58 |
| C-Cys | 371.10 (15) | 1.06 | 1.75 | -0.69 |
| K-Lys | 370.70 (16) | 4.58 | 5.51 | -0.93 |
| D-Asp | 358.80 (17) | 4.35 | 5.24 | -0.89 |
| R-Arg | 356.70 (18) | 4.80 | 5.78 | -0.98 |
| E-Glu | 350.90 (19) | 5.33 | 6.64 | -1.31 |
| Q-Gln | 313.10 (20) | 2.90 | 4.12 | -1.23 |
| R (correlation with PSP propensity score) | 1.000 | 0.53 | 0.22 | 0.96 |
The total numbers of amino acids for PSPs and non-PSPs in PSPGO-TRN are 133,744 and 210,361, respectively.
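Columns A and B are residue percentages over each set. A minimal sketch of how such composition columns and their difference can be computed, assuming residues are pooled over all sequences in a set (consistent with the pooled totals quoted above) and that non-standard letters such as X are ignored. The function names and toy sequences are hypothetical and serve only to show the calculation.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(sequences):
    """Pooled composition (%) of the 20 standard amino acids over all residues.

    Assumption: residues from every sequence are pooled before dividing,
    and non-standard letters (X, B, Z, U, ...) are skipped.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}

def composition_difference(psp_seqs, non_psp_seqs):
    """Per-residue difference A - B between PSP and non-PSP composition (%)."""
    a = composition(psp_seqs)
    b = composition(non_psp_seqs)
    return {aa: a[aa] - b[aa] for aa in AMINO_ACIDS}

# Hypothetical toy sequences, only to illustrate the call pattern.
diff = composition_difference(["AAFL", "ALXF"], ["KKEQ", "DAEK"])
print({aa: round(v, 2) for aa, v in diff.items() if v != 0})
```

Because each composition sums to 100%, the differences in the last column sum to approximately zero, which is a quick consistency check on the table.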
Figure 3. Structures of . PDB entry 4HHH. The active-center residues are highlighted in solid yellow, and detailed residue positions are shown as spacefill. Figures were made with Swiss-PdbViewer 4.1.0.
Figure 4. Structures of . PDB entry 2WX5. (A) Distribution of side-chain hydrophobicity; (B) hydrophobic patches highlighted. Side chains are colored with a hydrophobicity palette: blue for the most hydrophobic, green for intermediate, and red for hydrophilic side chains. Figures were made with Swiss-PdbViewer 4.1.0.
The hydrophobic cores of the light-harvesting polypeptides.
| Name | Sequence | Reference | Propensity score |
|---|---|---|---|
| LH1-b | VYMSGLWLFSAVAIVAHLAVYIW | [54] | 554.77 |
| LH-a | ALVGLATFLFVLALLIHFILLST | [54] | 518.91 |
| Cut-a | ALVGLATFLFVLALLIHFILLST | [54] | 518.91 |
| Type1 | ALVGLATFLFVLALLIHFILLST | [54] | 518.91 |
| LH-b | IFTSSILVFFGVAAFAHLLVWIW | [55] | 604.77 |
The mean propensity score of the 400 dipeptides is 420.27, and the threshold of the SCMPSP classifier is 441.29.
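SCMPSP builds on the scoring card method (SCM). A minimal sketch, assuming the usual SCM decision rule: a sequence's score is the average propensity score of its overlapping dipeptides, and a sequence scoring above the threshold (441.29 here) is predicted to be a PSP. The 400-entry dipeptide matrix is not reproduced in this section, so the example uses a hypothetical toy matrix filled with the reported mean score 420.27 plus one boosted entry; the strict `>` comparison is also an assumption.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
THRESHOLD = 441.29  # SCMPSP threshold reported in the text

def scm_score(sequence, dipeptide_scores):
    """Average propensity score over all overlapping dipeptides of a sequence.

    Assumes dipeptide_scores maps each of the 400 dipeptides (e.g. "AL")
    to a score; dipeptides with non-standard residues are skipped.
    """
    seq = sequence.upper()
    scores = [
        dipeptide_scores[seq[i:i + 2]]
        for i in range(len(seq) - 1)
        if seq[i:i + 2] in dipeptide_scores
    ]
    if not scores:
        raise ValueError("sequence has no scorable dipeptides")
    return sum(scores) / len(scores)

def predict_psp(sequence, dipeptide_scores, threshold=THRESHOLD):
    """Label a sequence as PSP when its SCM score exceeds the threshold."""
    return scm_score(sequence, dipeptide_scores) > threshold

# Hypothetical toy matrix: every dipeptide at the reported mean, "AL" boosted.
toy = {a + b: 420.27 for a, b in product(AMINO_ACIDS, repeat=2)}
toy["AL"] = 600.0

print(round(scm_score("ALALAL", toy), 3), predict_psp("ALALAL", toy))
print(predict_psp("KKKK", toy))
```

Averaging over dipeptides, rather than summing, keeps scores of short and long sequences on the same scale, so a single threshold can be applied to all of them.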
Performance of BLASTP on the established test datasets at various E-value cut-offs.
| E-value | PSPGO-TST | ORI-TST | ORIRW-TST |
|---|---|---|---|
| 0.1 | 58.46% | 40.85% | 32.40% |
| 0.01 | 57.31% | 38.21% | 30.26% |
| 0.001 | 56.15% | 36.99% | 28.76% |
| 0.0001 | 53.46% | 35.98% | 27.47% |
| 0.00001 | 51.92% | 34.15% | 26.39% |
Comparison of the prediction accuracies (%) of PSP predictors.
| Classifier | PSPGO-TRN | PSPGO-TST | ORI-TRN | ORI-TST | ORIRW-TRN | ORIRW-TST | Mean (TST) |
|---|---|---|---|---|---|---|---|
| SCMPSP | 83.82 | 71.54 | 77.78 | 62.60 | 81.8 | 64.38 | 66.17 |
| SVM-AAC | 85.45 | 78.85 | 71.36 | 50.61 | 71.20 | 57.30 | 62.25 |
| SVM-DPC | 83.33 | 72.31 | 68.94 | 69.11 | 67.30 | 61.37 | 67.60 |
| SVM-AAindex | 79.19 | 75.38 | 71.40 | 71.03 | 71.36 | 71.14 | 72.52 |
| J48-AAC | 68.50 | 73.84 | 64.24 | 63.41 | 68.21 | 49.79 | 62.35 |
| J48-DPC | 63.20 | 61.92 | 55.25 | 51.22 | 55.70 | 59.44 | 57.53 |
| J48-AAindex | 65.03 | 70.38 | 62.50 | 62.02 | 61.46 | 68.09 | 66.83 |
| Bayes-AAC | 67.92 | 69.20 | 64.65 | 63.00 | 65.00 | 65.02 | 65.74 |
| Bayes-DPC | 65.41 | 67.31 | 58.28 | 57.32 | 58.80 | 60.73 | 61.79 |
| Bayes-AAindex | 66.09 | 64.62 | 62.70 | 60.30 | 62.73 | 57.11 | 60.68 |
| Mean | 72.79 | 70.54 | 65.71 | 61.06 | 66.37 | 61.44 | |
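The right-hand Mean column matches the average of each classifier's three test-set accuracies (PSPGO-TST, ORI-TST, ORIRW-TST); the training columns are not included. A short check on a few rows copied from the table; the dictionary name is illustrative only.

```python
# Test-set accuracies (%) copied from the comparison table,
# in the order (PSPGO-TST, ORI-TST, ORIRW-TST).
TEST_ACCURACIES = {
    "SCMPSP": (71.54, 62.60, 64.38),
    "SVM-AAC": (78.85, 50.61, 57.30),
    "SVM-DPC": (72.31, 69.11, 61.37),
    "SVM-AAindex": (75.38, 71.03, 71.14),
}

def mean_accuracy(values):
    """Arithmetic mean of the accuracies, rounded to two decimals like the table."""
    return round(sum(values) / len(values), 2)

for name, accs in TEST_ACCURACIES.items():
    print(f"{name}: {mean_accuracy(accs):.2f}")
```

On this test-only average SVM-AAindex ranks first (72.52), while SCMPSP has the highest PSPGO-TST accuracy among the SCM and DPC-based predictors.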
Ten independent runs of SCMPSP on PSPGO-TRN.
| Run | Fitness score | Training accuracy (%) | Sensitivity | Specificity | Threshold |
|---|---|---|---|---|---|
| 1 | 0.9016 | 82.5626 | 0.7154 | 0.6615 | 459.0526 |
| 2 | 0.9097 | 82.0809 | 0.7231 | 0.6538 | 454.5208 |
| 3 | 0.9105 | 83.8150 | 0.6615 | 0.6308 | 460.6491 |
| 4 | 0.9022 | 82.9480 | 0.6692 | 0.7231 | 429.4792 |
| 5 | 0.9136 | 83.6224 | 0.6462 | 0.7385 | 465.0526 |
| 6 | 0.9051 | 82.6590 | 0.6700 | 0.5923 | 456.3793 |
| 7 | 0.9114 | 83.8150 | 0.7154 | 0.7154 | 441.2917 |
| 8 | 0.9046 | 82.4663 | 0.6615 | 0.6700 | 456.5833 |
| 9 | 0.9027 | 81.6956 | 0.7231 | 0.6846 | 441.4901 |
| 10 | 0.9088 | 82.5626 | 0.7308 | 0.6000 | 448.3220 |
| Mean | 0.9070 | 82.8227 | 0.6916 | 0.6670 | 451.2821 |
| SD | 0.0043 | 0.7253 | 0.0325 | 0.0500 | 10.9653 |
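The Mean and SD rows can be reproduced from the ten runs. A minimal sketch using Python's standard `statistics` module; the reported SD values match the sample standard deviation (n - 1 in the denominator), not the population one.

```python
import statistics

# Values copied from the ten-run table (PSPGO-TRN).
FITNESS = [0.9016, 0.9097, 0.9105, 0.9022, 0.9136,
           0.9051, 0.9114, 0.9046, 0.9027, 0.9088]
THRESHOLDS = [459.0526, 454.5208, 460.6491, 429.4792, 465.0526,
              456.3793, 441.2917, 456.5833, 441.4901, 448.3220]

def summarize(values):
    """Return (mean, sample standard deviation with n - 1 in the denominator)."""
    return statistics.mean(values), statistics.stdev(values)

for label, values in (("Fitness", FITNESS), ("Threshold", THRESHOLDS)):
    mean, sd = summarize(values)
    print(f"{label}: mean={mean:.4f}, SD={sd:.4f}")
```

Using `statistics.pstdev` instead would give about 0.0041 for the fitness score, so the table's 0.0043 indicates the sample form.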
Amino acid scores derived from SCMPSP and from the physicochemical properties selected by SCM-PCPs.
| Amino acid | PSP propensity score (rank) | BLAS910101 score (rank) | WOLR810101 score (rank) | PUNT030101 score (rank) |
|---|---|---|---|---|
| A-Ala | 522.9 (1) | 0.62 (10) | 1.95 (5) | -0.17 (15) |
| F-Phe | 516.7 (2) | 1.00 (1) | -0.76 (6) | -0.41 (20) |
| Y-Tyr | 498.9 (3) | 0.88 (4) | -6.11 (13) | -0.09 (13) |
| I-Ile | 495.8 (4) | 0.94 (2) | 2.15 (3) | -0.28 (18) |
| L-Leu | 484.9 (5) | 0.94 (3) | 2.28 (2) | -0.28 (19) |
| G-Gly | 482.7 (6) | 0.50 (11) | 2.39 (1) | 0.01 (10) |
| V-Val | 448.6 (7) | 0.83 (6) | 1.99 (4) | -0.17 (16) |
| M-Met | 447.7 (8) | 0.74 (7) | -1.48 (8) | -0.26 (17) |
| P-Pro | 429.3 (9) | 0.71 (8) | -3.68 (9) | 0.13 (7) |
| W-Trp | 417.4 (10) | 0.88 (5) | -5.88 (12) | -0.15 (14) |
| T-Thr | 413.6 (11) | 0.45 (12) | -4.88 (10) | 0.02 (9) |
| S-Ser | 383.6 (12) | 0.36 (13) | -5.06 (11) | 0.05 (8) |
| N-Asn | 376.1 (13) | 0.24 (16) | -9.68 (16) | 0.18 (5) |
| H-His | 373.3 (14) | 0.17 (17) | -10.27 (18) | -0.02 (11) |
| C-Cys | 371.1 (15) | 0.68 (9) | -1.24 (7) | -0.06 (12) |
| K-Lys | 370.7 (16) | 0.28 (14) | -9.52 (15) | 0.32 (3) |
| D-Asp | 358.8 (17) | 0.038 (19) | -10.95 (19) | 0.37 (1) |
| R-Arg | 356.7 (18) | 0.00 (20) | -19.92 (20) | 0.37 (2) |
| E-Glu | 350.9 (19) | 0.04 (18) | -10.20 (17) | 0.15 (6) |
| Q-Gln | 313.1 (20) | 0.25 (15) | -9.38 (14) | 0.26 (4) |
| R (correlation with PSP propensity score) | 1.000 | 0.7955 | 0.76 | -0.79 |
BLAS910101: Scaled side-chain hydrophobicity values (Black and Mould, 1991).
WOLR810101: Hydration potential (Wolfenden et al., 1981).
PUNT030101: Knowledge-based membrane-propensity scale from 1D_Helix in the MPtopo database (Punta and Maritan, 2003).
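The R rows behave like Pearson correlation coefficients against the PSP propensity score across the 20 amino acids (the score correlated with itself gives R = 1.000). A minimal sketch, assuming Pearson's r is the statistic used; with the printed BLAS910101 values it gives roughly 0.80, close to the reported 0.7955, with the small gap plausibly due to rounding in the printed values.

```python
import math

# PSP propensity scores and BLAS910101 values copied from the table above.
PS_SCORES = {
    "A": 522.9, "F": 516.7, "Y": 498.9, "I": 495.8, "L": 484.9,
    "G": 482.7, "V": 448.6, "M": 447.7, "P": 429.3, "W": 417.4,
    "T": 413.6, "S": 383.6, "N": 376.1, "H": 373.3, "C": 371.1,
    "K": 370.7, "D": 358.8, "R": 356.7, "E": 350.9, "Q": 313.1,
}
BLAS910101 = {
    "A": 0.62, "F": 1.00, "Y": 0.88, "I": 0.94, "L": 0.94,
    "G": 0.50, "V": 0.83, "M": 0.74, "P": 0.71, "W": 0.88,
    "T": 0.45, "S": 0.36, "N": 0.24, "H": 0.17, "C": 0.68,
    "K": 0.28, "D": 0.038, "R": 0.00, "E": 0.04, "Q": 0.25,
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

residues = sorted(PS_SCORES)
r = pearson([PS_SCORES[aa] for aa in residues],
            [BLAS910101[aa] for aa in residues])
print(f"R(PS score, BLAS910101) = {r:.4f}")
```

The same function applied to the other property columns gives the remaining R values under the same Pearson assumption.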
Figure 5. The structure of the light-harvesting peptide LH1 and its cofactor, bacteriochlorophyll a. The peptide structure was extracted from the light-harvesting complex 4JC9, which contains 64 peptides; the cofactors in contact with it (yellow sticks) were also extracted. Red sticks denote the hydrophobic core residues. Surface colors ranging from brown (most hydrophobic) to blue (most hydrophilic) indicate hydrophobicity. The figure was generated with Discovery Studio 4.0.
Figure 6. A schematic representation of the photosynthetic apparatus in the thylakoid membrane.
SCMPSP scores and rate constants reported by Davies et al. [48].
| Amino acid | PSP propensity score (rank) | Rate constant |
|---|---|---|
| A-Ala | 522.9 (1) | 7.7 × 10⁷ |
| F-Phe | 516.7 (2) | 6.5 × 10⁹ |
| Y-Tyr | 498.9 (3) | 1.3 × 10¹⁰ |
| I-Ile | 495.8 (4) | 1.8 × 10⁹ |
| L-Leu | 484.9 (5) | 1.7 × 10⁹ |
| G-Gly | 482.7 (6) | 1.7 × 10⁷ |
| V-Val | 448.6 (7) | 7.6 × 10⁸ |
| M-Met | 447.7 (8) | 8.3 × 10⁹ |
| P-Pro | 429.3 (9) | 4.8 × 10⁸ |
| W-Trp | 417.4 (10) | 1.3 × 10¹⁰ |
| T-Thr | 413.6 (11) | 5.1 × 10⁸ |
| S-Ser | 383.6 (12) | 3.2 × 10⁸ |
| N-Asn | 376.1 (13) | 4.9 × 10⁷ |
| H-His | 373.3 (14) | 1.3 × 10¹⁰ |
| C-Cys | 371.1 (15) | 3.4 × 10¹⁰ |
| K-Lys | 370.7 (16) | 3.4 × 10⁸ |
| D-Asp | 358.8 (17) | 7.5 × 10⁷ |
| R-Arg | 356.7 (18) | 3.5 × 10⁹ |
| E-Glu | 350.9 (19) | 2.3 × 10⁸ |
| Q-Gln | 313.1 (20) | 5.4 × 10⁸ |
| R1 (a) | 1.00 | 0.31 |
a. All amino acid residues.
b. Excluding the amino acids lacking the side-chain effect.