Yugo Shimizu, Hiroyuki Ogata, Susumu Goto.
Abstract
MOTIVATION: Functional prediction of paralogs is challenging in bioinformatics because of rapid functional diversification after gene duplication events combined with parallel acquisitions of similar functions by different paralogs. Plant type III polyketide synthases (PKSs), producing various secondary metabolites, represent a paralogous family that has undergone gene duplication and functional alteration. Currently, there is no computational method available for the functional prediction of type III PKSs.
Year: 2017 PMID: 28334262 PMCID: PMC5870536 DOI: 10.1093/bioinformatics/btx112
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Typical reaction scheme of type III PKSs. Three features, (i) variation in the acyl part (‘R1’) of the starter substrate, (ii) the length of the intermediate and (iii) the mechanism of intramolecular cyclization, are used to define reaction types (e.g. R-3m-L, Sb-4-C and L-5-A). Each reaction type is represented by three elements, one for each of the three features. (i) The first element (e.g. R, Sb or L) represents the structure of the acyl part: ring, R; short chain (C2 to C12), S; long chain (up to C26), L. Additional characters denote specific acyl groups: branched chain, b; carboxyl, c; hydroxyl, h; nitrogen, n. (ii) The second element (e.g. 3m, 4 or 5) represents the number of methylenecarbonyl units in the intermediate, indicated by ‘n’, followed by any unusual extender substrate other than malonyl-CoA: methylmalonyl-CoA, m; ethylmalonyl-CoA, e; acetoacetyl-CoA, a; diketide-CoA, d. (iii) The third element (e.g. L, C or A) represents the mechanism of intramolecular cyclization: lactone, L; Claisen, C; aldol, A; no cyclization, X; nitrogen–carbon, n; miscellaneous, +.
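The reaction-type notation in the Fig. 1 caption is compact enough to decode mechanically. The following is a minimal sketch, not the authors' code: the regular expression, the `ReactionType` class and `parse_reaction_type` are hypothetical names, and it assumes every code has exactly three hyphen-separated elements, with a multi-letter third element (as in ‘Rn-4-Cn’) read as a combination of cyclization letters. Wildcard group labels such as ‘S-*-*’ are deliberately rejected.

```python
import re
from dataclasses import dataclass
from typing import List

# Letter meanings taken from the Fig. 1 caption.
ACYL_BASE = {"R": "ring", "S": "short chain (C2-C12)", "L": "long chain (up to C26)"}
ACYL_EXTRA = {"b": "branched chain", "c": "carboxyl", "h": "hydroxyl", "n": "nitrogen"}
EXTENDER = {"m": "methylmalonyl-CoA", "e": "ethylmalonyl-CoA",
            "a": "acetoacetyl-CoA", "d": "diketide-CoA"}
CYCLIZATION = {"L": "lactone", "C": "Claisen", "A": "aldol",
               "X": "no cyclization", "n": "nitrogen-carbon", "+": "miscellaneous"}

# Hypothetical grammar: acyl letter + modifiers, unit count + extenders, cyclization letters.
CODE_RE = re.compile(r"^([RSL])([bchn]*)-(\d+)([mead]*)-([LCAXn+]+)$")


@dataclass
class ReactionType:
    acyl: str
    acyl_modifiers: List[str]
    n_units: int
    extenders: List[str]
    cyclization: List[str]


def parse_reaction_type(code: str) -> ReactionType:
    """Decode a reaction-type code such as 'R-3m-L' into its three features."""
    match = CODE_RE.match(code)
    if match is None:
        raise ValueError(f"not a reaction-type code: {code!r}")
    base, mods, n, ext, cyc = match.groups()
    return ReactionType(
        acyl=ACYL_BASE[base],
        acyl_modifiers=[ACYL_EXTRA[c] for c in mods],
        n_units=int(n),
        extenders=[EXTENDER[c] for c in ext],
        cyclization=[CYCLIZATION[c] for c in cyc],
    )


print(parse_reaction_type("Sb-4-C"))
```

A regular expression keeps the three positional elements explicit; a lookup table per element mirrors how the caption lists one letter per meaning.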
Fig. 2. (A) Positions of Areas 1–4 in the whole protein sequence of M. sativa CHS2. (B) The Area 1–4 segments of the MSA of known R-4-A-, R-4-C- and R-2-X-type plant type III PKSs. (C) Sequence logos for each reaction type and for all three types together. Hydrophilic (DEKNQR), neutral (AGHPST) and hydrophobic (CFILMVWY) residues are shown in blue, green and black, respectively. Residue conservation in the MSAs is represented by sequence logos generated with WebLogo3 (http://weblogo.threeplusone.com/). (Color version of this figure is available at Bioinformatics online.)
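To make the logo encoding concrete, the sketch below computes, for one alignment column, the stack height a sequence logo displays (log2(20) bits minus the Shannon entropy of the residue frequencies) and counts residues in the three classes named in the Fig. 2 caption. It is an illustrative assumption, not WebLogo3's implementation: gaps are simply ignored and no small-sample correction is applied. The toy alignment and function names are hypothetical.

```python
import math
from collections import Counter

# Residue classes exactly as listed in the Fig. 2 caption.
RESIDUE_CLASS = {
    **{aa: "hydrophilic" for aa in "DEKNQR"},
    **{aa: "neutral" for aa in "AGHPST"},
    **{aa: "hydrophobic" for aa in "CFILMVWY"},
}


def column_information(column: str) -> float:
    """Logo stack height in bits: log2(20) minus the column's Shannon entropy.

    Gap characters ('-') are ignored; no small-sample correction is applied.
    """
    residues = [aa for aa in column.upper() if aa != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum((k / total) * math.log2(k / total)
                   for k in Counter(residues).values())
    return math.log2(20) - entropy


def column_classes(column: str) -> dict:
    """Count residues of each hydrophobicity class in one column (gaps ignored)."""
    return dict(Counter(RESIDUE_CLASS.get(aa, "other")
                        for aa in column.upper() if aa != "-"))


# Hypothetical toy alignment: each string is one aligned sequence.
alignment = ["CDAF", "CDAL", "CEGV", "CKGI"]
for i, residues in enumerate(zip(*alignment)):
    col = "".join(residues)
    print(i, col, round(column_information(col), 2), column_classes(col))
```

A fully conserved column reaches the maximum of about 4.32 bits, while a column split evenly between two residues loses exactly one bit.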
Fig. 3. Results of PCA on the three HMM scores computed using Areas 1 + 3 + 4, corresponding to the three reaction types R-4-A, R-4-C and R-2-X. The first and second principal components (PC1 and PC2) are plotted and account for 52.2% and 34.1% of the variance, respectively.
Fig. 4. Results of PCA on six scores for the reaction-type models R-4-A, R-4-C and R-2-X: three HMM scores computed using Areas 1 + 3 + 4 and three correlation scores. The first and second principal components (PC1 and PC2) are plotted and account for 47.9% and 28.0% of the variance, respectively.
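The PCA figures project per-sequence score vectors (three or six scores per PKS) onto two axes and report the share of variance each axis explains. A minimal NumPy sketch of that computation via singular value decomposition follows. Whether the authors standardized the scores before PCA is not stated here, so `standardize` is an assumed option (it matters when HMM and correlation scores sit on different scales). The function name and the random placeholder data are hypothetical and reproduce none of the reported percentages.

```python
import numpy as np


def pca_explained_variance(scores, n_components=2, standardize=False):
    """Project a (sequences x scores) matrix onto its leading principal components.

    Returns the projected coordinates and the fraction of total variance
    explained by each returned component.
    """
    X = np.asarray(scores, dtype=float)
    X = X - X.mean(axis=0)  # centre each score column
    if standardize:
        std = X.std(axis=0, ddof=1)
        std[std == 0] = 1.0  # leave constant columns unscaled
        X = X / std
    # Squared singular values of the centred matrix are proportional to the
    # variance captured along each principal axis.
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    variance = s ** 2
    ratio = variance / variance.sum()
    projected = X @ vt[:n_components].T  # PC1, PC2, ... coordinates
    return projected, ratio[:n_components]


# Random placeholder scores: 40 sequences x 6 scores (not data from the paper).
rng = np.random.default_rng(0)
coords, ratio = pca_explained_variance(rng.normal(size=(40, 6)), standardize=True)
print(coords.shape, np.round(ratio * 100, 1))
```

Using SVD on the centred matrix avoids forming the covariance matrix explicitly and returns components already sorted by explained variance.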
Classification results of LDA using three classifiers
| Query | R-4-A classifier: Yes | R-4-A classifier: No | R-4-C classifier: Yes | R-4-C classifier: No | R-2-X classifier: Yes | R-2-X classifier: No | pPAP: correct |
|---|---|---|---|---|---|---|---|
| R-4-A | 13 | 0 | 0 | 13 | 0 | 13 | 13 |
| R-4-C | 0 | 27 | 27 | 0 | 0 | 27 | 27 |
| R-2-X | 0 | 9 | 0 | 9 | 9 | 0 | 9 |
| Rn-2-n/Rn-4-Cn | 0 | 4 | 0 | 4 | 0 | 4 | 4 |
| R-4-C, R-2-X | 0 | 2 | 2 | 0 | 0 | 2 | 0 |
| R-3m-L | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
| R-4-L | 1 | 1 | 1 | 1 | 0 | 2 | 0 |
| S-*-* | 0 | 12 | 1 | 11 | 0 | 12 | 11 |
| L-4-A/L-5-A | 0 | 6 | 0 | 6 | 0 | 6 | 6 |
| Lh-4-L | 0 | 6 | 0 | 6 | 0 | 6 | 6 |
Note: The numbers of correct predictions by pPAP are shown in the rightmost column.
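The table can be checked and summarized with simple arithmetic. The sketch below assumes that each classifier gives exactly one Yes/No answer per query, so Yes + No should be the same for all three classifiers in a row. It then totals the queries and pPAP's correct predictions. The resulting overall fraction is derived here from the table values; it is not a figure quoted from the paper.

```python
# (R-4-A yes/no, R-4-C yes/no, R-2-X yes/no, pPAP correct), copied from the table.
TABLE = {
    "R-4-A": ((13, 0), (0, 13), (0, 13), 13),
    "R-4-C": ((0, 27), (27, 0), (0, 27), 27),
    "R-2-X": ((0, 9), (0, 9), (9, 0), 9),
    "Rn-2-n/Rn-4-Cn": ((0, 4), (0, 4), (0, 4), 4),
    "R-4-C, R-2-X": ((0, 2), (2, 0), (0, 2), 0),
    "R-3m-L": ((0, 1), (1, 0), (0, 1), 0),
    "R-4-L": ((1, 1), (1, 1), (0, 2), 0),
    "S-*-*": ((0, 12), (1, 11), (0, 12), 11),
    "L-4-A/L-5-A": ((0, 6), (0, 6), (0, 6), 6),
    "Lh-4-L": ((0, 6), (0, 6), (0, 6), 6),
}


def summarize(table):
    """Return (number of queries, pPAP correct, pPAP fraction correct)."""
    total = correct = 0
    for name, (*classifiers, n_correct) in table.items():
        # Assumption: every classifier answers every query once,
        # so Yes + No must agree across the three classifiers.
        sizes = {yes + no for yes, no in classifiers}
        if len(sizes) != 1:
            raise ValueError(f"inconsistent row: {name}")
        total += sizes.pop()
        correct += n_correct
    return total, correct, correct / total


total, correct, accuracy = summarize(TABLE)
print(total, correct, f"{accuracy:.3f}")  # 82 76 0.927
```

The consistency check doubles as a guard against transcription errors when the flattened table is re-entered by hand.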
Fig. 5. Decision rules for reaction type prediction of plant type III PKSs.