Kenneth Fechter, Aleksey Porollo.
Abstract
BACKGROUND: Cytochrome P450 monooxygenases (CYPs) represent a large and diverse family of enzymes involved in various biological processes in humans. Individual genome sequencing has revealed multiple mutations in human CYPs, and many missense mutations have been associated with a variety of diseases. Because 3D structures have not been resolved for most human CYPs, there is a need for a reliable sequence-based prediction method that discriminates between benign and disease-causing mutations.
Year: 2014 PMID: 25073475 PMCID: PMC4119178 DOI: 10.1186/1755-8794-7-47
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Human CYP variants used for the control set CS30
| UniProt ID | Gene | Variant | Phenotype | Reference |
|---|---|---|---|---|
| O75881 | CYP7B1 | T297A | Hereditary spastic paraplegia; Liver failure | [ |
| | | A394D | Hereditary spastic paraplegia; Liver failure | [ |
| | | R417C | Hereditary spastic paraplegia; Liver failure | [ |
| | | F470I | Hereditary spastic paraplegia; Liver failure | [ |
| | | R486C | Hereditary spastic paraplegia; Liver failure | [ |
| P08686 | CYP21A2 | V139E | Congenital adrenal hyperplasia | [ |
| | | T295N | Congenital adrenal hyperplasia | [ |
| | | W302R | Congenital adrenal hyperplasia | [ |
| | | L353R | Congenital adrenal hyperplasia | [ |
| | | G375S | Congenital adrenal hyperplasia | [ |
| | | F404S | Congenital adrenal hyperplasia | [ |
| | | L446P | Congenital adrenal hyperplasia | [ |
| | | T450P | Congenital adrenal hyperplasia | [ |
| | | A265V | Neutral | [ |
| P15538 | CYP11B1 | M88I | Congenital adrenal hyperplasia | [ |
| | | W116G | Congenital adrenal hyperplasia | [ |
| | | P159L | Congenital adrenal hyperplasia | [ |
| | | A165D | Congenital adrenal hyperplasia | [ |
| | | R366C | Congenital adrenal hyperplasia | [ |
| | | R384Q | Congenital adrenal hyperplasia | [ |
| | | T401A | Congenital adrenal hyperplasia | [ |
| O15528 | CYP27B1 | G57V | Pseudovitamin D-deficiency rickets | [ |
| | | G73W | Pseudovitamin D-deficiency rickets | [ |
| | | L333F | Pseudovitamin D-deficiency rickets | [ |
| | | R432C | Pseudovitamin D-deficiency rickets | [ |
| | | R459C | Pseudovitamin D-deficiency rickets | [ |
| | | R492W | Pseudovitamin D-deficiency rickets | [ |
| | | G102E | Vitamin D-dependent rickets type 1 | [ |
| | | P143L | Pseudovitamin D-deficiency rickets | [ |
| | | D164N | Pseudovitamin D-deficiency rickets | [ |
Figure 1. Distribution of the features used in the final prediction model over benign and deleterious mutations. A. Abs_dSS – absolute difference between similarity scores of wild type amino acid and mutation for a given position. B. ss_Abs_dSize – absolute difference between sizes of wild type amino acid and mutation weighted by the difference of the corresponding similarity scores. C. zsEntropy21 – Z-score for Shannon entropy at a given position based on a window of 21 neighboring amino acids. D. predRSA – predicted RSA. E. varPredRSA21 – variance of predicted RSA for the window of 21 neighboring amino acids. Whiskers indicate the minimum and maximum values of a given feature.
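The zsEntropy21 feature described in the caption can be illustrated with a minimal sketch. The exact procedure is not given here, so several details are assumptions: entropy is computed in bits from the residues of one alignment column, the window is centred on the position and truncated at sequence ends, and the population standard deviation is used. The function names and the toy alignment columns are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def shannon_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def windowed_zscore(values, index, window=21):
    """Z-score of values[index] relative to a window centred on index.

    Assumption: the window includes the position itself and is simply
    truncated near the ends of the sequence.
    """
    half = window // 2
    lo, hi = max(0, index - half), min(len(values), index + half + 1)
    neighbourhood = values[lo:hi]
    sd = pstdev(neighbourhood)
    return 0.0 if sd == 0 else (values[index] - mean(neighbourhood)) / sd


# Toy alignment columns (hypothetical data, one string per sequence position).
columns = ["AAAA", "ACAA", "ACGT", "AAAA", "AAGG"]
entropies = [shannon_entropy(c) for c in columns]
z = windowed_zscore(entropies, 2, window=5)  # most variable column
```

With a window of 21, the same call would be `windowed_zscore(entropies, i)` for each position `i` of a full-length protein.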
Features that passed the inclusion criteria and were used in the final prediction model
| Abs_dSS | 0.73 | 0.73 | −0.72 | −0.38 | −0.32 |
| ss_Abs_dSize | 0.61 | | −0.50 | −0.28 | −0.31 |
| zsEntropy21 | 0.49 | | | 0.39 | 0.14 |
| predRSA | 0.47 | | | | 0.42 |
| varPredRSA21 | 0.45 | | | | |
Performance of the prediction models on the training set TS270
| Method | MCC | Accuracy (%) | Recall (%) | Precision (%) |
|---|---|---|---|---|
| LDA 5-fold CV | 0.54 ± 0.04 | 82.96 ± 3.19 | 94.47 ± 1.65 | 84.17 ± 4.96 |
| NN 5-fold CV | 0.46 ± 0.10 | 79.26 ± 4.12 | 87.24 ± 6.58 | 84.87 ± 2.20 |
| LDA-cons | 0.53 | 82.59 | 92.89 | 84.72 |
| NN-cons | 0.53 | 81.85 | 89.34 | 86.27 |
| PolyPhen2/HumVar | 0.61 | 84.07 | 86.80 | 90.96 |
| PolyPhen2/HumDiv | 0.58 | 83.70 | 90.36 | 87.68 |
| SIFT | 0.49 | 76.33 | 77.70 | 85.71 |
LDA – linear model based on linear discriminant analysis.
NN – non-linear model based on neural networks.
LDA-cons and NN-cons – consensus models based on simple majority voting of 5 LDA or NN based models.
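Simple majority voting over five models, as used for LDA-cons and NN-cons, can be sketched as follows. This is only an illustration of the voting step: the label encoding ("D" for deleterious, "B" for benign), the function names, and the example predictions are hypothetical and not taken from the paper.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label chosen by most models (an odd model count avoids ties)."""
    label, _ = Counter(votes).most_common(1)[0]
    return label


def consensus(per_model_predictions):
    """Combine per-model label lists (one list per model), mutation by mutation."""
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]


# Hypothetical outputs of five models for three mutations.
five_models = [
    ["D", "B", "D"],
    ["D", "B", "B"],
    ["B", "B", "D"],
    ["D", "D", "D"],
    ["D", "B", "B"],
]
result = consensus(five_models)
```

Because the number of models is odd and the classification is binary, every mutation receives an unambiguous consensus label.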
Performance of the evaluated methods on the training (TS270) and control (CS30) sets
| Dataset | Method | B-B | B-D | D-B | D-D | Pearson r vs. PolyPhen-2 HumVar | Pearson r vs. PolyPhen-2 HumDiv | Pearson r vs. SIFT |
|---|---|---|---|---|---|---|---|---|
| TS270 | MutaCYP | 55 | 18 | 13 | 184 | 0.69 | 0.67 | 0.58 |
| | PolyPhen-2 HumVar | 56 | 17 | 26 | 171 | | 0.96 | 0.66 |
| | PolyPhen-2 HumDiv | 48 | 25 | 19 | 178 | | | 0.67 |
| | SIFT | 50 | 18 | 31 | 108 | | | |
| CS30 | MutaCYP | 1 | 0 | 0 | 29 | 0.38 | 0.19 | 0.12 |
| | PolyPhen-2 HumVar | 1 | 0 | 3 | 26 | | 0.94 | 0.58 |
| | PolyPhen-2 HumDiv | 0 | 1 | 1 | 28 | | | 0.67 |
| | SIFT | - | - | 5 | 16 | | | |
a. SIFT predictions miss 63 mutations in TS270 (58 deleterious and 5 benign) and 9 mutations in CS30 (8 deleterious and 1 benign).
b. Confusion score notation: B-B – the number of benign mutations predicted as benign; B-D – benign predicted as deleterious; D-B – deleterious predicted as benign; D-D – deleterious predicted as deleterious.
Confusion scores are computed for binary classification. The Pearson correlation coefficient is computed for real-valued predictions.
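The confusion scores can be turned into summary measures with a short calculation. The sketch below assumes that deleterious is the positive class and that the four count columns are ordered B-B, B-D, D-B, D-D; the function name is hypothetical. Under these assumptions, the PolyPhen-2 HumVar counts for TS270 (56, 17, 26, 171) give an MCC of 0.61, an accuracy of 84.07%, a recall of 86.80%, and a precision of 90.96%, which agrees with the PolyPhen2/HumVar row of the TS270 performance table.

```python
import math


def binary_metrics(bb, bd, db, dd):
    """Compute MCC, accuracy, recall and precision from confusion scores.

    Assumption: deleterious is the positive class, so
    D-D = true positives, D-B = false negatives,
    B-B = true negatives, B-D = false positives.
    """
    tp, fn, tn, fp = dd, db, bb, bd
    total = tp + fn + tn + fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "accuracy": 100.0 * (tp + tn) / total,
        "recall": 100.0 * tp / (tp + fn),
        "precision": 100.0 * tp / (tp + fp),
    }


# PolyPhen-2 HumVar on TS270, counts taken from the table above.
humvar = binary_metrics(bb=56, bd=17, db=26, dd=171)
print({name: round(value, 2) for name, value in humvar.items()})
```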
Figure 2. ROC curves for predictions by the evaluated methods on the TS270 dataset.
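For readers unfamiliar with ROC analysis, the following minimal sketch shows how a ROC curve and its area can be derived from real-valued prediction scores. The labels and scores are toy values, not data from TS270, and the function names are hypothetical; higher scores are assumed to indicate the positive (deleterious) class.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold.

    labels: 1 for the positive class, 0 for the negative class.
    Tied scores are processed together so they yield a single point.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example: three positives and three negatives.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
area = auc(curve)
```

In this toy case 8 of the 9 positive-negative pairs are ranked correctly, so the area equals 8/9.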
Performance of the evaluated methods on the blind set (BS292)
| MutaCYP | 115 | 177 | 0.48 | 0.48 | 0.34 |
| PolyPhen-2 HumVar | 170 | 122 | | 0.96 | 0.47 |
| PolyPhen-2 HumDiv | 162 | 130 | | | 0.48 |
| SIFT | 161 | 124 | | | |
a. SIFT predictions miss 7 mutations.
Figure 3. Overlap of predictions by the evaluated methods in TS270, CS30, and BS292 datasets.