Tzong-Yi Lee, Cheng-Tsung Lu, Shu-An Chen, Neil Arvin Bretaña, Tzu-Hsiu Cheng, Min-Gang Su, Kai-Yao Huang.
Abstract
BACKGROUND: Carboxylation is a post-translational modification of glutamate (Glu) residues that is catalyzed by γ-glutamyl carboxylase in the lumen of the endoplasmic reticulum. Vitamin K is a critical cofactor in the conversion of Glu residues to γ-carboxyglutamate (Gla) residues. Carboxylation has been shown to be involved in the blood clotting cascade, bone growth, and extraosseous calcification. However, studies in this field have been limited by the difficulty of experimentally determining substrate site specificity in γ-glutamyl carboxylation. In silico investigations have the potential to characterize carboxylated sites before experiments are carried out.
Year: 2011 PMID: 22372765 PMCID: PMC3278826 DOI: 10.1186/1471-2105-12-S13-S10
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Statistics of experimentally verified carboxylation sites in training data and independent testing data.
| Data set | | Training data | Independent testing data |
|---|---|---|---|
| Number of carboxylated proteins | 134 | 79 | 14 |
| Number of carboxylated glutamate residues | 463 | 302 | 60 |
| Number of non-carboxylated glutamate residues | 854 | 567 | 99 |
Figure 1. Composition of amino acids in the vicinity of carboxylation sites with a 15-mer window length (from -7 to +7). A) Frequency plot of amino acid composition at each flanking position. B) Position-specific differences in amino acid composition in the vicinity of carboxylation sites (top) and non-carboxylation sites (bottom).
Figure 2. Structural characteristics of carboxylation sites with a 15-mer window length (from -7 to +7). A) Comparison of the average ASA percentage at carboxylation sites (blue line) and at non-carboxylation sites (red line). B) Secondary structure around carboxylation sites.
Figure 3. Five-fold cross-validation performance of models trained using amino acid composition with varying window lengths. To determine which window size yields the predictive model that best identifies carboxylation sites, five-fold cross-validation was used to evaluate models trained with window lengths of 2n+1, where n ranges from 4 to 10.
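The window construction described in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `glu_windows` and `amino_acid_composition`, the example peptide, the `-` padding character at the termini, and the choice to count the central Glu in the composition are all assumptions made here for illustration.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def glu_windows(sequence, n, pad="-"):
    """Yield (1-based position, window) for every Glu (E) in `sequence`.

    Each window has length 2n+1 and is centred on the Glu residue.
    Positions beyond the protein termini are filled with `pad`
    (an assumed convention, not taken from the source).
    """
    padded = pad * n + sequence + pad * n
    for i, residue in enumerate(sequence):
        if residue == "E":
            yield i + 1, padded[i:i + 2 * n + 1]


def amino_acid_composition(window):
    """Return the 20-dimensional amino acid frequency vector of a window.

    Padding characters are ignored. The central residue is counted,
    which is an assumption of this sketch.
    """
    counts = Counter(window)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


if __name__ == "__main__":
    peptide = "ANSFLEELRPG"  # hypothetical example sequence
    for n in range(4, 11):   # window lengths 9, 11, ..., 21
        for position, window in glu_windows(peptide, n):
            features = amino_acid_composition(window)
            print(n, position, window, len(features))
```

Each candidate Glu then contributes one feature vector per window length, so models for different values of n can be trained and compared on the same set of sites.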
Cross-validation performance of the predictive models trained with various features.
| Training features | Pre | Sn | Sp | Acc | MCC |
|---|---|---|---|---|---|
| Positional Weighted Matrix of flanking Amino Acids (AA_PWM) | 0.735 | 0.817 | 0.843 | 0.834 | 0.646 |
| Amino Acid Composition (AAC) | 0.696 | 0.798 | 0.814 | 0.808 | 0.596 |
| Accessible Surface Area (ASA) | 0.672 | 0.768 | 0.800 | 0.789 | 0.553 |
| Secondary structure (SS) | 0.580 | 0.718 | 0.723 | 0.721 | 0.424 |
| AA_PWM + AAC | 0.738 | 0.814 | 0.846 | 0.835 | 0.647 |
| AA_PWM + ASA | 0.781 | 0.831 | 0.876 | 0.860 | 0.698 |
| AA_PWM + SS | 0.709 | 0.791 | 0.827 | 0.814 | 0.604 |
| AA_PWM + AAC + SS | 0.711 | 0.801 | 0.827 | 0.818 | 0.613 |
| AA_PWM + AAC + ASA + SS | 0.812 | 0.860 | 0.894 | 0.882 | 0.745 |
Abbreviations: Pre, precision; Sn, sensitivity; Sp, specificity; Acc, accuracy; MCC, Matthews correlation coefficient.
Comparison of performances between cross-validation and independent testing.
| | Five-fold cross-validation | Independent testing |
|---|---|---|
| Number of positive data | 302 | 60 |
| Number of negative data | 567 | 99 |
| True Positive | 260 | 50 |
| False Positive | 51 | 18 |
| True Negative | 516 | 81 |
| False Negative | 42 | 10 |
| Precision | 0.836 | 0.735 |
| Sensitivity | 0.860 | 0.833 |
| Specificity | 0.910 | 0.818 |
| Accuracy | 0.892 | 0.823 |
| Matthews Correlation Coefficient | 0.765 | 0.638 |
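The five measures in this table follow from the confusion-matrix counts using their standard definitions, which the sketch below applies. The function name `classification_metrics` is hypothetical. The counts are assigned so that TP + FN equals the number of positive sites and TN + FP equals the number of negative sites (for cross-validation, FN = 42 and FP = 51; for independent testing, FN = 10 and FP = 18), which is the assignment under which the reported precision, sensitivity, and specificity are reproduced.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification measures from confusion counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # recall on positive sites
    specificity = tn / (tn + fp)          # recall on negative sites
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Counts taken from the table, assigned as described above.
cross_validation = classification_metrics(tp=260, fp=51, tn=516, fn=42)
independent_test = classification_metrics(tp=50, fp=18, tn=81, fn=10)

if __name__ == "__main__":
    for name, result in [("cross-validation", cross_validation),
                         ("independent testing", independent_test)]:
        print(name, {k: round(v, 3) for k, v in result.items()})
```

With these counts the computed values agree with the reported ones to within about 0.001, the small differences being consistent with truncation rather than rounding in the table.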
Statistics of InterPro functional annotations in 134 carboxylated proteins.
| InterPro ID | Type | Description | Number of carboxylated proteins |
|---|---|---|---|
| IPR000294 | Domain | Gamma-carboxyglutamic acid-rich (GLA) domain | 43 |
| IPR002383 | Domain | Coagulation factor, Gla domain | 21 |
| IPR001254 | Domain | Peptidase S1/S6, chymotrypsin/Hap | 19 |
| IPR009003 | Domain | Serine/cysteine peptidase, trypsin-like | 19 |
| IPR001314 | Family | Peptidase S1A, chymotrypsin | 18 |
| IPR013032 | Conserved site | EGF-like region, conserved site | 17 |
| IPR000742 | Domain | EGF-like, type 3 | 17 |
| IPR000152 | PTM | EGF-type aspartate/asparagine hydroxylation site | 17 |
| IPR006210 | Domain | EGF-like | 17 |
| IPR002384 | Family | Bone matrix, Gla protein | 17 |
| IPR018114 | Active site | Peptidase S1/S6, chymotrypsin/Hap, active site | 16 |
| IPR006209 | Domain | EGF | 16 |
| IPR018097 | Conserved site | EGF-like calcium-binding, conserved site | 15 |
| IPR012224 | Family | Peptidase S1A, coagulation factor VII/IX/X/C/Z | 15 |
| IPR001881 | Domain | EGF-like calcium-binding | 14 |