Neil Arvin Bretaña, Cheng-Tsung Lu, Chiu-Yun Chiang, Min-Gang Su, Kai-Yao Huang, Tzong-Yi Lee, Shun-Long Weng.
Abstract
Viruses infect humans and progress inside the body, leading to various diseases and complications. The phosphorylation of viral proteins catalyzed by host kinases plays crucial regulatory roles in enhancing replication and inhibiting normal host-cell functions. Because of this biological importance, there is a need to identify protein phosphorylation sites on human viruses. However, mass spectrometry-based experiments have proven expensive and labor-intensive. Furthermore, previous studies that identified phosphorylation sites in human viruses did not investigate the responsible kinases. We are therefore motivated to propose a new method that identifies protein phosphorylation sites on human viruses together with their kinase substrate specificities. The experimentally verified phosphorylation data were extracted from virPTM, a database containing 301 experimentally verified phosphorylation sites on 104 human kinase-phosphorylated virus proteins. To investigate kinase substrate specificities in viral protein phosphorylation sites, maximal dependence decomposition (MDD) is employed to cluster a large set of phosphorylation data into subgroups containing significantly conserved motifs. Experimental human phosphorylation sites are collected from Phospho.ELM, grouped according to their kinase annotations, and compared with the virus MDD clusters. This investigation identifies human kinases such as CK2, PKB, CDK, and MAPK as potential kinases for catalyzing virus protein substrates, as confirmed by published literature. A profile hidden Markov model (HMM) is then applied to learn a predictive model for each subgroup. Five-fold cross validation of the MDD-clustered HMMs yields an average accuracy of 84.93% for serine and 78.05% for threonine.
Furthermore, an independent test set collected from UniProtKB and Phospho.ELM is used to compare predictive performance with three popular kinase-specific phosphorylation site prediction tools. In the independent testing, the high sensitivity and specificity of the proposed method demonstrate the predictive effectiveness of the identified substrate motifs and the importance of investigating potential kinases for viral protein phosphorylation sites.
Year: 2012 PMID: 22844408 PMCID: PMC3402495 DOI: 10.1371/journal.pone.0040694
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Analytical flowchart.
The proposed method involves three major steps: data collection, motif detection, and model training and cross validation.
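The cross-validation step named above can be illustrated with a minimal sketch. It assumes plain five-fold partitioning after shuffling; the function name `five_fold_splits`, the seed, and the placeholder site labels are hypothetical and are not taken from the paper. The count of 233 matches the first positive row of the data statistics table.

```python
import random

def five_fold_splits(items, k=5, seed=0):
    """Yield (train, test) lists for k-fold cross validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Deal items round-robin so fold sizes differ by at most one.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# Hypothetical placeholder identifiers standing in for sequence windows.
sites = [f"site_{n}" for n in range(233)]
for train, test in five_fold_splits(sites):
    print(len(train), len(test))
```

Each site lands in exactly one test fold, so every model is evaluated on data it was not trained on.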
Statistics of data used for this study.
| Data Set | Source | Residue | Type | Data Count | Balanced Data |
|  |  |  | Positive | 233 | 233 |
|  |  |  | Negative | 2588 | 233 |
|  |  |  | Positive | 54 | 54 |
|  |  |  | Negative | 1170 | 54 |
|  |  |  | Positive | 14 | 14 |
|  |  |  | Negative | 65 | 65 |
|  |  |  | Positive | 24 | 24 |
|  |  |  | Negative | 217 | 24 |
|  |  |  | Positive | 10 | 10 |
|  |  |  | Negative | 159 | 10 |
|  |  |  | Positive | 2 | 2 |
|  |  |  | Negative | 67 | 2 |
|  |  |  | Positive | 2 | 2 |
|  |  |  | Negative | 16 | 2 |
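In most rows of this table the balanced negative count equals the positive count. The source text here does not say how the negatives were reduced; the sketch below assumes simple random undersampling, and the function name `balance_by_undersampling`, the seed, and the placeholder records are hypothetical.

```python
import random

def balance_by_undersampling(positives, negatives, seed=0):
    """Randomly keep as many negatives as there are positives."""
    if len(negatives) <= len(positives):
        # Nothing to trim: negatives are already no more numerous.
        return list(positives), list(negatives)
    rng = random.Random(seed)
    return list(positives), rng.sample(list(negatives), len(positives))

# Placeholder records sized like the first positive/negative pair in the table.
pos = [("P", i) for i in range(233)]
neg = [("N", i) for i in range(2588)]
bal_pos, bal_neg = balance_by_undersampling(pos, neg)
print(len(bal_pos), len(bal_neg))
```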
Figure 2. pSer virus motif – human motif matches.
Figure 3. pThr virus motif – human motif matches.
Five-Fold Cross Validation Results on Serine MDD-Clustered HMMs.
| Group | Number of positive data | HMMER bit score | Pre | Sn | Sp | Acc |
|  | 54 | −11 | 93.1% | 94.1% | 92.7% | 93.4% |
|  | 34 | −11 | 80.0% | 94.2% | 76.6% | 85.4% |
|  | 20 | −9 | 84.3% | 90.0% | 80.0% | 85.0% |
|  | 59 | −8 | 66.4% | 74.6% | 60.6% | 67.6% |
|  | 66 | −10 | 89.3% | 98.4% | 87.6% | 93.0% |
Abbreviations: Pre, precision; Sn, sensitivity; Sp, specificity; Acc, accuracy.
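The HMMER bit score column can be read as a per-group decision threshold. The sketch below assumes that a window is called positive when its score is at or above the threshold (whether the comparison is inclusive is an assumption), and the scores and the function name `classify_by_bit_score` are hypothetical illustrations rather than output from HMMER.

```python
def classify_by_bit_score(scores, threshold):
    """Label a window positive when its bit score meets the threshold."""
    return [score >= threshold for score in scores]

# Hypothetical bit scores for four sequence windows; -11 is the threshold
# listed for the first serine group in the table above.
scores = [-3.2, -11.0, -14.7, -9.9]
print(classify_by_bit_score(scores, threshold=-11))
```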
Five-Fold Cross Validation Results on Threonine MDD-Clustered HMMs.
| Group | Number of positive data | HMMER bit score | Pre | Sn | Sp | Acc |
|  | 19 | −10 | 92.0% | 100% | 90.0% | 95% |
|  | 16 | −11 | 43.3% | 50.0% | 43.3% | 46.6% |
|  | 19 | −10 | 95.0% | 90.0% | 95.0% | 92.5% |
Abbreviations: Pre, precision; Sn, sensitivity; Sp, specificity; Acc, accuracy.
Figure 4. Comparison of five-fold cross validation performance.
(A) Comparison of 5-fold cross validation results between an S HMM which does not utilize prior MDD-clustering and S HMMs which utilize prior MDD-clustering. (B) Comparison of 5-fold cross validation results between a T HMM which does not utilize prior MDD-clustering and T HMMs which utilize prior MDD-clustering.
Independent Test Results of Serine MDD-clustered HMMs.
| Residue | MDD group | Threshold | Pre | Sn | Sp | Acc |
|  |  | −11 | 89.5% | 11.5% | 98.1% | 54.8% |
|  |  | −11 | 65.3% | 34.6% | 80.0% | 57.3% |
|  |  | −7 | 58.6% | 11.5% | 90.8% | 51.2% |
|  |  | −8 | 67.5% | 11.5% | 93.5% | 52.5% |
|  |  | −10 | 72.6% | 26.9% | 89.2% | 58.1% |
|  |  |  | 66.7% | 69.2% | 64.6% | 66.9% |
Abbreviations: Pre, precision; Sn, sensitivity; Sp, specificity; Acc, accuracy.
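The accuracy values in the table above are consistent, within rounding, with a test set that contains equal numbers of positive and negative sites, in which case accuracy reduces to the mean of sensitivity and specificity. This is an observation drawn from the tabulated numbers, not a statement made in the source; P and N denote the positive and negative counts, assumed equal here.

```latex
\mathrm{Acc} = \frac{TP + TN}{P + N}
             = \frac{\mathrm{Sn}\,P + \mathrm{Sp}\,N}{P + N}
             \overset{P = N}{=} \frac{\mathrm{Sn} + \mathrm{Sp}}{2},
\qquad
\frac{11.5\% + 98.1\%}{2} = 54.8\%
```

The worked value uses the first row of the table.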
Figure 5. Comparison of independent testing performance.
(A) Comparison of independent test results between an S HMM which does not utilize prior MDD-clustering and S HMMs which utilize prior MDD-clustering. (B) Comparison of independent test results between a T HMM which does not utilize prior MDD-clustering and T HMMs which utilize prior MDD-clustering.
Independent Test Results of Threonine MDD-clustered HMMs.
| Residue | MDD group | Threshold | Pre | Sn | Sp | Acc |
|  |  | −10 | 42.5% | 20.0% | 71.0% | 45.5% |
|  |  | −6 | 88.4% | 50.0% | 92.0% | 71.0% |
|  |  | −10 | 80.7% | 40.0% | 89.0% | 64.5% |
|  |  |  | 75.0% | 99.0% | 62.7% | 80.9% |
Abbreviations: Pre, precision; Sn, sensitivity; Sp, specificity; Acc, accuracy.
Summary of predicted phosphorylation sites on human viruses.
| Virus Name | Protein ID | Position | Predicted Kinase | Literature-annotated Kinase | Reference |
| HHV-5 | P18139 | S462 | CK2; CK2 Alpha; Model S2 | Unknown |  |
| HIV-1 | P05923 | S56 | CK2; CK2 Alpha; Model S2 | CK2 |  |
| HTLV-1 | P0C205 | S70 | Model S2 | By Host (Unknown) |  |
| HIV-1 | P05923 | S52 | Model S2 | CK2 |  |
| HRSV | P12579 | S116 | Model S2 | By Host (Unknown) |  |
| HHV-4 | P03191 | S305 | Model S2 | Unknown |  |
| HRSV | P12579 | S161 | Model S2 | By Host (Unknown) |  |
| HTLV-1 | P03345 | S105 | Model S2; PKB; CDK; MAPK | MAPK1; CDK |  |
| HHV-3 | P09258 | S343 | CDK; MAPK; Model S2 | Unknown |  |
| HIV-1 | P69723 | S144 | PKB | Unknown |  |
| HTLV-1 | P0C205 | S165 | PKB | Unknown |  |
| HTLV-1 | P03409 | S336 | PKB; CDK; MAPK | CDK |  |
| HRSV | P12579 | S117 | PKB | By Host (Unknown) |  |
| HIV-1 | P05928 | S79 | Model S4 | By Host (Unknown) |  |
| HHV-5 | P69332 | S338 | Model S4 | Unknown |  |
| HTLV-1 | P0C205 | S177 | Model S4 | Unknown |  |
| HTLV-1 | P0C205 | S147 | Model S4 | Unknown |  |
| HIV-1 | P05928 | S94 | Model S4 | By Host (Unknown) |  |
| HTLV-1 | P0C205 | S97 | CDK; MAPK | Unknown |  |
| HHV-4 | P03191 | S337 | CDK; MAPK | Viral BGLF4 kinase |  |
| HIV-1 | P69718 | S99 | CDK; MAPK | By Host (Unknown) |  |
| HHV-4 | P03191 | S349 | CDK; MAPK | Viral BGLF4 kinase |  |
| HTLV-1 | P0C205 | S177 | CDK; MAPK | Unknown |  |
| HHV-4 | P03191 | S121 | CDK; MAPK | Unknown |  |
| HTLV-1 | P0C205 | T174 | CK2; CK2 Alpha | By Host (Unknown) |  |
| HHV-4 | P03191 | T344 | CK2; CK2 Alpha; CDK; MAPK | Viral BGLF4 kinase |  |
| HPV-16 | P06922 | T71 | CK2; CK2 Alpha | Unknown |  |
| HTLV-1 | P03409 | T242 | CK2; CK2 Alpha | Unknown |  |
| HTLV-1 | P03409 | T48 | Model T2 | Unknown |  |
| HIV-1 | P69723 | T188 | Model T2 | Unknown |  |
| HTLV-1 | P03409 | T215 | Model T2 | Unknown |  |
| HTLV-1 | P0C205 | T174 | Model T2 | Unknown |  |
| HTLV-1 | P03409 | T322 | Model T2 | Unknown |  |
| HHV-1 | P06437 | T313 | Model T2 | Unknown |  |
| HIV-1 | P69723 | T155 | CDK; MAPK | Unknown |  |
| HHV-4 | P03191 | T355 | CDK; MAPK | Viral BGLF4 kinase |  |
| HPV-16 | P06922 | T57 | CDK; MAPK | ERK |  |
The summaries of human viruses and kinases are presented in Tables S9 and S10, respectively.
Relation between human kinase and virus protein reported in literature.
Comparison of independent testing performance with other kinase-specific phosphorylation site prediction tools.
| Tools | MDD-clustered HMMs | PREDIKIN 2.0 | KinasePhos 2.0 | GPS 2.1 |
| Number of true positive predictions | 36 | 33 | 36 | 36 |
| Number of false positive predictions | 89 | 145 | 172 | 189 |
| Number of true negative predictions | 303 | 247 | 220 | 203 |
| Number of false negative predictions | 0 | 3 | 0 | 0 |
| Precision | 28.9% | 18.5% | 17.3% | 16.0% |
| Sensitivity | 100.0% | 91.7% | 100.0% | 100.0% |
| Specificity | 77.3% | 63.1% | 56.1% | 51.8% |
| Accuracy | 79.2% | 65.4% | 59.8% | 55.8% |
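The four summary rows follow from the confusion counts in the same table using the standard definitions of precision, sensitivity, specificity, and accuracy. The sketch below recomputes them; the function name `confusion_metrics` is hypothetical, and the counts are copied from the table.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return precision, sensitivity, specificity, accuracy as percentages."""
    return {
        "Pre": 100.0 * tp / (tp + fp),
        "Sn": 100.0 * tp / (tp + fn),
        "Sp": 100.0 * tn / (tn + fp),
        "Acc": 100.0 * (tp + tn) / (tp + fp + tn + fn),
    }

# Counts (TP, FP, TN, FN) copied from the comparison table above.
counts = {
    "MDD-clustered HMMs": (36, 89, 303, 0),
    "PREDIKIN 2.0": (33, 145, 247, 3),
    "KinasePhos 2.0": (36, 172, 220, 0),
    "GPS 2.1": (36, 189, 203, 0),
}
for tool, (tp, fp, tn, fn) in counts.items():
    metrics = confusion_metrics(tp, fp, tn, fn)
    print(tool, {name: round(value, 1) for name, value in metrics.items()})
```

The recomputed values agree with the tabulated ones to within about 0.1 percentage point.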