Xinghua Lu, Chengxiang Zhai, Vanathi Gopalakrishnan, Bruce G Buchanan.
Abstract
BACKGROUND: Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole-genome level. However, an important and much-needed task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an effort to annotate the functions of genes or protein sequences with terms from a dynamic, controlled vocabulary, and these annotations serve well as a knowledge base.
Year: 2004 PMID: 15345032 PMCID: PMC517493 DOI: 10.1186/1471-2105-5-122
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
GO terms and PROSITE patterns for the protein MGI|MGI:97380
| GO term | GO term name | PROSITE pattern | Pattern description |
|---|---|---|---|
| GO:0005576 | Extracellular | PS00248 | Nerve growth factor family signature and profile (1) |
| GO:0005515 | Protein binding | PS50270 | Nerve growth factor family signature and profile (1) |
| GO:0005166 | Neurotrophin p75 receptor ligand | | |
| GO:0008544 | Epidermal differentiation | | |
| GO:0007422 | Peripheral nervous system development | | |
| GO:0007420 | Brain development | | |
| GO:0007403 | | | |
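Five GO terms associated with PROSITE pattern PS00109 (tyrosine kinase signature). NM: number of proteins containing the motif; NT: number of proteins annotated with the GO term; NT-M: number of proteins that both contain the motif and are annotated with the term; M.I.: mutual information between the motif and the term.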
| GO Term | GO Definition | NM | NT | NT-M | M.I. |
|---|---|---|---|---|---|
| GO:0004713 | Protein tyrosine kinase | 246 | 68 | 51 | 0.00599 |
| GO:0006468 | Protein amino acid phosphorylation | 246 | 409 | 69 | 0.00464 |
| GO:0004714 | Transmembrane receptor protein kinase | 246 | 33 | 29 | 0.00362 |
| GO:0004715 | Non-transmembrane protein tyrosine kinase | 246 | 17 | 14 | 0.00168 |
| GO:0005887 | Integral membrane protein | 246 | 1162 | 44 | 0.00118 |
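The counts in this table can be read as a 2x2 contingency table per motif-term pair (motif present or absent, term annotated or not). The sketch below computes NM, NT and NT-M from a hypothetical protein-annotation mapping and then the mutual information of the two binary indicators. It assumes M.I. is that standard mutual information with the natural logarithm; the log base and the total number of proteins N are not given in this excerpt, so the sketch does not try to reproduce the table's values. The names `cooccurrence_counts`, `mutual_information` and the `toy` data are illustrative only.

```python
import math
from typing import Dict, Set, Tuple

# Hypothetical data layout: protein id -> (PROSITE patterns found, GO terms annotated)
Annotations = Dict[str, Tuple[Set[str], Set[str]]]


def cooccurrence_counts(ann: Annotations, motif: str, term: str) -> Tuple[int, int, int, int]:
    """Return (N, NM, NT, NT-M) for one motif-term pair."""
    n = len(ann)
    nm = sum(1 for motifs, _ in ann.values() if motif in motifs)
    nt = sum(1 for _, terms in ann.values() if term in terms)
    ntm = sum(1 for motifs, terms in ann.values() if motif in motifs and term in terms)
    return n, nm, nt, ntm


def mutual_information(n: int, nm: int, nt: int, ntm: int) -> float:
    """M.I. (natural log, an assumption) between 'contains motif' and 'has term'."""
    # Cells of the 2x2 contingency table, keyed by (has_motif, has_term)
    cells = {
        (1, 1): ntm,
        (1, 0): nm - ntm,
        (0, 1): nt - ntm,
        (0, 0): n - nm - nt + ntm,
    }
    p_motif = {1: nm / n, 0: (n - nm) / n}
    p_term = {1: nt / n, 0: (n - nt) / n}
    mi = 0.0
    for (m, t), count in cells.items():
        if count == 0:
            continue  # by convention 0 * log 0 = 0
        p_joint = count / n
        mi += p_joint * math.log(p_joint / (p_motif[m] * p_term[t]))
    return mi


# Toy example with hypothetical proteins (not data from the paper)
toy = {
    "P1": ({"PS00109"}, {"GO:0004713", "GO:0006468"}),
    "P2": ({"PS00109"}, {"GO:0004713"}),
    "P3": (set(), {"GO:0006468"}),
    "P4": (set(), set()),
}
counts = cooccurrence_counts(toy, "PS00109", "GO:0004713")
print(counts, mutual_information(*counts))
```

In the toy data the motif and GO:0004713 always co-occur, so the M.I. equals ln 2, while the motif and GO:0006468 are independent and the M.I. is zero.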
Figure 1. Correlation of mutual information cutoff and term assignment precision. Different M.I. cutoff values are used to assign GO terms to motifs, and the precision of the assignment is plotted against the M.I. cutoff value. The Pearson correlation coefficient between the precision and the cutoff is 0.837.
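The procedure behind Figure 1 (threshold M.I., measure precision, correlate precision with the cutoff) can be sketched as follows. This is a minimal illustration under stated assumptions: a term is assigned when its M.I. is greater than or equal to the cutoff (the excerpt does not say whether the comparison is strict), and the labelled `pairs` are hypothetical, not the paper's data. `precision_at_cutoff` and `pearson` are illustrative names.

```python
import math
from typing import Sequence, Tuple

# Candidate motif-term association: (M.I. value, 1 if the association is correct else 0)
Pair = Tuple[float, int]


def precision_at_cutoff(pairs: Sequence[Pair], cutoff: float) -> float:
    """Precision of the rule 'assign the term when M.I. >= cutoff' (NaN if nothing is assigned)."""
    assigned = [label for mi, label in pairs if mi >= cutoff]
    return sum(assigned) / len(assigned) if assigned else float("nan")


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical labelled associations (illustration only)
pairs = [(0.9, 1), (0.7, 1), (0.5, 0), (0.3, 1), (0.1, 0)]
cutoffs = [0.0, 0.2, 0.4, 0.6]
precisions = [precision_at_cutoff(pairs, c) for c in cutoffs]
print(precisions, pearson(cutoffs, precisions))
```

A positive coefficient, as in Figure 1, indicates that stricter M.I. cutoffs tend to yield more precise assignments.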
Figure 2. Assigning GO terms to motifs according to the rank of M.I. A. ROC curves for assigning GO terms to motifs according to the rank of mutual information. Filled circles are for the perfect match data set (area under the curve 0.782); empty triangles are for the relaxed match data set (area under the curve 0.735). The numbers next to the data points indicate the cut-off ranks of the decision rules. The diagonal line corresponds to a random model. B. Precision of rules based on different mutual information cutoff ranks. Filled bars are results on the perfect match data set; empty bars are results on the relaxed match data set.
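A rank-based rule assigns each motif its top-k GO terms by M.I.; varying k gives one ROC point per rule, as in Figure 2A. The sketch below assumes true and false positives are pooled over all motifs and that the area under the curve is computed with the trapezoidal rule after adding the (0, 0) and (1, 1) endpoints. Both choices are assumptions, as are the names `roc_points_by_rank`, `auc_trapezoid` and the toy candidates.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: motif id -> list of (GO term, M.I., is_correct) candidates
Candidates = Dict[str, List[Tuple[str, float, bool]]]


def roc_points_by_rank(cands: Candidates, max_rank: int) -> List[Tuple[float, float]]:
    """(FPR, TPR) of the rule 'assign the top-k terms per motif by M.I.' for k = 1..max_rank."""
    ranked = {m: sorted(c, key=lambda item: item[1], reverse=True) for m, c in cands.items()}
    pos = sum(ok for c in cands.values() for _, _, ok in c)
    neg = sum(not ok for c in cands.values() for _, _, ok in c)
    points = []
    for k in range(1, max_rank + 1):
        tp = sum(ok for c in ranked.values() for _, _, ok in c[:k])
        fp = sum(not ok for c in ranked.values() for _, _, ok in c[:k])
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under an ROC curve by the trapezoidal rule, with (0, 0) and (1, 1) added."""
    pts = sorted([(0.0, 0.0), *points, (1.0, 1.0)])
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Toy candidates (hypothetical, not data from the paper)
cands = {
    "M1": [("T1", 0.9, True), ("T2", 0.5, False), ("T3", 0.1, False)],
    "M2": [("T4", 0.8, True), ("T5", 0.6, True), ("T6", 0.2, False)],
}
points = roc_points_by_rank(cands, max_rank=3)
print(points, auc_trapezoid(points))
```

The sketch needs at least one correct and one incorrect candidate overall; otherwise TPR or FPR is undefined.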
Figure 3. The algorithm for feature selection.
Results of logistic regression parameter estimation
| Feature | Estimated coefficient (perfect match set) | Estimated coefficient (relaxed match set) |
|---|---|---|
| Intercept | -1.7549 | -0.6263 |
| NT-M | -0.3845 | -0.4546 |
| NT | 1.3652 | 1.6827 |
| NM | 1.0497 | 0.4735 |
| NG\|M | -1.9792 | -1.1113 |
| NP\|T | -1.7883 | -2.5494 |
| M.I. | 1.0002 | 1.1598 |
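With these coefficients, a logistic regression model scores a motif-term pair as p(Y = 1 | x) = 1 / (1 + exp(-(b0 + sum of b_i * x_i))). The sketch below simply plugs the table's coefficients into that standard form. Caution: this excerpt does not describe how the features were transformed before fitting (raw counts would make the coefficients implausible), so inputs must be preprocessed exactly as during training, and the raw feature values in the PS00383 table will not reproduce its probabilities. `p_correct`, `FEATURES` and the zero-valued example vector are hypothetical.

```python
import math
from typing import Dict

FEATURES = ("NT-M", "NT", "NM", "NG|M", "NP|T", "M.I.")

# Coefficients copied from the table above
BETA_PERFECT = {"intercept": -1.7549, "NT-M": -0.3845, "NT": 1.3652, "NM": 1.0497,
                "NG|M": -1.9792, "NP|T": -1.7883, "M.I.": 1.0002}
BETA_RELAXED = {"intercept": -0.6263, "NT-M": -0.4546, "NT": 1.6827, "NM": 0.4735,
                "NG|M": -1.1113, "NP|T": -2.5494, "M.I.": 1.1598}


def p_correct(x: Dict[str, float], beta: Dict[str, float]) -> float:
    """Logistic model p(Y = 1 | x); x must hold all six (already transformed) features."""
    z = beta["intercept"] + sum(beta[f] * x[f] for f in FEATURES)
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical feature vector (all features at 0, i.e. only the intercept contributes)
x0 = {f: 0.0 for f in FEATURES}
print(p_correct(x0, BETA_PERFECT), p_correct(x0, BETA_RELAXED))
```

A missing feature raises `KeyError` rather than being silently treated as zero.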
Top 5 GO terms associated with the motif PS00383, ranked according to the conditional probability that the motif-GO association is correct. Columns 2 to 7 give the feature vector x of each motif-GO association; the conditional probability p(Y = 1 | x), calculated with the trained model, and the true class are listed in the two rightmost columns. The definitions of the GO terms are listed below the table.
| GO term | NT-M | NT | NM | NG\|M | NP\|T | M.I. | p(Y = 1 \| x) | True class |
|---|---|---|---|---|---|---|---|---|
| GO:0006470 | 65 | 122 | 191 | 95 | 27 | 0.007268 | 0.97577 | 1 |
| GO:0005001 | 23 | 24 | 191 | 95 | 5 | 0.003177 | 0.87681 | 1 |
| GO:0004726 | 10 | 11 | 191 | 95 | 10 | 0.001331 | 0.68682 | 1 |
| GO:0005634 | 11 | 1785 | 191 | 95 | 281 | 5.08E-06 | 0.27882 | 0 |
| GO:0005887 | 17 | 1162 | 191 | 95 | 225 | 0.000193 | 0.15536 | 0 |
GO:0006470: Protein amino acid dephosphorylation; Cellular Processes
GO:0005001: Transmembrane receptor protein tyrosine phosphatase activity; Molecular Function
GO:0004726: Non-membrane spanning protein tyrosine phosphatase activity; Molecular Function
GO:0005634: Nucleus; Cellular Component
GO:0005887: Integral to plasma membrane; Cellular Component
Figure 4. Comparison of results based on probability and on M.I. A. ROC curves for classifying motif-term associations at different probability thresholds. Filled circles are results on the perfect match test set (area under the curve 0.8715); empty triangles are results on the relaxed match test set (area under the curve 0.871). Data points correspond to thresholds of p(Y = 1 | x) from 0.9 to 0.1 (left to right) in steps of 0.1. B. Precision (positive predictive value) at different probability cutoffs; solid bars are results on the perfect match test set and open bars are results on the relaxed match test set. C. ROC curve for decision rules based on different M.I. cutoff thresholds, with an area under the curve of 0.816. D. Precision at different M.I. cutoffs.
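The probability-threshold rules of Figure 4A and 4B can be illustrated on the five PS00383 rows listed earlier (their p(Y = 1 | x) and true class). This is a toy sketch, not a reproduction of Figure 4, which uses the full test sets. It assumes an association is accepted when p is greater than or equal to the threshold; `threshold_metrics` and `ROWS` are illustrative names.

```python
from typing import Sequence, Tuple

# (p(Y = 1 | x), true class) for the five PS00383 rows in the table above
ROWS = [(0.97577, 1), (0.87681, 1), (0.68682, 1), (0.27882, 0), (0.15536, 0)]


def threshold_metrics(rows: Sequence[Tuple[float, int]], threshold: float) -> Tuple[float, float, float]:
    """Return (FPR, TPR, precision) for the rule 'accept the association if p >= threshold'."""
    pos = sum(y for _, y in rows)
    neg = len(rows) - pos
    tp = sum(1 for p, y in rows if p >= threshold and y == 1)
    fp = sum(1 for p, y in rows if p >= threshold and y == 0)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return fp / neg, tp / pos, precision


thresholds = [round(0.1 * k, 1) for k in range(9, 0, -1)]  # 0.9, 0.8, ..., 0.1
for t in thresholds:
    fpr, tpr, prec = threshold_metrics(ROWS, t)
    print(f"p >= {t:.1f}: FPR={fpr:.2f} TPR={tpr:.2f} precision={prec:.2f}")
```

Lowering the threshold moves the ROC point to the right, matching the left-to-right order of the data points described for Figure 4A.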