Abstract
BACKGROUND: Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside the cell. This process is precisely regulated by protein sorting signals, which take different forms. A major category of sorting signals consists of amino acid subsequences usually located at the N-termini or C-termini of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms.
Year: 2010 PMID: 20122242 PMCID: PMC3009540 DOI: 10.1186/1471-2105-11-S1-S66
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Mixture of true signal motifs with false motifs severely reduced the conservation level of the motif regions.
Figure 2. Structure of the BayesMotif algorithm for anchored sorting motif discovery.
Figure 3. False positive removal procedure. This example shows how BayesMotif uses sliding windows to remove false motifs from the dataset. Each window along a sequence is scored, and the sequence score is the average of the window scores that precede a run of three consecutive windows in which at least one score is below 0.85 (marked in red). Sequences with a score below 0.85 are removed from the training set.
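The thresholding step described in the Figure 3 caption can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it averages all window scores of a sequence and does not model the three-consecutive-window stopping rule. The window size, the `toy_score` function, and all function names are hypothetical; only the 0.85 cutoff comes from the caption.

```python
from statistics import mean
from typing import Callable, List

THRESHOLD = 0.85  # cutoff stated in the Figure 3 caption


def windows(seq: str, size: int) -> List[str]:
    """Return all contiguous substrings of length `size` (step of 1)."""
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


def sequence_score(seq: str, score_window: Callable[[str], float], size: int) -> float:
    """Mean of the window scores; 0.0 if the sequence is shorter than one window."""
    scores = [score_window(w) for w in windows(seq, size)]
    return mean(scores) if scores else 0.0


def remove_false_positives(seqs, score_window, size=5, threshold=THRESHOLD):
    """Keep only sequences whose mean window score reaches the threshold."""
    return [s for s in seqs if sequence_score(s, score_window, size) >= threshold]


# Hypothetical stand-in for the motif classifier: fraction of hydrophobic residues.
HYDROPHOBIC = set("AILMFVW")


def toy_score(window: str) -> float:
    return sum(c in HYDROPHOBIC for c in window) / len(window)


kept = remove_false_positives(["LLLLLLLL", "LLKDEKLL", "AVILAVIL"], toy_score, size=5)
```

In practice the scoring function would be the trained motif classifier rather than a residue count; the toy scorer is only there to make the sketch runnable.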
Figure 4. Implanted motifs. a) Simulated hydrophobic motifs anchored by AA; b) simulated hydrophobic/positively charged motifs anchored by AA; c) no conserved motifs around anchors.
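One way to generate sequences like those in Figure 4 is sketched below. The layout is an assumption made for illustration: a 20-residue motif that includes the AA anchor at its center, a positively charged left half for the mixed case, and random background flanks of 15 residues. The residue groupings and the `implant` function are hypothetical and are not taken from the paper.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = "AILMFVW"   # assumed hydrophobic residue set
POSITIVE = "KR"           # assumed positively charged residue set
ANCHOR = "AA"             # artificial anchor used for the synthetic dataset


def implant(rng: random.Random, kind: str, motif_len: int = 20, flank_len: int = 15) -> str:
    """Build one sequence: background + motif half + anchor + motif half + background."""
    side = (motif_len - len(ANCHOR)) // 2  # residues on each side of the anchor
    if kind == "hydrophobic":
        left, right = HYDROPHOBIC, HYDROPHOBIC
    elif kind == "hydrophobic+charged":
        left, right = POSITIVE, HYDROPHOBIC
    elif kind == "random":  # anchor present, but no conserved region around it
        left, right = AMINO_ACIDS, AMINO_ACIDS
    else:
        raise ValueError(f"unknown motif kind: {kind}")

    def draw(alphabet: str, n: int) -> str:
        return "".join(rng.choice(alphabet) for _ in range(n))

    return (draw(AMINO_ACIDS, flank_len) + draw(left, side) + ANCHOR
            + draw(right, side) + draw(AMINO_ACIDS, flank_len))


rng = random.Random(0)  # fixed seed so the toy dataset is reproducible
dataset = [implant(rng, "hydrophobic") for _ in range(5)]
```

Mixing sequences of kind `"random"` into such a dataset gives a simple way to vary the true motif ratio.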
Synthetic motif datasets and real datasets
| Dataset | Number of positive samples | Number of negative samples | Anchors |
|---|---|---|---|
| Synthetic | 219 | 220 | ----AA---- (Artificial) |
| Translocation | 86 | 439 | ----RR---- |
| LDL receptor | 464 | 439 | ----NPXY---- |
Performance of BayesMotif for identifying implanted motifs
| Motif | Implanted Motif length | Detected motif length | Information content | Motif Score |
|---|---|---|---|---|
| Hydrophobic | 20 | 27.6 ± 2.4 | 33.1 ± 0.72 | 1.00 ± 0.0 |
| Hydrophobic+Charged | 20 | 27.5 ± 3.0 | 27.9 ± 0.93 | 0.98 ± 0.006 |
| Random | 20 | 25.4 ± 3.6 | 25.4 ± 0.85 | 0.79 ± 0.03 |
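The record does not state how the information content column is computed. A common definition, assumed here, sums over aligned positions the difference between the maximum entropy of a 20-letter alphabet and the observed Shannon entropy, in bits. The sketch below shows that calculation without background-frequency or small-sample corrections; the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def information_content(aligned: Sequence[str], alphabet_size: int = 20) -> float:
    """Total information content (bits) of equal-length aligned sequences.

    Assumed definition: sum over positions of log2(alphabet_size) - H(position).
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")

    total = 0.0
    for j in range(length):
        counts = Counter(seq[j] for seq in aligned)
        n = sum(counts.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += math.log2(alphabet_size) - entropy
    return total


# Position 0 is fully conserved; position 1 is split evenly between two residues.
ic = information_content(["AC", "AD"])
```

Under this definition a fully conserved position contributes log2(20), about 4.32 bits, so a longer or more conserved motif region yields a larger total.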
Benchmark results of BayesMotif without false positive removal. Data are artificially generated by simulating hydrophobic and charged regions around a fixed anchor two amino acids long.
| True motif Ratio | 10% | 20% | 40% | 60% | 80% | 100% |
|---|---|---|---|---|---|---|
| Motif Length | 26 ± 3.24 | 26 ± 3.20 | 26 ± 4.1 | 29 ± 1.13 | 29 ± 1.8 | 27 ± 3.0 |
| Motif Score | 0.55 ± 0.04 | 0.55 ± 0.02 | 0.59 ± 0.01 | 0.67 ± 0.02 | 0.80 ± 0.01 | 0.98 ± 0.006 |
| Motif Information | 15.9 ± 0.87 | 15.3 ± 0.94 | 15.5 ± 1.3 | 17.7 ± 0.43 | 21.5 ± 0.62 | 28.0 ± 0.93 |
Benchmark results of BayesMotif with false positive removal. Data are artificially generated by simulating hydrophobic and charged regions around a fixed anchor two amino acids long.
| True motif Ratio | 10% | 20% | 40% | 60% | 80% | 100% |
|---|---|---|---|---|---|---|
| Motif Length | 6 ± 0.83 | 17 ± 3.6 | 23 ± 1.2 | 23 ± 0.8 | 24 ± 0.7 | 27 ± 3.0 |
| Motif Score | 0.56 ± 0.11 | 0.79 ± 0.04 | 0.90 ± 0.01 | 0.95 ± 0.01 | 0.97 ± 0.007 | 0.98 ± 0.005 |
| Motif Information | 16.0 ± 1.1 | 23.9 ± 3.5 | 26.2 ± 1.2 | 26.3 ± 0.78 | 26.8 ± 0.39 | 27.9 ± 0.95 |
Figure 5. Protein sorting motifs identified by the BayesMotif algorithm. a) Motif logo of the TAT translocation signal peptide RRxFLK; b) motif logo of the DGxD motif; c) motif logo of the GGPL and GDSG motifs; d) motif logo of the putative motif PGVY.
Results of BayesMotif with or without false positive removal on real datasets
| | True motif ratio | 10% | 20% | 40% | 50% | 80% | 100% |
|---|---|---|---|---|---|---|---|
| Without false positive removal | Found by algorithm | No | No | No | Yes | Yes | Yes |
| | Motif score | | | | 0.55 ± 0.03 | 0.77 ± 0.01 | 0.91 ± 0.02 |
| | Motif information | | | | 19.0 ± 3.24 | 31.6 ± 0.5 | 39.5 ± 0.20 |
| With false positive removal | Found by algorithm | No | Yes | Yes | Yes | Yes | Yes |
| | Motif score | | 0.63 ± 0.11 | 0.82 ± 0.04 | 0.83 ± 0.04 | 0.90 ± 0.02 | 0.92 ± 0.01 |
| | Motif information | | 21.5 ± 1.3 | 36.4 ± 2.4 | 38.0 ± 2.4 | 41.2 ± 1.1 | 42.0 ± 0.49 |