Allegra Via, Pier Federico Gherardini, Enrico Ferraro, Gabriele Ausiello, Gianpaolo Scalia Tomba, Manuela Helmer-Citterich.
Abstract
BACKGROUND: False occurrences of functional motifs in protein sequences can be considered as random events due solely to the sequence composition of a proteome. Here we use a numerical approach to investigate the random appearance of functional motifs with the aim of addressing biological questions such as: How are organisms protected from undesirable occurrences of motifs otherwise selected for their functionality? Has the random appearance of functional motifs in protein sequences been affected during evolution?
Year: 2007 PMID: 17331242 PMCID: PMC1821045 DOI: 10.1186/1471-2105-8-68
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Correlation and slope of PROSITE and reversed patterns
| db | pattern | C1 | C2 | slope | R² |
| sprot | prosite | 0.75 | 0.96 | 0.877 ± 0.019 | 0.71 |
| sprot | reversed | 0.86 | 0.92 | 1.080 ± 0.023 | 0.71 |
| human | prosite | 0.84 | 0.88 | 0.908 ± 0.031 | 0.62 |
| human | reversed | 0.86 | 0.89 | 1.055 ± 0.028 | 0.73 |
| yeast | prosite | 0.72 | 0.87 | 1.008 ± 0.046 | 0.58 |
| yeast | reversed | 0.78 | 0.82 | 0.977 ± 0.049 | 0.54 |
C1: Pearson correlation based on [9]; C2: Pearson correlation based on randomized datasets.
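As a rough illustration of the quantities reported in this table, the sketch below computes a Pearson correlation, a least-squares slope with its standard error, and R² for two paired samples. The record does not state which variables were paired or whether the fit included an intercept, so the function `linear_fit` and its ordinary-least-squares-with-intercept form are assumptions, not the authors' procedure.

```python
import math

def linear_fit(x, y):
    """Return (pearson_r, slope, slope_stderr, r_squared).

    Assumption: ordinary least squares with an intercept; the source
    does not say how its slopes were fitted.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("both samples must vary")
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    # Residual sum of squares of the fitted line; clamp tiny negative round-off.
    sse = max(0.0, syy - slope * sxy)
    slope_stderr = math.sqrt(sse / (n - 2) / sxx)
    return r, slope, slope_stderr, r * r
```

For a simple linear fit of this kind, R² equals the square of the Pearson correlation, which is why the function returns `r * r` rather than recomputing it.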
Figure 1. The number of false positives. The mean number of matches of a pattern over N = 1000 randomizations of a biological sequence dataset (λ) versus the number of false positives (FP) on the biological dataset. Each point corresponds to a PROSITE pattern. The two curved lines above and below the bisector delimit the 95% confidence interval around the line λ = FP and divide the plane into three regions: I, II and III. Fig. 1a and 1b (a zoom of the square area of Fig. 1a) display sprot100 data, whereas Fig. 1c and 1d show human100 and yeast100 data, respectively.
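The comparison in this figure can be sketched in a few lines. The example below estimates λ by repeatedly shuffling each sequence and counting regular-expression matches, then checks whether an observed FP count falls inside a central Poisson interval around λ. Several points are assumptions: the record does not say how the randomization was performed (shuffling residues within each sequence is only one composition-preserving choice), how the confidence bounds were computed, or which side of the bisector regions I and III lie on, so the code returns the neutral labels "above", "inside" and "below". All function names are hypothetical.

```python
import math
import random
import re

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space."""
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def poisson_interval(lam, level=0.95):
    """Central interval (lo, hi): the tail and 1 - tail quantiles of Poisson(lam)."""
    tail = (1.0 - level) / 2.0
    cum, k, lo = 0.0, 0, None
    while True:
        cum += poisson_pmf(k, lam)
        if lo is None and cum >= tail:
            lo = k
        if cum >= 1.0 - tail:
            return lo, k
        k += 1

def mean_random_matches(sequences, regex, n_shuffles=1000, seed=0):
    """Estimate lambda: mean match count over shuffled copies of the dataset.

    Assumption: each sequence is shuffled on its own, preserving its
    composition. Matches are counted without overlaps (re.findall).
    """
    rng = random.Random(seed)
    pattern = re.compile(regex)
    total = 0
    for _ in range(n_shuffles):
        for seq in sequences:
            residues = list(seq)
            rng.shuffle(residues)
            total += len(pattern.findall("".join(residues)))
    return total / n_shuffles

def classify(lam, fp, level=0.95):
    """Place an observed FP count relative to the Poisson interval around lam."""
    lo, hi = poisson_interval(lam, level)
    if fp > hi:
        return "above"
    if fp < lo:
        return "below"
    return "inside"
```

A PROSITE pattern has to be translated to a regular expression by hand before use; for instance, `N-{P}-[ST]-{P}` corresponds to `N[^P][ST][^P]`.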
The information content (IC)
| Pattern region | dataset | N | IC average | Q25 | Q75 |
| I | sprot100 | 63 (5%) | 26.8 ± 0.7 | 23.3 | 30.7 |
| III | sprot100 | 36 (3%) | 20.3 ± 0.5 | 18.9 | 21.9 |
| II | sprot100 | 1196 (92%) | 33.3 ± 0.3 | 26.4 | 37.4 |
| II* | sprot100 | 333 | 24.2 ± 0.2 | 22.3 | 26.3 |
| λ = FP = 0 | sprot100 | 863 | 36.9 ± 0.4 | 30.2 | 40.5 |
| I | human100 | 20 | 24.5 ± 1.2 | 20.6 | 28.7 |
| III | human100 | 2 | 17.3 ± 0.5 | 17.1 | 17.5 |
| II | human100 | 830 | 32.8 ± 0.4 | 25.2 | 37.4 |
| II* | human100 | 150 | 21.9 ± 0.2 | 20.3 | 23.6 |
| λ = FP = 0 | human100 | 680 | 35.2 ± 0.4 | 28.3 | 39.0 |
| I | yeast100 | 10 | 24.4 ± 2.1 | 21.4 | 29.2 |
| III | yeast100 | 0 | - | - | - |
| II | yeast100 | 598 | 31.9 ± 0.4 | 25.4 | 35.6 |
| II* | yeast100 | 65 | 20.8 ± 0.3 | 19.5 | 21.8 |
| λ = FP = 0 | yeast100 | 532 | 33.2 ± 0.4 | 26.9 | 36.5 |
Region II* is the region II without patterns for which λ = FP = 0; Q25 is the 25th percentile and Q75 is the 75th percentile; N is the number of patterns in the corresponding region.
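The per-region columns of this table (N, average, Q25, Q75) can be produced with a short summary routine such as the sketch below. It assumes that the ± value is the standard error of the mean and that quartiles use linear interpolation between data points; the record states neither, and `summarize` is a hypothetical helper name.

```python
import math
import statistics

def summarize(values):
    """Summarize one region's values: count, mean, standard error, quartiles.

    Assumptions: '±' is the standard error of the mean; quartiles use the
    'inclusive' interpolation method of statistics.quantiles.
    """
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    q25, _median, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {"N": n, "mean": mean, "sem": sem, "Q25": q25, "Q75": q75}
```

The routine needs at least two values per region, so an empty region such as region III in yeast100 has to be reported as missing instead.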
The order propensity (OP) value (GlobPlot)
| Pattern region | dataset | N | OP average | Q25 | Q75 |
| I | sprot100 | 63 | 0.37 ± 0.03 | 0.18 | 0.50 |
| III | sprot100 | 31 | 0.25 ± 0.03 | 0.14 | 0.33 |
| II* | sprot100 | 252 | 0.31 ± 0.02 | 0.13 | 0.43 |
| I | human100 | 191 | 0.25 ± 0.06 | 0.03 | 0.30 |
| III | human100 | 2 | 0.14 ± 0.02 | 0.14 | 0.16 |
| II* | human100 | 1161 | 0.22 ± 0.02 | 0.00 | 0.30 |
| I | yeast100 | 10 | 0.41 ± 0.08 | 0.23 | 0.49 |
| III | yeast100 | 0 | - | - | - |
| II* | yeast100 | 531 | 0.32 ± 0.04 | 0.17 | 0.39 |
N is the number of patterns. The numbers of patterns in regions III and II* differ from those in the corresponding regions of Table 2 because the OP value can be calculated only for patterns with FP > 0.
Number of PROSITE patterns expected by chance and observed in regions I and III
| dataset | | I (95%) | p-value | III (95%) | p-value | I (99%) | p-value | III (99%) | p-value |
| sprot100 | exp | 18 | <0.0001 | 3 | <0.0001 | 4 | <0.0001 | 0.5 | <0.0001 |
| sprot100 | obs | 63 | | 36 | | 54 | | 22 | |
| human100 | exp | 9 | 0.0012 | 0.6 | n.s. | 2 | 0.0002 | 0.1 | 0.0002 |
| human100 | obs | 20 | | 2 | | 9 | | 3 | |
| yeast100 | exp | 5 | 0.0385 | 0.1 | n.s. | 1 | <0.000 | 0 | n.s. |
| obs | 10 | 0 | 7 | 1 | 0 | ||||
The p-value is assigned to the number of outliers observed (obs) versus the number expected (exp) from the Poisson distribution; n.s. stands for p-value > 0.05.
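One way to obtain such a p-value is the upper tail of a Poisson distribution whose mean is the expected count. The sketch below assumes a one-sided test, P(X ≥ obs), which the record does not confirm; `poisson_upper_tail` is a hypothetical name. Because the expected counts in the table are rounded, results from this function need not reproduce the printed p-values exactly.

```python
import math

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected).

    Assumption: a one-sided upper-tail test; the source only says the
    p-value comes from the Poisson distribution.
    """
    if observed <= 0:
        return 1.0
    if expected == 0:
        return 0.0
    # Sum P(X = k) for k < observed in log space, then take the complement.
    below = sum(
        math.exp(k * math.log(expected) - expected - math.lgamma(k + 1))
        for k in range(observed)
    )
    return max(0.0, 1.0 - below)
```

Calling `poisson_upper_tail(63, 18)` corresponds to the sprot100 region I entry at the 95% level, where 63 outliers were observed against 18 expected.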