Itziar Frades, Svante Resjö, Erik Andreasson.
Abstract
BACKGROUND: How protein phosphorylation relates to kingdom/phylum divergence is largely unknown, and the amino acid residues surrounding the phosphorylation site are profoundly important for protein kinase-substrate interactions. Standard motif analysis is not adequate for large-scale comparative analysis because each phosphopeptide is assigned to a unique motif and because it performs poorly with the unbalanced nature of the input datasets.
Year: 2015 PMID: 26224486 PMCID: PMC4520095 DOI: 10.1186/s12859-015-0657-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Serine-centered phosphopeptide sequences of length 21, n-grams of varying size (6 to 21 mer) and references from the datasets in each kingdom/phylum and species under study in the training set and the test set. For each species in the training set two datasets were used, and hence two numbers are given. There were many more n-grams than phosphosites, due to the 21-residue window around each phosphosite and the varying length of the n-grams within that window.
| Training set | Number of phosphopeptides | Number of n-grams | References | Kingdom | Phylum |
|---|---|---|---|---|---|
| | 2903 and 4270 | 349724 and 527397 | [ | Plantae | |
| | 1972 and 4075 | 200661 and 454563 | [ | Animalia | Chordata |
| | 6363 and 6362 | 671933 and 596922 | [ | Animalia | Arthropoda |
| | 6343 and 1178 | 712345 and 116095 | [ | Fungi | Ascomycota |
| | 744 and 1048 | 93799 and 137899 | [ | Chromalveolata | Apicomplexa |
| | 447 | 50007 | [ | Plantae | |
| | 7372 | 811443 | [ | Animalia | Chordata |
| | 4003 | 436055 | [ | Animalia | Nematoda |
| | 1362 | 155639 | [ | Fungi | Ascomycota |
| | 1388 | 172714 | [ | Chromalveolata | Apicomplexa |
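The relation between phosphosites and n-grams described in the caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that an n-gram is any contiguous substring of the 21-residue window with a length between 6 and 21, and the function name `ngrams` and the example peptide are hypothetical.

```python
def ngrams(peptide, min_len=6, max_len=21):
    """Return every contiguous substring of `peptide` whose length lies
    in [min_len, max_len] (assumed definition of an n-gram)."""
    grams = []
    longest = min(max_len, len(peptide))
    for size in range(min_len, longest + 1):
        # A window of length L holds L - size + 1 substrings of this size.
        for start in range(len(peptide) - size + 1):
            grams.append(peptide[start:start + size])
    return grams


# Hypothetical 21-mer with the phosphorylated serine at the centre (index 10).
window = "MKRLSPEDTGSPVQAEEKLRN"
all_grams = ngrams(window)
print(len(all_grams))  # 16 + 15 + ... + 1 = 136 n-grams for one full window
```

Under this assumption a single complete 21-mer yields at most 136 n-grams, which is why the n-gram counts exceed the phosphopeptide counts by roughly two orders of magnitude. The counts in the table are below 136 per phosphopeptide; the text here does not say which additional filtering produced them.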
Motif analysis of three individual phosphosite detection experiments in A. thaliana. The number of different motifs in each of the three individual experiments was computed and the resulting three numbers of motifs were summed (Sum of number of motifs from individual experiments). Additionally, the motifs obtained by grouping the phosphosites from the three experiments (Sum of experiments) were computed.
| Dataset | Number of motifs | Number of phosphosites | Minimum number of occurrences |
|---|---|---|---|
| Experiment1 | 116 | 1733 | 5 |
| Experiment2 | 3 | 178 | 5 |
| Experiment3 | 99 | 6862 | 21 |
| Sum of number of motifs from individual experiments | 182 | 8773 | 5;5;21 |
| Sum of experiments | 243 | 8773 | 27 |
Confusion matrix of the signature pairing equal kingdom/phylum species
| | | | | | classified as |
|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | |
| 0 | 1 | 0 | 0 | 0 | |
| 0 | 0 | 1 | 0 | 0 | |
| 0 | 0 | 0 | 1 | 0 | |
| 0 | 0 | 0 | 0 | 1 | |
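How such a matrix translates into classification performance can be sketched as below. The sketch assumes that rows are the true classes and columns the predicted ones (the class labels are not recoverable from the table as shown), and the helper names `accuracy` and `per_class_recall` are illustrative.

```python
# Confusion matrix as given in the table; rows are assumed to be the true
# kingdom/phylum classes and columns the classes they were assigned to.
confusion = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]


def accuracy(matrix):
    """Fraction of all instances that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """For each row, the fraction of its instances classified correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


print(accuracy(confusion))          # 1.0
print(per_class_recall(confusion))  # [1.0, 1.0, 1.0, 1.0, 1.0]
```

Because every off-diagonal entry is zero, each test species is assigned to the class of its own kingdom/phylum.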
Fig. 1 Hierarchical cluster analysis of the normalized frequencies of the discriminative n-grams present in each species. The dendrogram shows that different species of the same kingdom/phylum cluster together
For each species, the proportion of proteins having discriminative n-grams with no orthologs in other species, the proportion with orthologs only in species of the same kingdom/phylum, and the proportion with orthologs in other species
| Species | Proportion of discriminative n-grams with no orthologs in other species | Proportion of discriminative n-grams with orthologs only in species of the same kingdom/phylum | Proportion of discriminative n-grams with orthologs outside the kingdom/phylum |
|---|---|---|---|
| | 24.4 % | 63.1 % | 12.5 % |
| | 0 | 58.2 % | 41.8 % |
| | 3.6 % | 87.1 % | 9.3 % |
| | 2.3 % | 91.0 % | 6.4 % |
| | 40.0 % | 5.4 % | 54.6 % |
| | 22.5 % | 11.7 % | 65.8 % |
| | 62.3 % | 15.9 % | 21.7 % |
| | 33.3 % | 21.2 % | 45.4 % |
Fig. 2 Distribution of clusters of discriminative motifs of similar physico-chemical nature among species
Fig. 3 Proportion of hydrophobic (a), negative (b), positive (c) and proline (d) amino acids in the discriminative n-grams of each species
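The proportions plotted in Fig. 3 can be approximated with a residue count over a set of n-grams. The sketch below is an assumption-laden illustration: the residue groupings in `CLASSES` follow one common convention and may differ from the one used in the paper, and the two example n-grams are made up.

```python
# Assumed residue classes (one common convention; the paper's exact
# grouping is not given in this text).
CLASSES = {
    "hydrophobic": set("AVLIMFW"),
    "negative": set("DE"),
    "positive": set("KRH"),
    "proline": set("P"),
}


def class_proportions(ngrams):
    """Fraction of all residues in `ngrams` that fall into each class."""
    total = sum(len(gram) for gram in ngrams)
    if total == 0:
        raise ValueError("no residues to count")
    counts = {name: 0 for name in CLASSES}
    for gram in ngrams:
        for residue in gram:
            for name, members in CLASSES.items():
                if residue in members:
                    counts[name] += 1
    return {name: count / total for name, count in counts.items()}


# Hypothetical discriminative n-grams for one species.
props = class_proportions(["SPEDKR", "DSDEEA"])
print(props["negative"])  # 0.5 (6 of the 12 residues are D or E)
```

Running this per species over its discriminative n-grams gives one value per class and species, which is the quantity each panel of the figure compares.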
Fig. 4 KEGG enrichment analysis of the phosphoproteins that match the discriminative n-grams. Functional conservation is found between the phosphoproteins that match the discriminative n-grams in closely related species belonging to the same kingdom/phylum. The histograms show 1 - p-value from the KEGG enrichment analysis
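The 1 - p-value score shown in Fig. 4 can be illustrated with an over-representation test. The text here does not state which test was used; the sketch below assumes a one-sided hypergeometric test, a common choice for pathway enrichment, and the function name and the numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(population, pathway_size, sample_size, hits):
    """P(X >= hits) for X ~ Hypergeometric(population, pathway_size, sample_size).

    population   -- number of annotated proteins in the background
    pathway_size -- background proteins annotated to the pathway
    sample_size  -- proteins matching the discriminative n-grams
    hits         -- matching proteins that are annotated to the pathway
    """
    upper = min(pathway_size, sample_size)
    tail = sum(
        comb(pathway_size, i) * comb(population - pathway_size, sample_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(population, sample_size)


# Toy numbers: 10 background proteins, 5 in the pathway, 5 sampled, all 5 hit.
p = enrichment_p_value(10, 5, 5, 5)
score = 1 - p  # the quantity plotted in the histograms
print(p)  # 1/252, about 0.00397
```

A score close to 1 therefore marks a pathway whose members are strongly over-represented among the matching phosphoproteins.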
Fig. 5 Logos of the top discriminative n-grams matching phosphopeptides having the maximum discriminative ratios in each species
Proportion of coils in the proteins holding discriminative n-grams and in the 21-mer and 13-mer serine-centered phosphopeptides, for each species in the training and test sets
| Species | Proportion of coils in the proteins | Proportion of coils in the 21 mer phosphopeptides | Proportion of coils in the 13 mer phosphopeptides |
|---|---|---|---|
| | 62 % | 85 % | 88 % |
| | 67 % | 90 % | 92 % |
| | 69 % | 91 % | 94 % |
| | 63 % | 86 % | 89 % |
| | 71 % | 89 % | 89 % |
| | 64 % | 87 % | 92 % |
| | 69 % | 89 % | 92 % |
| | 63 % | 88 % | 91 % |
| | 63 % | 90 % | 92 % |
| | 64 % | 80 % | 86 % |