Viswanadham Sridhara, Aron Marchler-Bauer, Stephen H Bryant, Lewis Y Geer.
Abstract
New generation sequencing technologies have resulted in significant increases in the number of complete genomes. Functional characterization of these genomes, such as by high-throughput proteomics, is an important but challenging task due to the difficulty of scaling up existing experimental techniques. By use of comparative genomics techniques, experimental results can be transferred from one genome to another, while at the same time minimizing errors by requiring discovery in multiple genomes. In this study, protein phosphorylation, an essential component of many cellular processes, is studied using data from large-scale proteomics analyses of the phosphoproteome. Phosphorylation sites from Homo sapiens, Mus musculus and Drosophila melanogaster phosphopeptide data sets were mapped onto conserved domains in the manually curated portion of NCBI's Conserved Domain Database (CDD). In this subset, 25 phosphorylation sites are found to be evolutionarily conserved among the three species studied. Transfer of the phosphorylation annotation of these conserved sites onto sequences sharing the same conserved domains yields 3253 phosphosite annotations for proteins from coelomata, the taxonomic division that spans H. sapiens, M. musculus and D. melanogaster. The method scales automatically, so as the amount of experimental phosphoproteomics data increases, more conserved phosphorylation sites may be revealed.
Year: 2011 PMID: 21571812 PMCID: PMC3096321 DOI: 10.1093/database/bar019
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. NCBI Conserved Domains annotated on a protein query. (a) RPS-BLAST is used to find domain footprints and derived functional sites using a serine–threonine protein kinase as a protein query (GI:110349738). Shown in red is a specific hit, i.e. a protein sub-family identified with high confidence. (b) Example of a multiple sequence alignment (MSA) representing a CDD domain. The alignable regions (structured blocks or block alignments) are shown in upper case blocks, while the unaligned regions are shown as lower case and gaps. NCBI CDD also can provide functional site annotation. The hash marks indicate the annotation of an activation loop (A-loop). The row starting with ‘query’ shows the protein query (GI:110349738) with start and stop sites.
Figure 2. Evolutionarily conserved phosphosites. Each of the experimental phosphopeptide data sets was mapped onto conserved domain-specific hits, and the site positions on the domain models were examined for overlap. The Venn diagram shows the number of sites that overlap between each pair of species and among all three species. Twenty-five highly conserved phosphorylation sites are shared by all three species.
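The overlap counts behind such a Venn diagram can be sketched with plain set intersections. This is a minimal illustration, not the authors' code: it assumes each site has already been expressed as a (domain model ID, position on the model) pair, and the function name `venn_counts`, the model labels and the toy data are all hypothetical. The pairwise counts here are inclusive, meaning they also contain the sites shared by all three species.

```python
from itertools import combinations


def venn_counts(sites_by_species):
    """Count sites shared by each pair of species and by all species.

    sites_by_species maps a species name to a set of
    (domain_model_id, position_on_model) tuples.
    """
    counts = {}
    names = sorted(sites_by_species)
    # Pairwise overlaps (inclusive of sites shared by every species).
    for a, b in combinations(names, 2):
        counts[(a, b)] = len(sites_by_species[a] & sites_by_species[b])
    # Sites present in every species.
    counts["all"] = len(set.intersection(*sites_by_species.values()))
    return counts


# Hypothetical toy data; the model labels and positions are made up.
sites_by_species = {
    "H. sapiens": {("model_A", 32), ("model_B", 167), ("model_C", 43)},
    "M. musculus": {("model_A", 32), ("model_B", 167)},
    "D. melanogaster": {("model_A", 32), ("model_C", 43)},
}
counts = venn_counts(sites_by_species)
print(counts)
```

Keying each site by model and position, rather than by protein coordinate, is what makes sites from different species directly comparable.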
List of protein families with conserved phosphosites
| PSSM-ID | Sites | Protein family (NCBI CDD) and description |
|---|---|---|
| 28957 | 32 | H4: Histone H4. |
| 30346 | 33 | AMPKbeta_GBD_like: AMP-activated protein kinase beta subunit glycogen binding domain. |
| 48161 | 43 | GroEL: GroEL_like type I chaperonin. |
| 48163 | 234 | TPP_E1_PDC_ADC_BCADC: Thiamine pyrophosphate family. |
| 100088 | 44 | PGM3: phosphoglucomutase 3. |
| 100101 | 26 | Ribosomal_L11: Ribosomal protein L11. |
| 107222 | 107 | p23_hB-ind1_like: p23_like domain found in human (h) butyrate-induced transcript 1 (B-ind1) and similar proteins. |
| 132804 | 45 | PX_SNX3_like: The phosphoinositide binding Phox Homology domain of Sorting Nexin 3 and related proteins. |
| 132940 | 157 | STKc_MST3_like: Catalytic domain of Mammalian Ste20-like protein kinase 3-like Protein Serine/Threonine Kinases. |
| 132979 | 174 | STKc_PAK_II: Catalytic domain of the Protein Serine/Threonine Kinase, Group II p21-activated kinase. |
| 143346 | 154 | STKc_CDK7: Catalytic domain of the Serine/Threonine Kinase, Cyclin-Dependent protein Kinase 7. |
| 143354 | 167 | STKc_ERK1_2_like: Catalytic domain of Extracellular signal-Regulated Kinase 1 and 2-like Serine/Threonine Kinases. |
| 143356 | 173 | STKc_p38: Catalytic domain of the Serine/Threonine Kinase, p38 Mitogen-Activated Protein Kinase. |
| 173660 | 152 | STKc_AGC: Catalytic domain of AGC family Protein Serine/Threonine Kinases. |
| 173673 | 295 | STKc_RSK_N: N-terminal catalytic domain of the Protein Serine/Threonine Kinase, 90 kDa ribosomal protein S6 kinase. |
| 173680 | 302 | STKc_PKN: Catalytic domain of the Protein Serine/Threonine Kinase, Protein Kinase N. |
| 173752 | 12 | STKc_CDK1_euk: Catalytic domain of the Serine/Threonine Kinase, Cyclin-Dependent protein Kinase 1. |
| 176301 | 50 | PH_Cool_Pix: Cool Pix pleckstrin homology (PH) domain. |
a: CDD phosphorylation annotation.
b: Literature (LTP).
c: No evidence.
Figure 3. Algorithm flowchart. This flowchart outlines how a conserved site is obtained from the experimental phosphopeptide data sets of the three species. First, a phosphopeptide is mapped to its protein sequence and then onto specific hits, if any. If the phosphosites from all three species map to the same position on the specific hit, we consider the site to be conserved.
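The steps in this flowchart can be sketched as follows. This is a simplified illustration under stated assumptions, not the published pipeline: the alignment of each protein to a specific hit is assumed to be precomputed and is represented here as a list of ungapped blocks, peptides are located by exact substring match, and every function name, field name, sequence and coordinate is hypothetical.

```python
from collections import defaultdict


def phosphosite_protein_positions(protein_seq, peptide, site_offsets):
    """Locate a peptide in its protein and return the 0-based protein
    positions of its phosphorylated residues (empty list if not found)."""
    start = protein_seq.find(peptide)
    if start == -1:
        return []
    return [start + offset for offset in site_offsets]


def to_domain_position(protein_pos, hit):
    """Map a protein position onto a domain-model position.

    hit["blocks"] is a list of (protein_start, domain_start, length)
    ungapped aligned blocks. Returns None for unaligned positions.
    """
    for protein_start, domain_start, length in hit["blocks"]:
        if protein_start <= protein_pos < protein_start + length:
            return domain_start + (protein_pos - protein_start)
    return None


def conserved_sites(records_by_species):
    """Return (model_id, domain_position) pairs seen in every species.

    Each record is (protein_seq, peptide, site_offsets, hits).
    """
    species_seen = defaultdict(set)
    for species, records in records_by_species.items():
        for protein_seq, peptide, offsets, hits in records:
            positions = phosphosite_protein_positions(protein_seq, peptide, offsets)
            for pos in positions:
                for hit in hits:
                    domain_pos = to_domain_position(pos, hit)
                    if domain_pos is not None:
                        species_seen[(hit["model_id"], domain_pos)].add(species)
    required = len(records_by_species)
    return {site for site, seen in species_seen.items() if len(seen) == required}


# Hypothetical toy input: one made-up domain model aligned to three proteins.
records_by_species = {
    "H. sapiens": [
        ("MKTAYSPLRGE", "AYSPLR", [2], [{"model_id": "model_X", "blocks": [(2, 10, 8)]}]),
        ("MKTAYSPLRGE", "MKTAY", [2], [{"model_id": "model_X", "blocks": [(2, 10, 8)]}]),
    ],
    "M. musculus": [
        ("GGMKTAYSPLRGE", "AYSPLR", [2], [{"model_id": "model_X", "blocks": [(4, 10, 8)]}]),
    ],
    "D. melanogaster": [
        ("KSAYSPLKD", "AYSPLK", [2], [{"model_id": "model_X", "blocks": [(1, 10, 8)]}]),
    ],
}
conserved = conserved_sites(records_by_species)
print(conserved)
```

In the toy input, the serine in the shared peptide lands on the same model position in all three species and is reported as conserved, while the second human site is seen in only one species and is dropped.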