Abstract
Proteins harbor domains or short linear motifs that facilitate their functions and interactions. Finding functional motifs in protein sequences can help predict the putative cellular roles or characteristics of hypothetical proteins. In this study, we present Shetti-Motif, an interactive tool to (i) map UniProt and PROSITE flat files, (ii) search for multiple pre-defined consensus patterns or experimentally validated functional motifs in large datasets of protein sequences (proteome-wide), and (iii) search for motifs containing repeated residues (low-complexity regions, e.g., Leu-, SR-, and PEST-rich motifs). As a proof of principle, using this comparative proteomics pipeline, eleven proteomes encoded by members of the Poxviridae family were searched against about 100 experimentally validated functional motifs. Closely related viruses and viruses that infect the same host cells (e.g., vaccinia and variola viruses) show similar motif-containing protein profiles. The motifs encoded by these viruses are correlated, which explains why poxviruses are able to interact with a wide range of host cells. In conclusion, this in silico analysis is useful for establishing a dataset(s) of potential proteins for further investigation or for comparison between species.
Keywords: Comparative genomics; Functional genomics; Low-complexity regions (LCRs); Protein annotation; Protein domain; Protein function
Year: 2016 PMID: 28000080 PMCID: PMC5357487 DOI: 10.1007/s11262-016-1416-9
Source DB: PubMed Journal: Virus Genes ISSN: 0920-8569 Impact factor: 2.332
Fig. 1 Screenshot of the Shetti-Motif main window (a), and flowchart of the features and method used in this study (b)
The motif-containing protein (McP) profile of poxviruses (Tables S1–S3)
| | Vaccinia virus WR | Variola virus DNA | Monkeypox virus strain Zaire-96-I-16 | Yaba monkey tumor virus | Fowlpox virus | Canarypox virus | Orf virus | Cowpox virus | Camelpox virus | Myxoma virus strain Lausanne | Nile crocodilepox virus |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GenBank ID | AY243312 | X69198 | AF380138 | AY386371 | AF198100 | AY318871 | AY386264 | AF482758 | AF438165 | AF170726 | DQ356948 |
| Number of proteins | 218 | 197 | 191 | 140 | 260 | 328 | 130 | 233 | 211 | 170 | 173 |
| Protein interaction, thiol-disulfide transfer | | | | | | | | | | | |
| CxxxC | 25 | 23 | 22 | 16 | 27 | 41 | 15 | 32 | 28 | 16 | 36 |
| CxxC | 35 | 33 | 32 | 27 | 48 | 64 | 30 | 44 | 35 | 31 | 28 |
| Binding to integrins, RGD-related motifs (3–8% of whole proteome) | | | | | | | | | | | |
| RGD | 9 | 6 | 10 | 5 | 8 | 10 | 11 | 7 | 10 | 6 | 14 |
| % | 4.1 | 3 | 5.2 | 3.6 | 3.1 | 3 | 8.5 | 3 | 4.7 | 3.5 | 8.1 |
| Binding to phospholipids, lipid raft-mediated endocytosis (3–27% of proteome) | | | | | | | | | | | |
| RxLR | 12 | 8 | 10 | 6 | 12 | 19 | 36 | 14 | 11 | 5 | 38 |
| % | 5.5 | 4.1 | 5.2 | 4.3 | 4.6 | 5.8 | 27.7 | 6 | 5.2 | 2.9 | 22 |
| Glycosylation sites (58–81% of proteome)* | | | | | | | | | | | |
| N{P}[ST]{P} | 165 | 153 | 154 | 112 | 209 | 264 | 78 | 181 | 167 | 128 | 101 |
| % | 75.7 | 77.7 | 80.6 | 80 | 80.4 | 80.5 | 60 | 77.7 | 79.1 | 75.3 | 58.4 |
| Nuclear localization sequence (NLS; KR-rich) motifs | | | | | | | | | | | |
| KRxR | 11 | 10 | 10 | 6 | 8 | 18 | 9 | 17 | 13 | 10 | 19 |
| KRx [ | 0 | 0 | 0 | 0 | 1 | 4 | 1 | 0 | 0 | 0 | 1 |
| KRx [ | 1 | 1 | 2 | 2 | 1 | 5 | 0 | 2 | 0 | 0 | 2 |
| K[KR]RK | 3 | 3 | 2 | 2 | 6 | 8 | 0 | 3 | 3 | 2 | 5 |
| KR[KR]R | 1 | 1 | 1 | 1 | 0 | 3 | 2 | 1 | 2 | 1 | 7 |
| [PR]xxKR{DE}[KR] | 0 | 0 | 0 | 3 | 5 | 5 | 1 | 0 | 0 | 3 | 1 |
| [RP]xxKR[KR]{DE} | 1 | 2 | 0 | 2 | 4 | 2 | 3 | 2 | 1 | 2 | 2 |
| RKRP | 1 | 1 | 1 | 0 | 2 | 0 | 0 | 1 | 0 | 0 | 0 |
| Protein folding, Rossmann fold motifs, bind FAD or NAD(P) | | | | | | | | | | | |
| Gx [ | 8 | 10 | 13 | 8 | 14 | 12 | 8 | 15 | 13 | 11 | 21 |
| Gxxx[GA] | 110 | 96 | 101 | 54 | 108 | 146 | 106 | 116 | 99 | 94 | 128 |
| SUMO binding (40–58 and 40–61% of proteome) | | | | | | | | | | | |
| [VI]x[VI][VI] | 105 | 102 | 98 | 78 | 141 | 191 | 53 | 122 | 107 | 78 | 72 |
| % | 48.2 | 51.8 | 51.3 | 55.7 | 54.2 | 58.2 | 40.8 | 52.4 | 50.7 | 45.9 | 41.6 |
| hKx[DE] | 119 | 110 | 112 | 82 | 147 | 194 | 52 | 128 | 116 | 104 | 74 |
| % | 54.6 | 55.8 | 58.6 | 58.6 | 56.5 | 59.1 | 40 | 54.9 | 55 | 61.2 | 42.8 |
| Recruit ESCRT pathway | | | | | | | | | | | |
| YxxL | 129 | 120 | 128 | 90 | 162 | 222 | 61 | 149 | 133 | 119 | 111 |
| % | 59.2 | 60.9 | 67 | 64.3 | 62.3 | 67.7 | 46.9 | 63.9 | 63 | 70 | 64.2 |
| hPxV | 42 | 41 | 42 | 30 | 53 | 79 | 44 | 47 | 41 | 51 | 72 |
| % | 19.3 | 20.8 | 22 | 21.4 | 20.4 | 24.1 | 33.8 | 20.2 | 19.4 | 30 | 41.6 |
| Walker A, A’ and B motifs | | | | | | | | | | | |
| [AG]xxxxGK[ST] | 5 | 5 | 4 | 7 | 12 | 13 | 5 | 6 | 5 | 6 | 5 |
| hhhhDxDxR | 3 | 3 | 3 | 1 | 2 | 2 | 1 | 3 | 3 | 2 | 5 |
| hhhDxxP | 15 | 13 | 19 | 8 | 19 | 13 | 23 | 18 | 17 | 15 | 31 |
Total numbers of McPs (proteins harboring at least one instance of the query motif; a protein with more than one instance is counted once) are given for each query motif; “%” is the percentage of McPs relative to the total number of proteins; “x” denotes any residue; “{P}” denotes any residue except P; alternative residues are bracketed; [1, 2] means the motif is flanked by one or two residue(s); “h” denotes a non-polar or hydrophobic residue. In this study, h was considered equivalent to an “A, C, F, G, V, L, I, P, W, M, or Y” residue (Table S1)
* Glycosylation sites were searched in entire protein sequences, not only at the N- or C-termini
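The motif notation in the table legend maps fairly directly onto regular expressions. The following is a minimal Python sketch of that mapping and of the McP count and percentage, not the Shetti-Motif implementation. The function names `motif_to_regex` and `count_mcps` and the toy sequences are hypothetical, and the [1, 2] flanking notation is not handled.

```python
import re

# Assumption: "h" expands to the residue set given in the table legend.
HYDROPHOBIC = "ACFGVLIPWMY"


def motif_to_regex(motif):
    """Translate the table's motif notation into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "x":                       # any residue
            parts.append(".")
        elif ch == "h":                     # hydrophobic residue (legend definition)
            parts.append("[" + HYDROPHOBIC + "]")
        elif ch == "[":                     # alternative residues, e.g. [ST]
            j = motif.index("]", i)
            parts.append(motif[i:j + 1])
            i = j
        elif ch == "{":                     # excluded residues, e.g. {P}
            j = motif.index("}", i)
            parts.append("[^" + motif[i + 1:j] + "]")
            i = j
        else:                               # literal residue
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts))


def count_mcps(proteins, motif):
    """Return (number of McPs, percentage of the proteome) for one motif.

    A protein counts once, however many instances of the motif it contains.
    """
    pattern = motif_to_regex(motif)
    hits = sum(1 for seq in proteins.values() if pattern.search(seq))
    pct = round(100.0 * hits / len(proteins), 1) if proteins else 0.0
    return hits, pct


# Toy sequences invented for illustration; they are not poxvirus proteins.
toy_proteome = {"p1": "MKNASAC", "p2": "MNPSA", "p3": "MARGDCAAC"}
print(count_mcps(toy_proteome, "N{P}[ST]{P}"))  # only p1 matches: (1, 33.3)
print(count_mcps(toy_proteome, "CxxC"))         # only p3 matches: (1, 33.3)
```

Using `pattern.search` rather than counting all matches mirrors the legend's rule that a protein with several instances is counted once.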
Fig. 2 Poxviruses encode divergent numbers of motifs; the motif-containing protein (McP) profiles of closely related viruses are correlated. The number of motif-containing proteins (i.e., proteins containing at least one instance of the query motif) was counted and normalized (as a percentage) to the total number of proteins encoded by each virus (Table 1, S3). a Box-and-whisker plot showing the 1st, 2nd, and 3rd quartiles (Q1, Q2, and Q3, respectively) of the numbers of McPs, with whiskers at 1.5 IQR (interquartile range) (Q3 + 1.5 IQR); b the average and maximum numbers of McPs; the error bars are based on standard deviation values; c Spearman correlation coefficient values and scatterplots (using STATISTICA Data Miner; StatSoft, USA) of the number of McPs encoded by each virus. VacV: Vaccinia virus WR, VarV: Variola virus, MPxV: Monkeypox virus, YMTV: Yaba monkey tumor virus, FPxV: Fowlpox virus, CPxV: Canarypox virus, Orf V: Orf virus, CPxV: Cowpox virus, CmPxV: Camelpox virus M-96, MV: Myxoma virus, and NileCV: Nile crocodilepox virus
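The Spearman correlation in panel c can be illustrated with a short, dependency-free Python sketch. The authors used STATISTICA, so this is not their procedure. The helper names are hypothetical, and the choice of the first six fully legible motif rows of the table (CxxxC, CxxC, RGD, RxLR, N{P}[ST]{P}, KRxR) as input is an assumption made only for illustration.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# McP counts copied from the table for the rows
# CxxxC, CxxC, RGD, RxLR, N{P}[ST]{P}, KRxR (row selection is an assumption).
vaccinia = [25, 35, 9, 12, 165, 11]
variola = [23, 33, 6, 8, 153, 10]
print(spearman(vaccinia, variola))
```

Ranking before correlating makes the coefficient insensitive to the large spread in counts between rare motifs such as RGD and very common ones such as the glycosylation pattern.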