Surbhi Sharma¹, Oniel Toledo¹, Michael Hedden¹, Kenneth F Lyon¹, Steven B Brooks¹, Roxanne P David¹, Justin Limtong¹, Jacklyn M Newsome¹, Nemanja Novakovic¹, Sanguthevar Rajasekaran², Vishal Thapar³, Sean R Williams¹, Martin R Schiller¹.
Abstract
All translated proteins end with a carboxylic acid commonly called the C-terminus. Many short functional sequences (minimotifs) are located on or immediately proximal to the C-terminus. However, information about the function of protein C-termini has not been consolidated into a single source. Here, we built a new "C-terminome" database and web system focused on human proteins. Approximately 3,600 C-termini in the human proteome have a minimotif with an established molecular function. To help evaluate the function of the remaining C-termini in the human proteome, we inferred minimotifs identified by experimentation in rodent cells, predicted minimotifs based upon consensus sequence matches, and predicted novel highly repetitive sequences in C-termini. Predictions can be ranked by enrichment scores or Genomic Evolutionary Rate Profiling (GERP) scores, a measurement of evolutionary constraint. By searching the last 10 amino acids of proteins in the human proteome for new C-terminally anchored sequences 3 to 10 residues long with up to 5 degenerate positions in the consensus sequences, we identified new consensus sequences that predict instances in the majority of human genes. All of this information is consolidated into a database that can be accessed through a C-terminome web system with search and browse functions for minimotifs and human proteins. A known consensus sequence-based predicted function is assigned to nearly half the proteins in the human proteome. Weblink: http://cterminome.bio-toolkit.com.
Year: 2016 PMID: 27050421 PMCID: PMC4822787 DOI: 10.1371/journal.pone.0152731
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
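The anchored search described in the abstract can be illustrated with a minimal Python sketch. It enumerates every pattern of 3 to 10 residues that ends at the C-terminus within the last 10 residues, replacing up to 5 positions with a wildcard, and counts how many proteins carry each pattern. Modelling a "degenerate position" as the single wildcard `x` is an assumption and a simplification (the database's consensus sequences also use residue classes such as `[ST]`); the function names `anchored_patterns` and `count_patterns` and the toy proteome are hypothetical, and this is not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def anchored_patterns(seq, min_len=3, max_len=10, max_degenerate=5):
    """Return the set of C-terminally anchored patterns for one protein.

    Patterns come from the last `max_len` residues, have lengths
    min_len..max_len, and replace up to `max_degenerate` positions with the
    wildcard 'x' (assumption: wildcard stands in for a degenerate position).
    """
    tail = seq[-max_len:]
    patterns = set()
    for length in range(min_len, min(max_len, len(tail)) + 1):
        suffix = tail[-length:]
        for k in range(min(max_degenerate, length) + 1):
            for positions in combinations(range(length), k):
                chars = list(suffix)
                for p in positions:
                    chars[p] = "x"
                if all(c == "x" for c in chars):
                    continue  # an all-wildcard pattern would match every protein
                patterns.add("".join(chars) + ">")  # '>' marks the C-terminus
    return patterns


def count_patterns(proteome, **kwargs):
    """Count, for each pattern, how many proteins carry it at the C-terminus."""
    counts = Counter()
    for seq in proteome.values():
        counts.update(anchored_patterns(seq, **kwargs))  # each protein counted once
    return counts


toy = {"p1": "MAAAKDEL", "p2": "MGGGHDEL", "p3": "MSTL"}
counts = count_patterns(toy)
print(counts["xDEL>"], counts["STL>"])  # prints: 2 1
```

Because each protein contributes a given pattern at most once, the counts reflect how widespread a C-terminal pattern is; ranking the patterns (for example by enrichment or GERP scores) would be a separate step.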
Fig 5. Fold-enrichment scores of minimotifs and predicted sequences.
Bar graphs showing the percentage of occurrences at different proteome-wide (A) and discrete-proteome (B) fold-enrichment values. Dark gray bars represent the percentage of C-terminal minimotifs, and light gray bars represent the percentage of predicted consensus sequences and instances from de novo generated sequences.
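The caption does not define how fold-enrichment is computed. One common definition, used here purely as an assumption, is the number of observed C-terminal matches divided by the number expected if residues occurred independently at their proteome-wide background frequencies. The sketch takes a consensus written as a list of allowed-residue sets (`None` standing for the wildcard `x`); the helper names and the toy data are hypothetical, and the distinction between proteome-wide and discrete-proteome scores is not modelled.

```python
from collections import Counter


def background_frequencies(proteome):
    """Amino-acid frequencies over all residues in the proteome."""
    counts = Counter("".join(proteome.values()))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def matches_c_terminus(seq, pattern):
    """True if the last len(pattern) residues fit the allowed-residue sets."""
    n = len(pattern)
    if len(seq) < n:
        return False
    return all(allowed is None or aa in allowed
               for aa, allowed in zip(seq[-n:], pattern))


def fold_enrichment(pattern, proteome):
    """Observed / expected C-terminal matches (independence assumption)."""
    freqs = background_frequencies(proteome)
    p_match = 1.0
    for allowed in pattern:
        if allowed is not None:  # wildcard positions contribute a factor of 1
            p_match *= sum(freqs.get(aa, 0.0) for aa in allowed)
    eligible = [s for s in proteome.values() if len(s) >= len(pattern)]
    observed = sum(matches_c_terminus(s, pattern) for s in eligible)
    expected = len(eligible) * p_match
    # expected == 0 implies observed == 0, so the ratio is undefined
    return observed / expected if expected > 0 else float("nan")


toy = {"a": "AAKDEL", "b": "AAKDEL", "c": "AAAAAA", "d": "AAAAAA"}
score = fold_enrichment([{"D"}, {"E"}, {"L"}], toy)  # the consensus DEL>
```

In this toy proteome, D, E and L each make up 2 of 24 residues, so about 4 × (1/12)³ C-terminal DEL> matches are expected while 2 are observed, giving a fold-enrichment of about 864.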
Summary statistics of the C-terminome database.
| C-terminome statistics | Number |
|---|---|
| Protein C-termini (RefSeq) | 16,059 |
| Protein C-termini, alternative splice variants (RefSeq) | 19,522 |
| Total C-termini | 35,581 |
| Experimentally verified motif instances | 3,593 |
| Predictions: inferred from rodents | 867 |
| Predictions: by consensus sequence | 27,546 |
| Predictions: de novo | 9,283,432 |
| Total predicted sequences | 9,311,845 |
| Functions: binding | 650 |
| Functions: modification | 2,937 |
| Functions: trafficking | 44 |
| Total functions | 3,631 |
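The totals in the summary table are sums of the rows above them, which can be checked directly. The values are copied from the table; the dictionary keys are descriptive labels chosen for this sketch, and the third prediction row, whose label is incomplete in the extracted table, is assumed to hold the de novo predictions.

```python
# Row values copied from the summary statistics table.
c_termini = {"RefSeq": 16_059, "RefSeq alternative splice variants": 19_522}
predictions = {
    "inferred from rodents": 867,
    "by consensus sequence": 27_546,
    "de novo (assumed label)": 9_283_432,
}
functions = {"binding": 650, "modification": 2_937, "trafficking": 44}

# Each total row should equal the sum of its component rows.
totals = {
    "Total C-termini": sum(c_termini.values()),
    "Total predicted sequences": sum(predictions.values()),
    "Total functions": sum(functions.values()),
}
for name, value in totals.items():
    print(f"{name}: {value:,}")
```

Running it prints 35,581, 9,311,845 and 3,631, matching the table's total rows.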
Verified functional C-terminal consensus sequences.
| Consensus Sequence¹ | Molecular Function | Description | PubMed Identifier | # Predicted Minimotif Instances³ | # Total Instances⁴ |
|---|---|---|---|---|---|
| | Bind | PDZ domain class III binding | 11741967 | 721 | 723 |
| x[AVILMFYW]x[AVILMFYW]>² | Bind | PDZ domain class II binding | 11741967 | 696 | 702 |
| [KRHQSA][DENQ]EL> | Bind, Traffic | KDEL receptor binding motif | 3545499 | 80 | 81 |
| x[ST]x[AVILMFYW]>² | Bind, Traffic | Peroxisomal targeting | 1567655 | 108 | 111 |
| [ST]x[LV]> | Bind | PDZ domain class I binding | 11741967 | 1,432 | 1,441 |
| [STAGCN][KRH][LIVMAFY]> | Bind, Traffic | Peroxisomal targeting | 2901422 | 805 | 839 |
| [WFY]RP[WFY]x(0,6)> | Bind, Traffic | Endoplasmic reticulum (ER) export | 8649374, 12972562 | 114 | 116 |
| C[AVLIFYWM][AVLIFYWM][ACDEFGHIKNPQRSTVWY]> | PTM | Farnesylation | 8702508, 2187294 | 104 | 109 |
| C[AVLIFYWM][AVLIFYWM][LM]> | PTM | Geranyl-geranylation | 8702508, 2187294 | 70 | 71 |
| Cxxx> | PTM | Farnesylation | 1903399 | 178 | 180 |
| C[GAVLI][GAVLI]x> | PTM | Prenylation | 8702508 | 304 | 306 |
| CxxM> | PTM | Mevalonation | 2686979 | 47 | 48 |
| DEWDx> | Bind | Aldolase binding | 16278221 | 0 | 1 |
| DxE> | Bind, Traffic | COPII binding | 11726510 | 131 | 132 |
| FFxxKKxx> | Bind, Traffic | Arf1 binding motif | 15125774 | 2 | 3 |
| FxxxFxxxF> | Bind, Traffic | ER export | 11331877 | 2 | 3 |
| Kx(0,1)Kx(1,3)> | Bind, Traffic | ER retention | 2120038 | 1,544 | 1,548 |
| S[ST]L> | Bind | PDZ domain class I binding | 11741967 | 84 | 85 |
| SxS> | Bind | Phosphorylation of Smad | 9346966 | 404 | 405 |
| VxPx> | Bind, Traffic | Rod outer segment trafficking | 15728366 | 101 | 102 |
| [VL]xxSL> | Bind, Traffic | Cell surface expression of Kv1 family K+ channels | 11343973 | 10 | 11 |
| Yxx[AVILMFYW]> | Bind, Traffic | Lysosomal targeting, Dendritic targeting | 9175836, 15689548 | 94 | 98 |
| VMI> | Traffic | ERGIC compartment export | 14517323 | 0 | 1 |
| LxxLxPDExD> | Traffic | Glut4 targeting | 24939910 | 0 | 1 |
| FF> | Bind, Traffic | ER export | 9395526 | 78 | 79 |
| HDEL> | Bind, Traffic | Internalization | 2178921 | 12 | 14 |
| KDEL> | Bind, Traffic | Nuclear export, To cell surface & dendrites | 3545499 | 11 | 14 |
| KKx> | Bind, Traffic | To Endoplasmic Reticulum Import | 2120038 | 295 | 296 |
| Total | | | | 7,427 | 7,520 |
¹ "x" indicates any of the twenty amino acids and ">" designates the C-terminal end of a protein [1,46].
² Although a more specific consensus specificity profile for PDZ domain recognition exists, a simplified classification was used [28,43,47,48].
³ Predicted minimotif instances are matches to consensus sequences that have not yet been experimentally tested.
⁴ Total instances include both predicted and experimentally verified minimotif instances.
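The consensus notation defined in these footnotes maps naturally onto regular expressions. The sketch below translates a consensus sequence from the table into a Python `re` pattern anchored at the end of the sequence: `x` becomes any residue, `[...]` a residue class, `x(m,n)` between m and n arbitrary residues, and `>` the C-terminus. Reading `/` inside `[S/T]` or `(V/L)` as "or" is an assumption; the function name `consensus_to_regex` is hypothetical, and this is not the C-terminome web system's own matcher.

```python
import re


def consensus_to_regex(consensus):
    """Compile a C-terminal consensus sequence (table notation) into a regex.

    x      -> any single residue
    [ABC]  -> one residue from the class ('/' separators are dropped)
    (V/L)  -> treated like [VL] (assumption: '/' means 'or')
    (m,n)  -> repeat the preceding element m to n times
    >      -> the C-terminal end of the protein
    """
    parts = []
    i = 0
    while i < len(consensus):
        ch = consensus[i]
        if ch == "[":
            j = consensus.index("]", i)
            parts.append("[" + consensus[i + 1:j].replace("/", "") + "]")
            i = j + 1
        elif ch == "(":
            j = consensus.index(")", i)
            inner = consensus[i + 1:j]
            if "/" in inner:  # alternative residues, e.g. (V/L)
                parts.append("[" + inner.replace("/", "") + "]")
            else:  # repeat count, e.g. x(0,6)
                low, high = (int(v) for v in inner.split(","))
                parts.append("{%d,%d}" % (low, high))
            i = j + 1
        elif ch == "x":
            parts.append(".")
            i += 1
        elif ch == ">":
            parts.append(r"\Z")  # match only at the very end of the sequence
            i += 1
        elif ch.isupper():
            parts.append(ch)  # a fixed amino acid (one-letter code)
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} in {consensus!r}")
    return re.compile("".join(parts))


kdel = consensus_to_regex("KDEL>")
print(bool(kdel.search("MAAKDEL")), bool(kdel.search("KDELAA")))  # prints: True False
```

Because `>` becomes `\Z`, `search` succeeds only when the motif ends exactly at the last residue, which is the C-terminal anchoring the first footnote describes.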