Nikolaus Fortelny, Sharon Yang, Paul Pavlidis, Philipp F Lange, Christopher M Overall.
Abstract
The knowledgebase TopFIND is an analysis platform focussed on protein termini, their origin, their modification and hence their role in protein structure and function. Here, we present a major update to TopFIND, version 3, which includes a 70% increase in the underlying data and now covers 90,696 proteins, 165,044 N-termini, 130,182 C-termini, 14,382 cleavage sites and 33,209 substrate cleavages in H. sapiens, M. musculus, A. thaliana, S. cerevisiae and E. coli. New features include the mapping of protein termini and cleavage entries across protein isoforms and, significantly, the mapping of protein termini originating from alternative transcription and alternative translation start sites. Furthermore, two tools for complex data analysis based on the TopFIND resource are now available online. TopFINDer, the TopFIND ExploRer, characterizes and annotates proteomics-derived sets of N- or C-termini for their origin, sequence context and implications for protein structure and function; it also links neo-termini to their associated proteases. PathFINDer identifies indirect connections between a protease and a list of substrates or termini, thus supporting the evaluation of complex proteolytic processes in vivo. To demonstrate the utility of the tools, a recent N-terminomics data set of inflamed murine skin was re-analyzed. The re-analysis recapitulated the major findings, which had originally been obtained manually, and so validates the utility of these new resources. The point of entry for the resource is http://clipserve.clip.ubc.ca/topfind, from where the graphical interface, all application programming interfaces (APIs) and the analysis tools are freely accessible.
Year: 2014 PMID: 25332401 PMCID: PMC4383881 DOI: 10.1093/nar/gku1012
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Biological processes that give rise to differences in protein termini, and the databases containing the corresponding information.
Counts of non-canonical termini supported by alternative splicing (from Ensembl) or by alternative translation (from TISdb)
| Terminus type | Biological process | Human | Mouse | Sum |
|---|---|---|---|---|
| N-termini | Alternative splicing | 3141 | 1390 | 4531 |
| N-termini | Alternative translation | 439 | 1437 | 1876 |
| C-termini | Alternative splicing | 8229 | 3849 | 12 078 |
| C-termini | Alternative translation | 0 | 0 | 0 |
| Total | | 11 809 | 6676 | 18 485 |
Figure 2. Input and output of TopFINDer. (A) Input mask within the TopFIND web interface. (B) Venn diagram showing the overlap of terminus evidence retrieved from TopFIND for a list of proteins. Evidence is a UniProt-annotated terminus (Curated Start), a terminus of an isoform derived from alternative splicing (Alternatively Spliced) or from alternative translation (Alternatively Translated), a terminus generated by cleavage (Cleaved), or a terminus observed in a terminomics experiment not related to a protease (Experimentally Observed). (C) Matrix of substrates and proteases indicating cleavage of substrates. Fields are red where cleavage is known and black where no cleavage between the protease and the substrate at this position is known. The y-axis shows the protein identifier and the position of each terminus. (D) Bar plot showing the number of cleavages by each protease in the list. Bars of proteases whose cleavages are enriched in the list are red; all others are blue. (E) IceLogo of the sequences in the list.
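A sequence logo such as panel (E) needs every terminus expressed as an aligned window of equal length around the scissile bond. A minimal sketch of building such a window, assuming a neo-N-terminus is given as the 1-based position of its first residue, so that the cleavage occurred between that residue (P1') and the preceding one (P1). The function name `cleavage_window`, the default of four residues on each side (P4 to P4') and the `-` padding are hypothetical choices for illustration, not TopFINDer's documented settings.

```python
def cleavage_window(sequence: str, nterm_pos: int, flank: int = 4, pad: str = "-") -> str:
    """Hypothetical helper: return the P{flank}..P{flank}' window around the
    scissile bond that produced a neo-N-terminus starting at 1-based
    position `nterm_pos`.

    The scissile bond lies between residue nterm_pos - 1 (P1) and residue
    nterm_pos (P1'). Positions beyond either end of the protein are padded,
    so every window has length 2 * flank and windows can be aligned.
    """
    if not 1 <= nterm_pos <= len(sequence):
        raise ValueError("terminus position lies outside the protein sequence")
    p1_prime = nterm_pos - 1  # 0-based index of the P1' residue
    return "".join(
        sequence[i] if 0 <= i < len(sequence) else pad
        for i in range(p1_prime - flank, p1_prime + flank)
    )
```

For example, `cleavage_window("MKTAYIAKQR", 5)` returns `"MKTAYIAK"`: P4 to P1 are `MKTA` and P1' to P4' are `YIAK`. Padding rather than truncating near the protein ends keeps all windows aligned column by column, which is what a position-specific logo requires.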
Protease enrichment results from TopFINDer in the skin data set
| Protease name | Protease accession | List count (total = 129) | DB count (total = 1265) | Fold enrichment | Fold coverage | Fisher exact test (P-value) | Adjusted Fisher exact test (P-value) |
|---|---|---|---|---|---|---|---|
| MMP2 | P33434 | 68 | 537 | 1.24 | 0.13 | 1.62E-02 | 5.27E-02 |
| CATE | P70269 | 51 | 544 | 0.92 | 0.09 | 8.03E-01 | 8.03E-01 |
| THRB | P19221 | 1 | 1 | 9.81 | 1.00 | 1.77E-01 | 2.87E-01 |
| GRAB | P04187 | 16 | 58 | 2.71 | 0.28 | 7.28E-04 | 9.47E-03 |
| CASP3 | P70677 | 7 | 28 | 2.45 | 0.25 | 3.67E-02 | 7.96E-02 |
| CASP7 | P97864 | 7 | 28 | 2.45 | 0.25 | 3.67E-02 | 7.96E-02 |
| CATD | P18242 | 29 | 309 | 0.92 | 0.09 | 7.22E-01 | 7.82E-01 |
| MPPB | Q9CXT8 | 4 | 4 | 9.81 | 1.00 | 3.65E-03 | 2.37E-02 |
| CAN2 | O08529 | 2 | 8 | 2.45 | 0.25 | 2.35E-01 | 3.29E-01 |
| MMP9 | P41245 | 5 | 11 | 4.46 | 0.45 | 1.19E-02 | 5.15E-02 |
| CASP1 | P29452 | 2 | 16 | 1.23 | 0.13 | 5.07E-01 | 6.00E-01 |
| MMP8 | O70138 | 1 | 2 | 4.90 | 0.50 | 2.53E-01 | 3.29E-01 |
| MMP13 | P33435 | 1 | 1 | 9.81 | 1.00 | 1.77E-01 | 2.87E-01 |
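The table does not state how its columns are computed. The sketch below recomputes them under assumptions that reproduce the tabulated values, but these are inferences from the numbers, not TopFINDer's documented implementation: fold enrichment is the protease's share of the 129 list cleavages divided by its share of the 1265 database cleavages; fold coverage is list count divided by DB count; the Fisher exact test is one-sided (testing for enrichment) on the 2 × 2 table [[list count, 129 − list count], [DB count, 1265 − DB count]]; and the adjusted column is consistent with a Benjamini–Hochberg correction across the 13 proteases. The hypergeometric tail is evaluated with exact integer arithmetic from the standard library.

```python
from math import comb

LIST_TOTAL = 129   # cleavages matched in the submitted list (table header)
DB_TOTAL = 1265    # cleavages in the TopFIND background (table header)

def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact test P(X >= a) for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, X (the top-left cell) is hypergeometric: a + b
    draws from a + b + c + d items, of which a + c are 'successes'.
    """
    n_total = a + b + c + d
    successes = a + c
    draws = a + b
    upper = min(successes, draws)
    tail = sum(comb(successes, x) * comb(n_total - successes, draws - x)
               for x in range(a, upper + 1))
    return tail / comb(n_total, draws)  # exact integer ratio, rounded once

def protease_stats(list_count: int, db_count: int) -> dict:
    """Assumed definitions of the fold and p-value columns (see lead-in)."""
    fold_enrichment = (list_count / LIST_TOTAL) / (db_count / DB_TOTAL)
    fold_coverage = list_count / db_count
    p = fisher_greater(list_count, LIST_TOTAL - list_count,
                       db_count, DB_TOTAL - db_count)
    return {"fold_enrichment": fold_enrichment,
            "fold_coverage": fold_coverage, "p": p}

def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# (list count, DB count) pairs copied from the table above.
COUNTS = {
    "MMP2": (68, 537), "CATE": (51, 544), "THRB": (1, 1), "GRAB": (16, 58),
    "CASP3": (7, 28), "CASP7": (7, 28), "CATD": (29, 309), "MPPB": (4, 4),
    "CAN2": (2, 8), "MMP9": (5, 11), "CASP1": (2, 16), "MMP8": (1, 2),
    "MMP13": (1, 1),
}

if __name__ == "__main__":
    stats = {name: protease_stats(*counts) for name, counts in COUNTS.items()}
    names = list(stats)
    adjusted = benjamini_hochberg([stats[n]["p"] for n in names])
    for name, adj in zip(names, adjusted):
        s = stats[name]
        print(f"{name:6s} {s['fold_enrichment']:5.2f} {s['fold_coverage']:5.2f} "
              f"{s['p']:.2e} {adj:.2e}")
```

Under these assumptions fold enrichment is simply fold coverage scaled by 1265/129 (about 9.81), which is why every protease with full coverage (THRB, MPPB, MMP13) shows the same value of 9.81. Where SciPy is available, `scipy.stats.fisher_exact(table, alternative="greater")` computes the same one-sided p-value.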
Figure 3. Fragment of the Graphviz figure of protease web connections identified by PathFINDer. Nodes are proteins; the query protease is marked in color and the proteins from the submitted list are gray. Edges are cleavages (arrows, with numbers giving the position of the cleavage) or inhibitions (T-shaped arrows, labeled ‘inh’). Edges from TopFIND are solid and edges inferred from the list are dotted. Nodes from the complement system are marked in red.
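PathFINDer searches the protease web for indirect routes from a query protease to the submitted proteins. A minimal sketch of that idea, assuming the protease web is a directed graph whose edges are labeled "cleaves" or "inhibits" and using breadth-first search to find one shortest path per target up to a maximum length. The protein names and edges below are hypothetical placeholders, not TopFIND data, and PathFINDer's actual search strategy and path-length limit are not described here.

```python
from collections import deque

# Hypothetical toy protease web: (source, target, interaction type).
# Real edges would come from TopFIND cleavage and inhibition entries.
EDGES = [
    ("ProteaseA", "ProteaseB", "cleaves"),
    ("ProteaseB", "SubstrateX", "cleaves"),
    ("ProteaseA", "InhibitorI", "cleaves"),
    ("InhibitorI", "ProteaseC", "inhibits"),
    ("ProteaseC", "SubstrateY", "cleaves"),
]

def build_graph(edges):
    """Adjacency list: node -> list of (neighbor, edge type)."""
    graph = {}
    for src, dst, kind in edges:
        graph.setdefault(src, []).append((dst, kind))
    return graph

def shortest_paths(graph, start, targets, max_len=3):
    """Breadth-first search from `start`. For each reachable target, return one
    shortest path as a list of (edge type, node) steps with at most max_len
    edges. Nodes are visited once, so only one shortest path is kept."""
    found = {}
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node in targets and path:
            found[node] = path
        if len(path) >= max_len:
            continue  # do not extend paths beyond the length limit
        for nxt, kind in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(kind, nxt)]))
    return found

def format_path(start, path):
    """Render a path as 'A -cleaves-> B -inhibits-> C'."""
    return " ".join([start] + [f"-{kind}-> {node}" for kind, node in path])
```

Here `shortest_paths(build_graph(EDGES), "ProteaseA", {"SubstrateX", "SubstrateY"})` links ProteaseA to SubstrateY only indirectly, through the inhibitor and ProteaseC, mirroring the inhibition edges drawn in the figure. The sketch only enumerates connections; it does not interpret whether a path activates or suppresses the target.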