Michael Bernhofer, Tatyana Goldberg, Silvana Wolf, Mohamed Ahmed, Julian Zaugg, Mikael Boden, Burkhard Rost.
Abstract
NLSdb is a database collecting nuclear export signals (NES) and nuclear localization signals (NLS) along with experimentally annotated nuclear and non-nuclear proteins. NES and NLS are short sequence motifs related to protein transport out of and into the nucleus. The updated NLSdb now contains 2253 NLS and introduces 398 NES. The potential sets of novel NES and NLS have been generated by a simple 'in silico mutagenesis' protocol. We started with motifs annotated by experiments. In step 1, we increased specificity such that no known non-nuclear protein matched the refined motif. In step 2, we increased the sensitivity trying to match several different families with a motif. We then iterated over steps 1 and 2. The final set of 2253 NLS motifs matched 35% of 8421 experimentally verified nuclear proteins (up from 21% for the previous version) and none of 18 278 non-nuclear proteins. We updated the web interface providing multiple options to search protein sequences for NES and NLS motifs, and to evaluate your own signal sequences. NLSdb can be accessed via Rostlab services at: https://rostlab.org/services/nlsdb/.
Year: 2018 PMID: 29106588 PMCID: PMC5753228 DOI: 10.1093/nar/gkx1021
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
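The two-step refinement described in the abstract can be illustrated with a toy sketch. This is not the authors' protocol: the function names, the greedy wildcard strategy, and the toy sequences are all assumptions made for illustration. A seed motif is relaxed one position at a time, and a relaxed variant is kept only if it still matches no non-nuclear sequence (specificity) and matches more nuclear sequences than before (sensitivity).

```python
import re

def count_hits(tokens, sequences):
    """Count sequences that contain the motif, given as per-position regex tokens."""
    pattern = re.compile("".join(tokens))
    return sum(1 for seq in sequences if pattern.search(seq))

def refine(seed, nuclear, non_nuclear, max_rounds=10):
    """Greedy toy refinement (hypothetical, not the published procedure).

    Relaxes one inner position at a time to the wildcard '.', keeping a
    variant only if it hits no non-nuclear sequence and strictly more
    nuclear sequences than the current motif.
    """
    tokens = list(seed)
    if count_hits(tokens, non_nuclear) > 0:
        return None  # seed already matches a non-nuclear sequence
    best = count_hits(tokens, nuclear)
    for _ in range(max_rounds):
        improved = False
        for i in range(1, len(tokens) - 1):  # keep both end residues fixed
            if tokens[i] == ".":
                continue
            candidate = tokens[:i] + ["."] + tokens[i + 1:]
            if count_hits(candidate, non_nuclear) > 0:
                continue  # specificity check failed
            hits = count_hits(candidate, nuclear)
            if hits > best:  # sensitivity improved
                tokens, best, improved = candidate, hits, True
        if not improved:
            break
    return "".join(tokens), best

# Invented toy sequences, for illustration only.
toy_nuclear = ["MAKRPRKLL", "GGKRARKTT", "QQKRSRKAA", "MMMMMM"]
toy_non_nuclear = ["AAKLLRKGG", "KRKK"]
print(refine("KRPRK", toy_nuclear, toy_non_nuclear))
```

In this toy run, relaxing the central residue lets the motif cover three nuclear sequences, while relaxing the second position is rejected because the variant would match one of the non-nuclear sequences.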
Proteins with NES and NLS in nuclear protein dataset
| Organism | Number of nuclear proteins | Proteins with NLS | Proteins with NES |
|---|---|---|---|
| | 2163 | 820 (37.9%) | 185 (8.6%) |
| | 1263 | 282 (22.3%) | 52 (4.1%) |
| | 1241 | 430 (34.6%) | 37 (3.0%) |
| | 1011 | 420 (41.5%) | 89 (8.8%) |
| | 1010 | 294 (29.1%) | 33 (3.3%) |
| | 334 | 149 (44.6%) | 20 (6.0%) |
| | 273 | 113 (41.4%) | 19 (7.0%) |
| | 237 | 85 (35.9%) | 22 (9.3%) |
| | 140 | 46 (32.9%) | 5 (3.6%) |
| Sum nine organisms | 7672 | 2639 (34.4%) | 462 (6.0%) |
Column 1 (organism): Latin (common) names of the nine organisms that contributed the most nuclear proteins to NLSdb, sorted by number of nuclear proteins (together, the 7672 proteins from these nine organisms account for 91% of all 8421 currently known nuclear proteins). Column 2 (nuclear proteins): number of proteins annotated experimentally as nuclear and retained in NLSdb after applying a variety of filters (Methods). Columns 3 and 4 (NLS, NES): numbers and fractions (in brackets) of the nuclear proteins that contain at least one NLS or at least one NES from NLSdb, respectively.
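The sum row and the 91% coverage quoted in the legend can be rechecked from the per-organism counts. The following is a minimal Python check with the rows copied from the table above; the variable and function names are illustrative only.

```python
# (nuclear proteins, proteins with NLS, proteins with NES) per organism,
# in table order.
rows = [
    (2163, 820, 185),
    (1263, 282, 52),
    (1241, 430, 37),
    (1011, 420, 89),
    (1010, 294, 33),
    (334, 149, 20),
    (273, 113, 19),
    (237, 85, 22),
    (140, 46, 5),
]

def pct(part, whole):
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Column-wise sums over the nine organisms.
total, with_nls, with_nes = (sum(col) for col in zip(*rows))

print(total, with_nls, with_nes)
print(pct(with_nls, total), pct(with_nes, total))
# Share of all 8421 known nuclear proteins covered by the nine organisms.
print(pct(total, 8421))
```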
Figure 1. Length distribution of NLS and NES sequences. The graphs compare the length distribution for the original NLS (A: gray line; total 2466 NLS) and the NLSdb set of NLS refined through in silico mutagenesis (A: dark line; total 1651 NLS), as well as the corresponding distributions for the original NES (B: gray line; total 788 NES) and the NLSdb refined set of NES (B: dark line; total 192 NES). Note that motifs with over 25 amino acids were observed, but are not shown in the graphs due to sparseness (total: 156 original NLS, 42 original NES).
Nuclear signals and proteins in entire proteomes
| Organism | Number of proteins in proteome | Proteins with NLS | Proteins with NES | Predicted nuclear (LocTree3) |
|---|---|---|---|---|
| | 21 042 | 2673 (12.7%) | 375 (1.8%) | 30% |
| | 5142 | 501 (9.7%) | 78 (1.5%) | 34% |
| | 27 502 | 2768 (10.1%) | 246 (0.9%) | 31% |
| | 22 262 | 2684 (12.1%) | 358 (1.6%) | 30% |
| | 6722 | 681 (10.1%) | 63 (0.9%) | 31% |
| | 13 757 | 1573 (11.4%) | 168 (1.2%) | 31% |
| | 20 057 | 1759 (8.8%) | 170 (0.8%) | 27% |
| | 21 412 | 2555 (11.9%) | 359 (1.7%) | 28% |
| | 44 321 | 4285 (9.7%) | 280 (0.6%) | 28% |
Column 1 (organism): Latin (common) names of the nine organisms that contributed the most nuclear proteins to NLSdb, sorted by number of nuclear proteins. Column 2 (proteome size): number of proteins found in the 'entire proteome' as we accessed it (Methods: Dataset of whole proteomes). Columns 3 and 4 (NLS, NES): numbers and fractions (in brackets, relative to the entire proteome) of the proteins that contain at least one NLS or at least one NES from NLSdb, respectively. Column 5: fraction of proteins predicted as nuclear by our generic machine learning-based method LocTree3 (22).
Figure 2. NES and NLS common to multiple organisms. The graph shows the cumulative percentage of NES and NLS found in at least one protein from the nine organisms used in Tables 1–2 (Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Mus musculus, Oryza sativa, Rattus norvegicus, Schizosaccharomyces pombe and Saccharomyces cerevisiae). One hundred percent corresponds to the 353 NES and 2180 NLS contained in NLSdb. For instance, 46% of the NES and 46% of the NLS matched in at least four and six organisms, respectively (arrows).
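A cumulative curve of this kind can be computed from a per-motif count of the organisms in which the motif matches at least one protein. The sketch below is a hypothetical illustration: the function name and the toy counts are assumptions and do not reproduce the NLSdb data.

```python
def cumulative_percent(organisms_per_motif, n_organisms=9):
    """Map k to the percentage of motifs found in at least k organisms.

    Motifs found in no organism are excluded, so that 100% corresponds to
    motifs matching at least one protein in at least one organism.
    """
    found = [n for n in organisms_per_motif if n >= 1]
    if not found:
        return {}
    return {
        k: 100 * sum(1 for n in found if n >= k) / len(found)
        for k in range(1, n_organisms + 1)
    }

# Invented counts: number of organisms in which each toy motif was found.
toy_counts = [1, 1, 2, 4, 4, 9, 0]
curve = cumulative_percent(toy_counts)
for k, share in curve.items():
    print(f"at least {k} organisms: {share:.1f}%")
```

Reading a value such as "46% of the NLS matched in at least six organisms" then corresponds to looking up the entry for k = 6 on the NLS curve.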