
ValidNESs: a database of validated leucine-rich nuclear export signals.

Szu-Chin Fu, Hsuan-Cheng Huang, Paul Horton, Hsueh-Fen Juan.

Abstract

ValidNESs (http://validness.ym.edu.tw/) is a new database for experimentally validated leucine-rich nuclear export signal (NES)-containing proteins. The therapeutic potential of the chromosomal region maintenance 1 (CRM1)-mediated nuclear export pathway and disease relevance of its cargo proteins has gained recognition in recent years. Unfortunately, only about one-third of known CRM1 cargo proteins are accessible in a single database since the last compilation in 2003. CRM1 cargo proteins are often recognized by a classical NES (leucine-rich NES), but this signal is notoriously difficult to predict from sequence alone. Fortunately, a recently developed prediction method, NESsential, is able to identify good candidates in some cases, enabling valuable hints to be gained by in silico prediction, but until now it has not been available through a web interface. We present ValidNESs, an integrated, up-to-date database holding 221 NES-containing proteins, combined with a web interface to prediction by NESsential.

Year:  2012        PMID: 23093589      PMCID: PMC3531083          DOI: 10.1093/nar/gks936

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

For many cellular and viral proteins, active transport is required for the journey from nucleus to cytoplasm through the nuclear pore complexes. This transport is mostly mediated by the karyopherin exportin 1/chromosomal region maintenance 1 (CRM1) recognizing the classical nuclear export signals (NESs) of cargo molecules. The classical NES is characterized by three to four conserved hydrophobic residues, usually leucine, and the spacing between them. Several consensus sequences have been proposed to describe the classical NES (1,2); however, as we previously demonstrated, they all suffer from poor predictive power in identifying potential NES-containing proteins (3). It should be noted that an increasing number of non-classical CRM1-mediated NESs, albeit still a minority, have been validated in recent years. Many recent studies focus on the therapeutic potential of the CRM1-mediated nuclear export pathway. This nuclear export pathway is suggested to be involved in the mechanism inducing the abnormal localization of many tumor suppressors, p53 for instance, in various cancer cells (4). Furthermore, CRM1 has been found to be overexpressed in cervical cancer and critical for cancer cell proliferation and survival (5). As for the cargo proteins, many cellular NES-containing proteins are involved in important processes such as signal transduction, cell-cycle regulation and tumor suppression. Moreover, many known cargo proteins are viral, often playing a role in viral genome trafficking: the HIV-1 Rev protein is related to the export of unspliced or partially spliced viral messenger RNA (mRNA) (6); NS2/NEP of influenza A virus plays a critical role in the export of newly synthesized viral ribonucleoproteins, a complex composed of individual negative-sense viral RNAs and various viral proteins (7); while in adenovirus type 5, several NES-containing proteins were found to be required for efficient export of adenoviral early mRNA (8). 
Due to their potential disease relevance, experimental identification of NES-containing proteins has been an active field of research. Surprisingly, this issue has been neglected by the computational biology community in recent years. NESbase (9), which lists 75 validated NES-containing proteins, has been a valuable resource for experimental and computational biologists, with >100 citations since its publication. Unfortunately, NESbase ceased updating after 2003 and now contains only about one-third of all validated NES-containing proteins. We therefore developed ValidNESs, in which we organize information on 221 NES-containing proteins compiled from the literature. Moreover, ValidNESs is easier to use and search, is better cross-linked to external databases and provides a state-of-the-art prediction method in one site.

DATABASE CONTENT

The first version of ValidNESs, made publicly available in June 2012, includes 262 functional NES sites from 221 NES-containing proteins (36 of which contain multiple NESs). In this version, we updated the collection by compiling another 76 NES-containing proteins (up to 2012) and integrating them with those listed in NESbase (9) and in the Supplementary Data of our previous NESsential paper (3), 75 and 70 proteins, respectively. Figure 1 shows a pie chart illustrating the number of proteins by species. In addition to sequence information, we collected from the Protein Data Bank (PDB) a total of 52 local structures containing the entire NES region; this structural information is available exclusively in ValidNESs. These local structures mainly (65%) consist of α-helix and other extended formations such as bends or loops. This result is basically consistent with the previous conclusion drawn from eight structures of NES-containing proteins (10). However, we found β-structure in 14 NES regions. Interestingly, Nilsen et al. (11) reported the first NES located on a β-strand in fibroblast growth factor-1 in 2007 and suggested that NESs with similar local structure would be found afterward. The updated data in ValidNESs support their speculation.
Figure 1.

Pie chart of species. Distribution of entries in ValidNESs. The number of NES-containing proteins validated in each species is indicated in parentheses.

To organize the data, we designed two different tables: one for NES-containing regions and another for NES-containing proteins. For users interested in functional NESs, sequence and secondary structural information (when applicable) can be found in the table of NES-containing regions. The table of NES-containing proteins is designed for users requiring more information at the protein level, such as subcellular localization and protein–protein interaction. Detailed field descriptions for each table are given in Supplementary Tables S1 and S2, respectively.

THE CLASSICAL NES

Some previous work has defined a consensus sequence for NESs as [LIVFM]-x(2,3)-[LIVFM]-x(2,3)-[LIVFM]-x-[LIVFM], where x is any amino acid (12). However, we found that 43% of NESs in ValidNESs deviate from this consensus sequence. We therefore defined a short consensus pattern [LIVFM]-x(2,3)-[LIVFM]-x-[LIVFM], hereafter denoted as the ‘NES motif’, containing the region bounded by the second and fourth hydrophobic positions of the former consensus (3), a region which has been shown to affect NES activity strongly (13,14). In ValidNESs, we use this generalized consensus pattern to divide experimentally determined NES sites into two categories: classical if the experimentally validated region contains or overlaps with a consensus match, otherwise non-classical. This definition of classical NES is justified by a dramatic improvement in sensitivity (from 57 to 86%). We tested the enrichment of this NES motif by binomial test, attaining P-values of 7.4e−64 (6-mer matches) and 1.5e−34 (7-mer matches), respectively. Finally, we generated sequence logos for the classical NESs aligned by consensus match (Figure 2).
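The classification rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used by ValidNESs: the function names, the half-open coordinate convention and the toy sequence are assumptions made for the example.

```python
import re

H = "[LIVFM]"  # hydrophobic positions of the NES motif


def motif_matches(seq):
    """Return sorted (start, end) half-open spans of all 6-mer and 7-mer
    NES motif matches, i.e. [LIVFM]-x(2,3)-[LIVFM]-x-[LIVFM]."""
    spans = []
    for gap in (2, 3):  # gap 2 gives 6-mer matches, gap 3 gives 7-mer matches
        # A lookahead is used so that overlapping matches are all reported.
        pattern = re.compile(f"(?=({H}.{{{gap}}}{H}.{H}))")
        for m in pattern.finditer(seq):
            spans.append((m.start(1), m.end(1)))
    return sorted(spans)


def classify(seq, region_start, region_end):
    """Label a validated region [region_start, region_end) as 'classical'
    if any motif match lies inside or overlaps it, else 'non-classical'."""
    for start, end in motif_matches(seq):
        if start < region_end and end > region_start:
            return "classical"
    return "non-classical"


# Toy sequence (made up for illustration, not a real protein).
toy = "GGGLGGLGLGGG"
print(motif_matches(toy))    # one 6-mer match spanning positions 3..8
print(classify(toy, 0, 5))   # region overlaps the match
print(classify(toy, 9, 12))  # region does not touch the match
```

Searching the whole protein sequence and then testing for overlap, rather than searching only inside the validated region, is what allows matches that cross the region boundary to count as classical.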
Figure 2.

Sequence logos for NES sites. Sequence logos generated by the WebLogo server for NES motif matches after removing redundant sequences (with sequence identity >25%) and aligning the three hydrophobic positions within the motif. In general, the preference for negatively charged residues is lower than previously observed in NESbase. (A) Sequence logo for 6-mer NES motif matches with upstream and downstream 10-mer flanks (227 sites). (B) Sequence logo for 7-mer NES motif matches with upstream and downstream 10-mer flanks (162 sites).


DATA ACCESS

In addition to being up-to-date, ValidNESs provides an easy-to-use search interface. Table 1 summarizes the major difference between NESbase and ValidNESs. ValidNESs provides three search functions to retrieve particular data (or display all by default). Once the user submits the query, ValidNESs generates a complete table in text format ready for download and displays an online simplified table providing links to external databases. An overview of the search and search result interfaces is shown in Figure 3.
Table 1. Comparison between NESbase and ValidNESs

                                    NESbase                          ValidNESs
Number of NES-containing proteins   75                               221 (a)
Website architecture                HTML flat file                   MySQL + PHP + Apache
Data access                         No special search functionality  Searchable
User submission                     Temporarily disabled             Supported

(a) Seventy-five NES-containing proteins are imported from NESbase.

Figure 3.

An overview of the search and search result interfaces in ValidNESs. ValidNESs stores metadata in two tables and provides three search functions to access these data. Once users submit their queries, the search result in text file format and FASTA format (for table of NES-containing proteins only) is generated for download. Meanwhile, ValidNESs also displays an online table for quick browsing.

ValidNESs provides a ‘search-by-pattern’ function with regular expression support to facilitate retrieving particular NESs of interest. For example, Henderson and Eleftheriou (15) designed a Rev(1.4)-based shuttling assay and assessed the relative export efficiency of different types of NESs. This search function allows users to search and retrieve NES sites resembling those with available information on relative export efficiency. In ValidNESs, NES sites are divided into two categories based on the NES motif as previously mentioned. Therefore, users can use the ‘search-by-category’ function to retrieve the classical NES sites in an extended definition: that is, sites with an NES motif match lying inside or across the boundary of the experimentally determined NES-containing region. For NES-containing proteins, ValidNESs provides a ‘search-by-keyword’ function based on their UniProtKB keywords such as apoptosis or tumor suppressor. In addition to the complete table in text format, protein sequences including NES locations are also downloadable in FASTA format. Step-by-step instructions for novice users are available on the homepage of ValidNESs.
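To make the pattern and keyword searches concrete, the following Python sketch applies them to a few in-memory records. It is a minimal illustration under stated assumptions: the record layout, field names, function names and toy entries are all hypothetical, and the real site runs on MySQL and PHP rather than on code like this. For brevity, site-level and protein-level fields are merged into one record, whereas ValidNESs keeps them in two tables.

```python
import re
from dataclasses import dataclass, field


@dataclass
class NesRecord:
    """Hypothetical record combining a validated NES with protein keywords."""
    protein: str
    nes_sequence: str
    keywords: list = field(default_factory=list)


def search_by_pattern(records, pattern):
    """Keep records whose NES sequence contains a match to the regex."""
    rx = re.compile(pattern)
    return [r for r in records if rx.search(r.nes_sequence)]


def search_by_keyword(records, keyword):
    """Keep records annotated with the keyword (case-insensitive)."""
    wanted = keyword.lower()
    return [r for r in records if any(k.lower() == wanted for k in r.keywords)]


# Toy entries, made up for illustration only.
RECORDS = [
    NesRecord("toyA", "LGGLGLG", ["Apoptosis"]),
    NesRecord("toyB", "MAAAVAI", ["Tumor suppressor"]),
    NesRecord("toyC", "GGGGGG", ["Apoptosis"]),
]

# Leucine-only 6-mer pattern.
print([r.protein for r in search_by_pattern(RECORDS, r"L..L.L")])
# 7-mer form of the NES motif.
print([r.protein for r in search_by_pattern(RECORDS, r"[LIVFM].{3}[LIVFM].[LIVFM]")])
print([r.protein for r in search_by_keyword(RECORDS, "apoptosis")])
```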

DATA CURATION

In most cases, the CRM1 dependence of NESs in ValidNESs is validated by treatment with leptomycin B (LMB), a potent inhibitor blocking the binding of CRM1 to NESs (16). However, 42 (16%) of the NESs in ValidNESs have not had their CRM1 dependence validated with LMB. For these NESs, other experimental techniques, such as the yeast two-hybrid system and in vitro binding experiments, were used to demonstrate the interaction between CRM1 and NES-containing proteins (17,18). Many of these NESs, 27 from NESbase for instance, were discovered around the early 2000s. In contrast, only 11 of these NESs were discovered in the last 5 years, as LMB has become widely used. For clarification, we add the LMB information in both the online and downloadable table of NES sites. We also cross-link to PDB in the same table if any structure containing the entire NES region is available. When multiple structures are available, we select the structure with the highest resolution and include the corresponding PDB ID in the table.
As mentioned above, 75 NES-containing proteins in ValidNESs were directly imported from NESbase. We updated the content in NESbase before integrating it into ValidNESs. This update includes one subsequently discovered NES for BRCA1 (19) and seven updated accession numbers in UniProtKB. In addition, we found nine protein sequences listed in NESbase differing from the current reference sequences in UniProtKB (eight with insertions and one with a point mutation). For these proteins, ValidNESs provides the sequences from UniProtKB and the modified NES positions according to the updated sequences.
At the protein level, we provide information on subcellular localization and protein–protein interaction based on the relevant cross-references in UniProtKB. We extracted the GO cellular component annotation for the subcellular localization and imported the protein–protein interactions from four external databases: DIP (20), IntAct (21), MINT (22) and STRING (23).
We also provide cross-references to NLSdb, a database of nuclear localization signals (NLSs) and nuclear proteins targeted to the nucleus by NLS motifs (24).
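The structure-selection rule in this section (when several PDB structures cover the entire NES region, keep the one with the highest resolution) can be sketched as follows. This is an illustrative sketch only: the `Structure` type, its fields, the placeholder IDs and the treatment of entries without a resolution value are assumptions, not details given by the authors.

```python
from typing import NamedTuple, Optional


class Structure(NamedTuple):
    """Hypothetical summary of one PDB chain mapped onto a protein."""
    pdb_id: str
    resolution: Optional[float]  # in angstroms; None if no value is reported
    start: int                   # first covered residue (1-based, inclusive)
    end: int                     # last covered residue (1-based, inclusive)


def best_structure(structures, nes_start, nes_end):
    """Among structures covering the entire NES region, return the one with
    the highest resolution, i.e. the smallest value in angstroms.
    Entries without a resolution are ranked last (an assumption)."""
    covering = [s for s in structures
                if s.start <= nes_start and s.end >= nes_end]
    if not covering:
        return None
    return min(covering,
               key=lambda s: float("inf") if s.resolution is None else s.resolution)


# Placeholder entries; the IDs are not real PDB identifiers.
candidates = [
    Structure("toy1", 2.8, 1, 120),
    Structure("toy2", 1.9, 30, 90),
    Structure("toy3", 1.5, 60, 90),   # does not cover the whole NES region
    Structure("toy4", None, 1, 200),
]
print(best_structure(candidates, 40, 52).pdb_id)
```

Note that "highest resolution" corresponds to the numerically smallest resolution value, which is why `min` is used.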

PREDICTION OF NES

ValidNESs provides online prediction of NESs based on NESsential, our recently developed NES prediction method (3). Supplementary Figure S1 shows the submission interface where users can input a single protein sequence or a UniProt protein name (UniProt ID) such as IPKA_HUMAN. After successful submission and processing, users can view the prediction results, at both protein and site level, together with a brief explanation of how to interpret them. ValidNESs currently accepts only a single sequence per submission. For users with large computational needs such as large-scale screening, the standalone version of NESsential is recommended (http://seq.cbrc.jp/NESsential/).

DATA SUBMISSION

We greatly appreciate the efforts of researchers to discover and validate new CRM1-mediated NESs and encourage them to submit their new data to ValidNESs in the future. From the homepage of ValidNESs, we provide a preformatted form, including an example, for submission by email. We intend to maintain and frequently update ValidNESs for many years.

DISCUSSION

The large dataset consolidated in ValidNESs facilitates the investigation of various questions related to NES sequence and function. One interesting question is: why do some proteins have more than one NES? In 2007, Engelsma et al. (25) found a monomer-specific NES of human survivin, a key regulator of cell division containing two functional NESs, indicating that NESs in the same protein may play different functional roles. We therefore hypothesized that distinct NESs in the same protein may be under different selective pressure to be conserved, e.g. some of them could be species specific. To test this hypothesis, we examined 28 multiple NES-containing proteins whose homologs are available in HomoloGene (http://www.ncbi.nlm.nih.gov/homologene). We defined an abrogation of an NES as a mutation which causes the NES to no longer match the NES motif covering the three essential hydrophobic residues. As a result, we found 13 out of 28 homologous groups containing at least one NES abrogation (see Supplementary Data), demonstrating that the presence of multiple functional NESs is not necessarily conserved in evolution.
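The abrogation test defined above can be approximated with a short Python sketch. It is a simplified illustration under stated assumptions: homolog segments are taken to be already aligned to the reference NES and ungapped, any motif match within the segment counts as conservation, and all names and toy sequences are hypothetical. The analysis in the Supplementary Data may differ in these details.

```python
import re

# The 'NES motif' from the text: [LIVFM]-x(2,3)-[LIVFM]-x-[LIVFM].
NES_MOTIF = re.compile(r"[LIVFM].{2,3}[LIVFM].[LIVFM]")


def is_abrogated(reference_nes, homolog_segment):
    """True if the reference NES matches the motif but the aligned
    homolog segment no longer does."""
    if not NES_MOTIF.search(reference_nes):
        return False  # nothing to abrogate
    return NES_MOTIF.search(homolog_segment) is None


def count_groups_with_abrogation(groups):
    """groups maps a homologous-group name to a list of
    (reference_nes, [homolog_segment, ...]) pairs, one pair per NES.
    Returns how many groups contain at least one abrogated NES."""
    count = 0
    for nes_list in groups.values():
        if any(is_abrogated(ref, seg)
               for ref, segments in nes_list
               for seg in segments):
            count += 1
    return count


# Toy groups, made up for illustration only.
toy_groups = {
    "group1": [("LGGLGL", ["LGGAGL", "LGGLGL"])],  # one homolog lost the motif
    "group2": [("LGGLGL", ["IGGLGV"])],            # conservative substitutions
}
print(count_groups_with_abrogation(toy_groups))
```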

CONCLUSION

We present ValidNESs, an integrated, up-to-date database and web interface to the NES prediction method NESsential. To illustrate the kind of analysis facilitated by the data organized in ValidNESs, we summarized the secondary structure propensity of NESs and discussed the existence of species-specific NESs. In conclusion, ValidNESs provides both updated data and an upgraded interface for convenient access to experimentally validated NESs and NES-containing proteins.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online: Supplementary Tables 1 and 2, Supplementary Figure 1 and Supplementary Case Study.

FUNDING

National Science Council, Taiwan [NSC 99-2621-B-002-005-MY3 and 99-2621-B-010-001-MY3]; National Taiwan University Cutting-Edge Steering Research Project [10R70602C3 and 101R7602C3]; Top University Project [10R40044 and 101R4000]. Funding for open access charge: National Science Council, Taiwan [NSC 99-2621-B-002-005-MY3 and 99-2621-B-010-001-MY3]; National Taiwan University Cutting-Edge Steering Research Project [10R70602C3 and 101R7602C3]. Conflict of interest statement. None declared.
  25 in total

1.  NESbase version 1.0: a database of nuclear export signals.

Authors:  Tanja la Cour; Ramneek Gupta; Kristoffer Rapacki; Karen Skriver; Flemming M Poulsen; Søren Brunak
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

2.  Homodimerization antagonizes nuclear export of survivin.

Authors:  Dieuwke Engelsma; Jose A Rodriguez; Alexander Fish; Giuseppe Giaccone; Maarten Fornerod
Journal:  Traffic       Date:  2007-08-20       Impact factor: 6.215

3.  Nuclear export signal consensus sequences defined using a localization-based yeast selection system.

Authors:  Shunichi Kosugi; Masako Hasebe; Masaru Tomita; Hiroshi Yanagawa
Journal:  Traffic       Date:  2008-09-25       Impact factor: 6.215

Review 4.  CRM1-mediated nuclear export of proteins and drug resistance in cancer.

Authors:  Joel G Turner; Daniel M Sullivan
Journal:  Curr Med Chem       Date:  2008       Impact factor: 4.530

5.  A nuclear export sequence located on a beta-strand in fibroblast growth factor-1.

Authors:  Trine Nilsen; Ken R Rosendal; Vigdis Sørensen; Jørgen Wesche; Sjur Olsnes; Antoni Wiedłocha
Journal:  J Biol Chem       Date:  2007-07-06       Impact factor: 5.157

6.  The Karyopherin proteins, Crm1 and Karyopherin beta1, are overexpressed in cervical cancer and are critical for cancer cell survival and proliferation.

Authors:  Pauline J van der Watt; Christopher P Maske; Denver T Hendricks; M Iqbal Parker; Lynette Denny; Dhirendra Govender; Michael J Birrer; Virna D Leaner
Journal:  Int J Cancer       Date:  2009-04-15       Impact factor: 7.396

7.  The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored.

Authors:  Damian Szklarczyk; Andrea Franceschini; Michael Kuhn; Milan Simonovic; Alexander Roth; Pablo Minguez; Tobias Doerks; Manuel Stark; Jean Muller; Peer Bork; Lars J Jensen; Christian von Mering
Journal:  Nucleic Acids Res       Date:  2010-11-02       Impact factor: 16.971

8.  Prediction of leucine-rich nuclear export signal containing proteins with NESsential.

Authors:  Szu-Chin Fu; Kenichiro Imai; Paul Horton
Journal:  Nucleic Acids Res       Date:  2011-06-24       Impact factor: 16.971

9.  The IntAct molecular interaction database in 2012.

Authors:  Samuel Kerrien; Bruno Aranda; Lionel Breuza; Alan Bridge; Fiona Broackes-Carter; Carol Chen; Margaret Duesbury; Marine Dumousseau; Marc Feuermann; Ursula Hinz; Christine Jandrasits; Rafael C Jimenez; Jyoti Khadake; Usha Mahadevan; Patrick Masson; Ivo Pedruzzi; Eric Pfeiffenberger; Pablo Porras; Arathi Raghunath; Bernd Roechert; Sandra Orchard; Henning Hermjakob
Journal:  Nucleic Acids Res       Date:  2011-11-24       Impact factor: 16.971

10.  MINT, the molecular interaction database: 2012 update.

Authors:  Luana Licata; Leonardo Briganti; Daniele Peluso; Livia Perfetto; Marta Iannuccelli; Eugenia Galeota; Francesca Sacco; Anita Palma; Aurelio Pio Nardozza; Elena Santonico; Luisa Castagnoli; Gianni Cesareni
Journal:  Nucleic Acids Res       Date:  2011-11-16       Impact factor: 16.971
