NMPdb: Database of Nuclear Matrix Proteins.

Abstract

The nuclear matrix (NM) is a structure resulting from the aggregation of proteins and RNA in the nucleus of eukaryotic cells; it is the 'sticky bit' that remains after aggressive DNAse digestion and salt extraction protocols. Owing to the important role of the NM in DNA replication, DNA transcription and RNA splicing, the expression pattern of NM proteins has become an important early indicator for numerous cancers/tumors. Recent descriptions of the NM structure distinguish between a network-like 'internal nuclear matrix' (INM) and a 'nuclear shell' that connects the INM to the inner and outer nuclear membranes. A cautious NM preparation protocol reveals a coat of proteins on top of the INM; these proteins are usually referred to as the 'nuclear matrix-associated proteins'. Here, we describe a new database (NMPdb at http://www.rostlab.org/db/NMPdb/) that currently contains details of 398 NM proteins. We collected these data through a semi-automated analysis of over 3000 scientific articles in PubMed. We could match these 398 proteins to 302 protein sequences in UniProt or GenBank. Our NMPdb repository annotates these links along with the following annotations: organism, cell type, PubMed identifier, sequence-based predictions of structural and functional features and for some entries the explicit sequence segment that is responsible for localization (nuclear matrix targeting signal).

Year: 2005 PMID: 15608168 PMCID: PMC540086 DOI: 10.1093/nar/gki132

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

In the early 1960s, researchers began to describe an important nuclear structure in eukaryotic cells that differed from the already well-known DNA/histone-based chromatin (1). This structure, referred to as the ‘nuclear matrix’ (NM), can be separated from the rest of the nucleus by applying DNAse I digestion followed by salt extraction (2). Many functional aspects of the NM have been described; these include DNA replication (3), DNA transcription (4) and DNA repair (5,6). The existence of the NM as an ‘independent’ subnuclear structure is not a proven reality but a widely accepted hypothesis that has profoundly influenced the literature: PubMed alone retrieves over 3000 articles when queried with the terms ‘nuclear matrix’ or ‘nuclear scaffold’. The NM might still be an artifact of the preparation methods rather than a real in vivo structure (7–9). However, four main observations argue in favor of the existence of this controversial part of the nucleus: its observation in non-eluted nuclei through electron spectroscopic imaging (10); the existence of protocols to isolate the NM at physiological salt concentrations through electroelution of chromatin (11); the fact that chromatin loops (S/MAR-DNA sequences) bind to a non-chromatin network; and the description of functional units that stay in their original place even after chromatin and soluble proteins have been removed from the nucleus (12). Two main structural elements form the NM (13): the ‘internal nuclear matrix’ (INM) and the ‘nuclear shell’ (or ‘nuclear lamina’). The INM is an aggregate of proteins, mainly the intermediate filament lamins, NuMa (13) and hnRNP proteins (13,14). The nuclear shell links the INM to the nuclear membranes and/or nuclear envelope. Several non-INM proteins can be separated along with the INM through more careful preparation protocols (15,16). These proteins are usually referred to as ‘associated with the nuclear matrix’.
The protein composition of nuclear matrices in different organisms and cell types was discovered mainly by 2D gel electrophoresis, a method that separates proteins based on their isoelectric points (first dimension) and molecular weight (second dimension). Nuclear matrices, once separated from the chromatin and the soluble compartments of the nucleus, contain very different proteins in tumor cells than in non-tumor cells (17,18). In cancer research, these differences provide early indications for different types of tumors. Collecting and analyzing data about NM proteins may help to understand the relationship between those proteins and cancer and to discover NM-associated proteins that have not yet been linked to the NM. The vast majority of proteins that have actually been associated experimentally with the NM are not annotated as such in public databases. Thus, we have built and are maintaining NMPdb, a database of proteins that are associated with the nuclear matrix.

DATABASE

Nuclear matrix proteins collected from the literature

First, we downloaded over 3000 abstracts from PubMed that resulted from queries with the terms ‘nuclear matrix/matrices’ and ‘nuclear scaffold’. Then we wrote a simple Perl script that color-highlighted three types of phrases in the text (through HTML tagging): (i) ‘nuclear matrix’ terms, (ii) UniProt protein names and (iii) verbs describing binding processes such as ‘to bind’, ‘to associate’ or ‘to interact’ (Figure 1). Each abstract was followed by HTML elements that enabled the quick interactive subclassification of each protein into one of the following classes: (i) part of the internal nuclear matrix (INM), (ii) ‘tightly’ associated with the INM (ASC), (iii) affinity toward the INM changes depending on protein modification, cell type and/or current stage of the cell cycle (MIX) and (iv) part of the nuclear shell/nuclear lamina (NUS). At this point, we also removed abstracts that contained the search words but were unlikely to add information to our database. Finally, we collected the names of the organisms and the cell types in which the interaction with the NM was observed.
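The highlighting step described above can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the script used to build NMPdb. The term lists, the CSS class names (`nm`, `protein`, `verb`) and the function names are assumptions made for illustration; the source states only that three types of phrases were tagged in HTML.

```python
import html
import re

# Hypothetical term lists; the lists used by the original script are not given.
NM_TERMS = ["nuclear matrix", "nuclear matrices", "nuclear scaffold"]
BINDING_VERBS = ["bind", "binds", "bound", "associate", "associates",
                 "associated", "interact", "interacts"]


def build_pattern(protein_names):
    """Compile one regex with a named group per phrase type."""
    groups = {"nm": NM_TERMS, "protein": protein_names, "verb": BINDING_VERBS}
    parts = []
    for label, terms in groups.items():
        if not terms:
            continue  # an empty alternation would match everywhere
        # Longest terms first so that longer phrases win over their prefixes.
        ordered = sorted(terms, key=len, reverse=True)
        alternatives = "|".join(re.escape(t) for t in ordered)
        parts.append(rf"(?P<{label}>\b(?:{alternatives})\b)")
    return re.compile("|".join(parts), re.IGNORECASE)


def highlight(abstract, protein_names):
    """Wrap each matched phrase in a <span> whose class names its type."""
    pattern = build_pattern(protein_names)
    out, pos = [], 0
    for m in pattern.finditer(abstract):
        out.append(html.escape(abstract[pos:m.start()]))
        out.append(f'<span class="{m.lastgroup}">{html.escape(m.group(0))}</span>')
        pos = m.end()
    out.append(html.escape(abstract[pos:]))
    return "".join(out)


tagged = highlight("Lamin A binds to the nuclear matrix.", ["lamin A"])
```

A stylesheet would then assign one color to each of the three classes, so that a curator can spot candidate protein/NM statements at a glance before classifying them by hand.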

Figure 1

Screenshot of an NMPdb entry. The names and gene names of the proteins are given in the fields NA and GN. Also shown are the fields for organism (OS) and the cell type of observed NM interaction (CT). Additionally, links are given to Swiss-Prot (SP), GenBank (GB), OMIM (OM) and to all PubMed articles (P1/P2) that were mined for information about the NM interaction of the protein.

Content

Currently, NMPdb contains over 3000 links to PubMed articles corresponding to about 400 unique proteins; for about 300 of these proteins we could verify the links to their sequences through either UniProt (19) or GenBank (20). Only 62 of the proteins had significant sequence similarity to a protein with a known high-resolution 3D structure deposited in the PDB (21). Only 101 of the 300 proteins were very different in their sequences [HSSP values below 0 (22)], and about half had rather high levels of sequence similarity to at least one other protein in our set (HSSP value >10). Of the 400 proteins, 42 were classified as INM, 198 as ASC and 130 as MIX; very few (currently 13) were classified as NUS. Most proteins (301) are mammalian (predominantly human, rat and mouse); 29 are viral proteins (e.g. HIV, Papilloma/HPV, Epstein–Barr/EBV). Since such viral proteins are typically involved in the transcription of host DNA, it is not surprising that they are an abundant part of the nuclear matrix in infected cells. Other organisms prominent in NMPdb are Gallus gallus (chicken, with 16 proteins), Drosophila melanogaster (fruit fly, with 14 proteins), Saccharomyces cerevisiae (yeast, with 13 proteins) and Caenorhabditis elegans (worm, with 6 proteins).

Format and fields

NMPdb is stored in an EMBL-like flat file format. Each NM protein is represented by one entry. All entries in the database contain the following fields: (i) origin (organism and cell types), (ii) type of nuclear matrix interaction/involvement (INM, ASC, MIX or NUS), (iii) molecular mass and known or calculated pI for locating the protein on a 2D gel and (iv) reference (PubMed IDs of articles describing the interaction). For some entries we provide additional links to other databases, give the actual protein sequence and collect sequence-based predictions. Although links to UniProt implicitly link NMPdb to a variety of other databases, we also provide explicit links to OMIM (23), SWISS-2DPAGE (24) and S/MARt DB (25), which contains the DNA sequences that the respective protein binds to. We provide the following information for all proteins for which we have sequences: (i) the structural domain-like organization according to CHOP (26,27), (ii) predictions of secondary structure, solvent accessibility and membrane helices through PROFphd [B. Rost, manuscript submitted; (28,29)], (iii) coiled-coil regions through COILS (30) and (iv) disordered regions through NORSp (31,32). Where possible, entries are also cross-linked to PEP, a database with predictions for entire proteomes (33) that also contains sequence alignments. For 53 sequences in the database, we found specific information about which part of the sequence is responsible and necessary for NM binding. These regions, usually referred to as nuclear matrix targeting signals (NMTS), are also deposited in NMPdb if available.
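To make the flat-file layout concrete, the following sketch parses an EMBL-like file into one dictionary per entry. The exact NMPdb layout is not specified in the text, so the sketch assumes the usual EMBL conventions: each line starts with a two-letter field code followed by whitespace and a value, and each entry ends with a `//` line. The field codes NA, GN, OS, CT and P1 are taken from the Figure 1 caption. The sample record and the function name `parse_flat_file` are made up for illustration and are not copied from the database.

```python
from collections import defaultdict

# Illustrative sample only; cell type and PubMed ID are placeholders.
SAMPLE = """\
NA   Lamin A
GN   LMNA
OS   Homo sapiens
CT   HeLa
P1   0000000
//
NA   Example protein
OS   Mus musculus
//
"""


def parse_flat_file(text):
    """Split EMBL-like text into entries mapping field code -> list of values."""
    entries = []
    current = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):       # assumed end-of-entry marker
            if current:
                entries.append(dict(current))
            current = defaultdict(list)
            continue
        if len(line.strip()) < 2:       # skip blank or malformed lines
            continue
        code, value = line[:2], line[2:].strip()
        current[code].append(value)     # a code may repeat within an entry
    if current:                         # tolerate a missing final terminator
        entries.append(dict(current))
    return entries


entries = parse_flat_file(SAMPLE)
```

Values are kept as lists because fields such as cell type or PubMed reference can plausibly occur more than once per protein.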

Access

NMPdb can be accessed at http://www.rostlab.org/db/NMPdb/ through a search-engine interface that supports queries on individual database fields and lets queries be combined with ‘AND’, ‘OR’ and ‘AND-NOT’. The complete NMPdb database can be downloaded via FTP. The content of the database, the meaning of the fields and the search interface are described in separate help pages.
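The way field queries combine through ‘AND’, ‘OR’ and ‘AND-NOT’ can be sketched with set operations. This is only an illustration of the query semantics, not the implementation behind the NMPdb web interface. The records, the substring matching rule and the function names `match_field` and `combine` are assumptions; the field codes OS and CT follow the Figure 1 caption.

```python
# Made-up records in the shape of parsed entries (field code -> list of values).
RECORDS = [
    {"NA": ["protein 1"], "OS": ["Homo sapiens"], "CT": ["HeLa"]},
    {"NA": ["protein 2"], "OS": ["Homo sapiens"], "CT": ["fibroblast"]},
    {"NA": ["protein 3"], "OS": ["Gallus gallus"], "CT": ["erythrocyte"]},
]


def match_field(records, field, term):
    """Return indices of records whose field contains term (case-insensitive)."""
    term = term.lower()
    return {
        i for i, rec in enumerate(records)
        if any(term in value.lower() for value in rec.get(field, []))
    }


def combine(left, op, right):
    """Combine two result sets with one of the three supported operators."""
    if op == "AND":
        return left & right
    if op == "OR":
        return left | right
    if op == "AND-NOT":
        return left - right
    raise ValueError(f"unsupported operator: {op}")


human = match_field(RECORDS, "OS", "homo sapiens")
hela = match_field(RECORDS, "CT", "hela")
chicken = match_field(RECORDS, "OS", "gallus")

human_not_hela = combine(human, "AND-NOT", hela)
human_or_chicken = combine(human, "OR", chicken)
```

Here `human_not_hela` selects human proteins observed in cell types other than HeLa, and `human_or_chicken` selects every record from either organism.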

Updates

NMPdb annotates several times more proteins as nuclear matrix-associated (∼400) than other public databases such as UniProt (∼80 NM proteins), the ‘nuclear protein database’ (34) (27 NM proteins) or S/MARt DB (25) (80 NM proteins). We currently update NMPdb manually once a week and hope to maintain at least monthly updates for the years to come.

33 in total

1. UniProt: the Universal Protein knowledgebase.

Authors: Rolf Apweiler; Amos Bairoch; Cathy H Wu; Winona C Barker; Brigitte Boeckmann; Serenella Ferro; Elisabeth Gasteiger; Hongzhan Huang; Rodrigo Lopez; Michele Magrane; Maria J Martin; Darren A Natale; Claire O'Donovan; Nicole Redaschi; Lai-Su L Yeh
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

2. UniqueProt: Creating representative protein sequence sets.

Authors: Sven Mika; Burkhard Rost
Journal: Nucleic Acids Res Date: 2003-07-01 Impact factor: 16.971

3. The PredictProtein server.

Authors: Burkhard Rost; Guy Yachdav; Jinfeng Liu
Journal: Nucleic Acids Res Date: 2004-07-01 Impact factor: 16.971

4. CHOP: parsing proteins into structural domains.

Authors: Jinfeng Liu; Burkhard Rost
Journal: Nucleic Acids Res Date: 2004-07-01 Impact factor: 16.971

5. SWISS-2DPAGE, ten years later.

Authors: Christine Hoogland; Khaled Mostaguir; Jean-Charles Sanchez; Denis F Hochstrasser; Ron D Appel
Journal: Proteomics Date: 2004-08 Impact factor: 3.984

6. CHOP proteins into structural domain-like fragments.

Authors: Jinfeng Liu; Burkhard Rost
Journal: Proteins Date: 2004-05-15

7. Binding of the DNA polymerase alpha-DNA primase complex to the nuclear matrix in HeLa cells.

Authors: J M Collins; A K Chu
Journal: Biochemistry Date: 1987-09-08 Impact factor: 3.162

8. Coiled-coils in alpha-helix-containing proteins: analysis of the residue types within the heptad repeat and the use of these data in the prediction of coiled-coils in other proteins.

Authors: D A Parry
Journal: Biosci Rep Date: 1982-12 Impact factor: 3.840

9. PEP: Predictions for Entire Proteomes.

Authors: Phil Carter; Jinfeng Liu; Burkhard Rost
Journal: Nucleic Acids Res Date: 2003-01-01 Impact factor: 16.971

10. Database resources of the National Center for Biotechnology Information: update.

Authors: David L Wheeler; Deanna M Church; Ron Edgar; Scott Federhen; Wolfgang Helmberg; Thomas L Madden; Joan U Pontius; Gregory D Schuler; Lynn M Schriml; Edwin Sequeira; Tugba O Suzek; Tatiana A Tatusova; Lukas Wagner
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971
