DBMLoc: a Database of proteins with multiple subcellular localizations.

Song Zhang, Xuefeng Xia, Jincheng Shen, Yun Zhou, Zhirong Sun.

Abstract

BACKGROUND: Subcellular localization information is one of the key features in protein function research. Locating to a specific subcellular compartment is essential for a protein to function efficiently. Proteins that have multiple localizations provide more clues. Such proteins may account for a high proportion of all proteins, even more than 35%.
DESCRIPTION: We have developed a database of proteins with multiple subcellular localizations, designated DBMLoc. The initial release contains 10470 entries annotated with multiple subcellular localizations. Annotations are collected from primary protein databases, specialized subcellular localization databases and the literature. All protein entries are cross-referenced to GO annotations and Swiss-Prot. Protein-protein interactions are also annotated. The proteins are classified into 12 large subcellular localization categories based on the GO hierarchy and the original annotations. Download, search and sequence BLAST tools are also available on the website.
CONCLUSION: DBMLoc is a protein database that collects proteins with more than one subcellular localization annotation. It is freely accessible at http://www.bioinfo.tsinghua.edu.cn/DBMLoc/index.htm.

Year:  2008        PMID: 18304364      PMCID: PMC2292141          DOI: 10.1186/1471-2105-9-127

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


Background

Knowledge of subcellular localization is crucial to understanding protein function and biological processes. During or after translation, proteins are transported into different compartments such as the cytoplasm, the membrane system and the mitochondrion, or are secreted out of the cell. Locating to a specific subcellular compartment is essential for a protein to function efficiently. High-throughput experimental approaches such as immuno-localization[1], tagged genes and reporter fusions[2,3] have allowed localization data to keep pace with the avalanche of protein data. Swiss-Prot is a comprehensive database that includes subcellular localization information. In recent years, several specialized subcellular localization databases have been constructed based on experiments, computational prediction or both. The subcellular localization data in LOCATE[4] come from a high-throughput immunofluorescence-based assay and from publications. Organelle DB[5] annotates all protein localizations using vocabulary from the Gene Ontology consortium, which facilitates data interoperability. DBSubLoc[6] uses a keyword-based system to integrate Swiss-Prot subcellular localization annotations. LOCtarget[7] and PA-GOSUB[8] implement predictors of subcellular localization based on different methods. PSORTdb[9] is a database for bacteria that contains both localizations determined through laboratory experimentation (ePSORTdb) and computational predictions (cPSORTdb). The eukaryotic database eSLDB[10] collects localization data for five species that are experimentally determined, homology-based or predicted. In addition, several bioinformatics methods have been developed to predict protein subcellular location, making use of sorting signals[11], domain information[12], amino acid composition of the sequence[13-15] or other information[16]. However, many proteins have more than one subcellular localization annotation.
These proteins may be located in several cellular compartments simultaneously or move between them; examples include transcription factors and signal transduction factors. Proteins may play different roles in biological processes when they are in different subcellular localizations. For these proteins, a single subcellular localization annotation loses important information. Usually these proteins have important biological functions, and their localization annotations provide valuable clues to researchers. They are quite common, accounting for about 39% of all organellar proteins in mouse liver[17]. However, very few proteins are annotated with multiple locations in the available subcellular localization databases. Here we have built DBMLoc, a database that collects proteins with multiple subcellular localization annotations. It provides useful information for protein function research as well as for computational prediction. In addition, taxonomy, Swiss-Prot, GO and interaction information is annotated. If a protein has interactions, a subcellular localization quality score is computed on the basis of the locations of its interacting proteins.

Construction and content

The DBMLoc database is mainly developed from primary protein databases (Swiss-Prot/TrEMBL[18]), available databases of experimentally determined subcellular localizations (DBSubLoc[6], ePSORTdb[9], MitoProteome[19], Organelle DB[5] and LOCATE[4]) and some literature references. Only full-length and unambiguous proteins are selected from Swiss-Prot, and those whose subcellular localization annotations are marked with "by similarity", "probable", "possible", "potential" or "may be" are excluded. At the same time, multiple annotations are collected from the subcellular localization databases (DBSubLoc, ePSORTdb, MitoProteome, Organelle DB and LOCATE) and mapped to the protein set derived from Swiss-Prot. Redundant annotations are filtered out. To standardize subcellular localization annotation terms, the various terms for cellular compartments and complexes are assigned to twelve large organelle categories: extracellular, cell wall, membrane, cytoplasm, mitochondrion, nucleus, ribosome, plastid, endoplasmic reticulum, Golgi apparatus, vacuole and virion. Cell wall, plastid and vacuole are unique to plant cells. Subcellular localization annotations that cannot be classified into the twelve categories are assigned to "others"; 616 proteins have "others" annotations. This classification is mainly based on the Gene Ontology[20] annotations and the original subcellular localization annotations. We annotate the proteins with GO IDs from their primary sources or from the annotation tools provided by GOA (Gene Ontology Annotation Database)[21]. The proteins are also cross-referenced to the NCBI Taxonomy database[22]. Sub-datasets are derived based on taxonomic class (e.g. animal, plant and eukaryote). Proteins that interact with each other tend to share the same subcellular localizations, so we annotate each protein with interaction data collected from DIP[23], MINT[24] and BIND[25].
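The selection and standardization steps described above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' pipeline: the qualifier list and the twelve category names come from the text, while the function names and the small TERM_TO_CATEGORY lookup are hypothetical stand-ins for the GO-based mapping, whose details are not given here.

```python
# Qualifiers that mark an uncertain localization annotation (taken from the text).
UNCERTAIN_QUALIFIERS = ("by similarity", "probable", "possible", "potential", "may be")

# The twelve organelle categories named in the text (lower-cased for matching).
CATEGORIES = (
    "extracellular", "cell wall", "membrane", "cytoplasm", "mitochondrion",
    "nucleus", "ribosome", "plastid", "endoplasmic reticulum",
    "golgi apparatus", "vacuole", "virion",
)

# Hypothetical term-to-category lookup; the real mapping is GO-based and far larger.
TERM_TO_CATEGORY = {
    "nucleolus": "nucleus",
    "mitochondrial matrix": "mitochondrion",
    "plasma membrane": "membrane",
    "chloroplast": "plastid",
    "cytosol": "cytoplasm",
}


def is_confident(annotation: str) -> bool:
    """Return False if the annotation carries one of the uncertainty qualifiers."""
    text = annotation.lower()
    return not any(qualifier in text for qualifier in UNCERTAIN_QUALIFIERS)


def standardize(terms):
    """Map raw localization terms to category names; unknown terms become 'others'."""
    categories = set()
    for term in terms:
        key = term.strip().lower()
        if key in CATEGORIES:
            categories.add(key)
        else:
            categories.add(TERM_TO_CATEGORY.get(key, "others"))
    return sorted(categories)


def has_multiple_localizations(terms) -> bool:
    """Keep only confident annotations and test for at least two distinct categories."""
    kept = [term for term in terms if is_confident(term)]
    return len(standardize(kept)) >= 2
```

In this sketch a protein annotated with "Nucleus" and "Cytosol" would qualify for the database, whereas one annotated with "Nucleus" and "Cytoplasm (probable)" would not, because the second annotation is discarded before counting. Simple substring matching on the qualifiers is a simplification chosen for brevity.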
To check the quality of a protein's subcellular localization annotations, a quality score is computed if the protein has interacting proteins, using the following formula: Score = N1/N2, where N1 is the number of the protein's subcellular localizations that are shared by its interacting proteins' subcellular localizations, and N2 is the number of the protein's subcellular localizations. The higher the score, the more reliable the subcellular localization annotations are. All proteins whose score equals 1 are integrated into a high-quality dataset. Finally, with some literature-annotated proteins added, 10470 protein entries are integrated into the DBMLoc database. The downloadable DBMLoc database and non-redundant sub-datasets are released as plain text files. The format is similar to that of the Swiss-Prot data file. Each line in the file is one record of an entry in the 'KEY VALUE' format. Cross-reference records begin with a 'CX' key, and each value contains one cross-reference record in the 'Reference Database: Reference ID' format. For example, the 'CX SWISS-PROT: Q85FL3' record means that the protein entry is linked to the SWISS-PROT database entry Q85FL3. A more detailed description of the format can be found on the web page.
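The quality score and the cross-reference record format can be illustrated with a short Python sketch. It rests on stated assumptions rather than on the authors' code: the score is read as N1 divided by N2, "shared" is taken to mean present in the union of the interaction partners' localizations, and the function names are hypothetical. Only the 'KEY VALUE' layout and the 'CX SWISS-PROT: Q85FL3' example come from the text.

```python
from typing import Iterable, Optional, Tuple


def quality_score(protein_locs: Iterable[str],
                  partner_locs: Iterable[Iterable[str]]) -> Optional[float]:
    """Assumed reading of the score: N1 / N2.

    N2 = number of the protein's own localizations.
    N1 = how many of those also occur among its interaction partners
         (assumption: union over all partners).
    Returns None when the protein has no interaction partners.
    """
    own = set(protein_locs)
    if not own:
        raise ValueError("protein has no localization annotations")
    partners = [set(locs) for locs in partner_locs]
    if not partners:
        return None
    pooled = set().union(*partners)
    n1 = len(own & pooled)
    n2 = len(own)
    return n1 / n2


def parse_record(line: str) -> Tuple[str, str]:
    """Split one 'KEY VALUE' line into its key and value."""
    key, _, value = line.strip().partition(" ")
    return key, value.strip()


def parse_cross_reference(line: str) -> Optional[Tuple[str, str]]:
    """Return (database, reference ID) for a 'CX' record, else None."""
    key, value = parse_record(line)
    if key != "CX":
        return None
    database, _, reference_id = value.partition(":")
    return database.strip(), reference_id.strip()
```

Under these assumptions, a protein annotated with nucleus and cytoplasm whose partners cover only the nucleus scores 0.5, and it scores 1 (and so would enter the high-quality dataset) only when every one of its localizations is seen in at least one partner.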

Utility and discussion

We provide free downloads of the database, organism-specific sub-datasets and taxonomy-categorized files for all education and research users. Users can search the database by DBMLoc identifier, cross-referenced database identifier or protein name. Figures 1 and 2 show the results of a name search and an identifier search. A protein sequence can also be submitted to search for homologous proteins in the full DBMLoc database or in one of its subsets.
Figure 1

Protein name search result with keyword "actin".

Figure 2

Swiss-Prot identity search result with query "Q9Y5S9".

The initial release contains 10470 protein entries annotated with multiple subcellular localizations. Non-redundant protein data sets with sequence similarity below 90% and below 25% are also generated with BLAST. Table 1 lists brief statistics for the full and non-redundant data sets. Detailed statistics are available on the web page.
Table 1

Brief statistics of DBMLoc

                                   Full data sets   Non-redundant data sets (90%)   Non-redundant data sets (25%)
Two subcellular localizations      8887             6055                            2366
Three subcellular localizations    1461             1112                            593
Four subcellular localizations     107              100                             85
Eukaryote                          9954             6727                            2549
Animal                             6492             4240                            1523
Plant                              3462             2487                            1278

Annotations integrated into DBMLoc from various databases may include false annotations or conflicts, so we will pay more attention to data quality in future development. More experimental data and other available information, such as experimental methods and post-translational modifications, will be integrated into the database. The database will be updated regularly as new versions of Swiss-Prot become available. In addition, more web services and analysis tools will be developed.

Conclusion

DBMLoc is a specialized database of proteins annotated with multiple localizations. Proteins are cross-referenced to NCBI Taxonomy, Gene Ontology and their original databases. Because proteins that interact with each other tend to share the same subcellular localizations, protein-protein interaction information is also integrated into the database, and a quality score is derived from it. These data will help experimental and computational biologists understand and analyze biological function.

Availability and requirements

DBMLoc home page: http://www.bioinfo.tsinghua.edu.cn/DBMLoc/index.htm
License: The database is freely available.

List of abbreviations

GO: Gene Ontology.

Authors' contributions

SZ and XX designed and constructed the database. SZ drafted the manuscript. JS and YZ participated in data curation. ZS supervised the project. All authors read and approved the final manuscript.

References (25 in total)

1.  DIP: the database of interacting proteins.

Authors:  I Xenarios; D W Rice; L Salwinski; M K Baron; E M Marcotte; D Eisenberg
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors:  M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330

3.  Support vector machine approach for protein subcellular localization prediction.

Authors:  S Hua; Z Sun
Journal:  Bioinformatics       Date:  2001-08       Impact factor: 6.937

4.  The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003.

Authors:  Brigitte Boeckmann; Amos Bairoch; Rolf Apweiler; Marie-Claude Blatter; Anne Estreicher; Elisabeth Gasteiger; Maria J Martin; Karine Michoud; Claire O'Donovan; Isabelle Phan; Sandrine Pilbout; Michel Schneider
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

5.  BIND: the Biomolecular Interaction Network Database.

Authors:  Gary D Bader; Doron Betel; Christopher W V Hogue
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

6.  Predicting protein cellular localization using a domain projection method.

Authors:  Richard Mott; Jörg Schultz; Peer Bork; Chris P Ponting
Journal:  Genome Res       Date:  2002-08       Impact factor: 9.043

7.  Subcellular localization of the yeast proteome.

Authors:  Anuj Kumar; Seema Agarwal; John A Heyman; Sandra Matson; Matthew Heidtman; Stacy Piccirillo; Lara Umansky; Amar Drawid; Ronald Jansen; Yang Liu; Kei-Hoi Cheung; Perry Miller; Mark Gerstein; G Shirleen Roeder; Michael Snyder
Journal:  Genes Dev       Date:  2002-03-15       Impact factor: 11.361

8.  MINT: a Molecular INTeraction database.

Authors:  Andreas Zanzoni; Luisa Montecchi-Palazzi; Michele Quondam; Gabriele Ausiello; Manuela Helmer-Citterich; Gianni Cesareni
Journal:  FEBS Lett       Date:  2002-02-20       Impact factor: 4.124

9.  Large-scale analysis of the yeast genome by transposon tagging and gene disruption.

Authors:  P Ross-Macdonald; P S Coelho; T Roemer; S Agarwal; A Kumar; R Jansen; K H Cheung; A Sheehan; D Symoniatis; L Umansky; M Heidtman; F K Nelson; H Iwasaki; K Hager; M Gerstein; P Miller; G S Roeder; M Snyder
Journal:  Nature       Date:  1999-11-25       Impact factor: 49.962

10.  eSLDB: eukaryotic subcellular localization database.

Authors:  Andea Pierleoni; Pier Luigi Martelli; Piero Fariselli; Rita Casadio
Journal:  Nucleic Acids Res       Date:  2006-11-15       Impact factor: 16.971
