SynDB: a Synapse protein DataBase based on synapse ontology.

Wuxue Zhang, Yong Zhang, Hui Zheng, Chen Zhang, Wei Xiong, John G Olyarchuk, Michael Walker, Weifeng Xu, Min Zhao, Shuqi Zhao, Zhuan Zhou, Liping Wei.

Abstract

A synapse is the junction across which a nerve impulse passes from an axon terminal to a neuron, muscle cell or gland cell. The functions and building molecules of the synapse are essential to almost all neurobiological processes. To describe synaptic structures and functions, we have developed Synapse Ontology (SynO), a hierarchical representation that includes 177 terms with hundreds of synonyms and branches up to eight levels deep. We associated 125 additional protein keywords and 109 InterPro domains with these SynO terms. Using a combination of automated keyword searches, domain searches and manual curation, we collected 14,000 non-redundant synapse-related proteins, including 3000 in human. We extensively annotated the proteins with information about sequence, structure, function, expression, pathways, interactions and disease associations and with hyperlinks to external databases. The data are stored and presented in the Synapse protein DataBase (SynDB, http://syndb.cbi.pku.edu.cn). SynDB can be interactively browsed by SynO, Gene Ontology (GO), domain families, species, chromosomal locations or Tribe-MCL clusters. It can also be searched by text (including Boolean operators) or by sequence similarity. SynDB is the most comprehensive database to date for synaptic proteins.

Year:  2006        PMID: 17098931      PMCID: PMC1669723          DOI: 10.1093/nar/gkl876

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Recent developments in genomics, proteomics and systems biology have significantly impacted fields such as oncology and immunology (1–5) and are beginning to be applied to neuroscience research, generating an exponentially increasing amount of data (6–11) and calling for efficient databases. However, neuroinformatics databases at the molecular level are currently limited. For instance, databases listed in the Society for Neuroscience Database Gateway (NDG) principally contain imaging, anatomic or clinical data, while few focus on genes or proteins and their functions.

The synapse is a specialized intercellular junction between neurons or between neurons and other excitable cells such as muscle. It plays a key role in information processing in the nervous system, which underlies many neurobiological processes, including neurotransmission, learning and memory. Defects in synaptic activity are associated with many neurological disorders, including Alzheimer's disease (12). The synapse has also been proposed as an excellent candidate for large-scale systems biology studies (7,8,13).

There is a critical need for a focused yet comprehensive database resource for the synapse ‘proteome’. Creating such a database is non-trivial, because the proteins involved in synaptic activities are numerous and diverse and the information about them is scattered across multiple heterogeneous sources. No simple keyword search and no small number of domains can retrieve all the proteins. These complexities may explain why such a database has not been reported thus far. Here, we present the Synapse protein DataBase (SynDB, http://syndb.cbi.pku.edu.cn) as an information hub for synapse-related proteins.

CONSTRUCTION OF SYNAPSE ONTOLOGY

Ontology is defined as the ‘specification of a conceptualization’ (14). It describes a domain using a collection of concepts or terms and includes the hierarchical relationships between the terms. In order to formally describe synaptic functions and structures, we extensively reviewed three sources of information: (i) three classic textbooks, Synapses (15), Principles of Neural Science (16) and Ion Channels of Excitable Membranes (17); (ii) 115 recent (2000–2006) review papers published in Nature Reviews Neuroscience and Annual Review of Neuroscience; and (iii) relevant terms in two general ontologies, Gene Ontology (GO) (18) and Medical Subject Headings (MeSH) (19).

By reviewing these resources and iteratively organizing the information, we constructed the first synapse ontology (SynO), a hierarchical description of synaptic structures and functions. SynO has two top-level categories: structure and function. Structure is divided into categories such as presynaptic compartment, postsynaptic compartment and glia; function is divided into categories such as transmitter release and endocytosis, synapse formation and signal transduction in the postsynaptic neuron. In total, SynO contains 177 terms with hundreds of synonyms, organized in branches up to eight levels deep.

Like GO, SynO is constructed as a directed acyclic graph (DAG): if terms are represented by vertices and the relationships between terms by directed edges, the resulting graph contains no cycles, and, unlike in a strict tree, a term may have more than one parent. We used DAG-Edit (20) to input, manage and update SynO (Figure 1). We annotated each term with its name, synonyms, definition and source references, as well as its ‘part-of’ or ‘is-a’ relationship to other terms. In the definition field, we recorded additional protein keywords associated with the term as well as InterPro (21) domains related to the term (see details below in ‘Association of Proteins’).
SynO is available for download in the Open Biomedical Ontologies flat file format (20).
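
The DAG structure described above can be illustrated with a minimal Python sketch. It is not the authors' DAG-Edit workflow: the root name `SynO`, the specific edges and the helper names `build_parents`, `is_acyclic` and `depth` are assumptions made for illustration, using category names mentioned in the text. It shows a term with two parents, a check that the graph has no cycles, and how the depth of a term ("levels deep") could be computed.

```python
from collections import defaultdict

# Hypothetical mini-ontology as (child, relationship, parent) triples.
# Term names come from the text; the edges themselves are illustrative only.
EDGES = [
    ("structure", "is-a", "SynO"),
    ("function", "is-a", "SynO"),
    ("presynaptic compartment", "part-of", "structure"),
    ("postsynaptic compartment", "part-of", "structure"),
    ("transmitter release and endocytosis", "is-a", "function"),
    ("synaptic vesicle cycling", "part-of", "transmitter release and endocytosis"),
    ("synaptic vesicle cycling", "part-of", "presynaptic compartment"),
]


def build_parents(edges):
    """Map every term to the set of its direct parents (roots map to an empty set)."""
    parents = defaultdict(set)
    for child, _relationship, parent in edges:
        parents[child].add(parent)
        parents.setdefault(parent, set())
    return dict(parents)


def is_acyclic(parents):
    """Depth-first search with three colours; meeting a grey term again means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    state = {term: WHITE for term in parents}

    def visit(term):
        if state[term] == GREY:
            return False
        if state[term] == BLACK:
            return True
        state[term] = GREY
        ok = all(visit(parent) for parent in parents[term])
        state[term] = BLACK
        return ok

    return all(visit(term) for term in parents)


def depth(term, parents):
    """Longest path from a root down to `term`, counting the root as level 1.

    Assumes the graph has already been checked with is_acyclic().
    """
    if not parents[term]:
        return 1
    return 1 + max(depth(parent, parents) for parent in parents[term])


parents = build_parents(EDGES)
print(is_acyclic(parents))                          # True
print(depth("synaptic vesicle cycling", parents))   # 4
```

In this toy graph, ‘synaptic vesicle cycling’ has two parents, which a strict tree could not represent but a DAG can.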
Figure 1

DAG-Edit view of Synapse Ontology (SynO): SynO is stored and managed in DAG-Edit. Hierarchical display of names and relationships of SynO terms; the term of interest; description of the term and list of keywords and domains associated with the term and sources from which term was derived; synonyms of the term; the path from root to the term.

We developed a Perl script to generate a list of search keywords based on SynO, including and expanding from SynO terms and synonyms. If a SynO term consists of more than one word, the Perl script specified which word can be expanded and whether the order of the words can be flexible. All possible combinations were automatically generated. The expanded list of search keywords was used in the next step.
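
The keyword expansion step could look roughly like the following Python sketch. The original script was written in Perl and is not reproduced here; the function name `expand_term`, the `variants` table and the example word forms are hypothetical. The sketch assumes that each word may be replaced by any of its listed variants and that, when word order is flexible, every ordering of the words is generated.

```python
from itertools import permutations, product


def expand_term(words, variants=None, flexible_order=False):
    """Generate search keywords from a multi-word ontology term.

    words          -- the words of the term, in their original order
    variants       -- optional dict mapping a word to alternative forms (hypothetical)
    flexible_order -- if True, also emit every reordering of the words
    """
    variants = variants or {}
    # Each position offers the original word plus any listed alternatives.
    choices = [[word] + list(variants.get(word, [])) for word in words]
    keywords = set()
    for combination in product(*choices):
        orderings = permutations(combination) if flexible_order else [combination]
        for ordering in orderings:
            keywords.add(" ".join(ordering))
    return sorted(keywords)


# Illustrative call; the variant forms are assumptions, not taken from SynO.
example = expand_term(
    ["synaptic", "vesicle"],
    variants={"synaptic": ["synapse"], "vesicle": ["vesicles"]},
    flexible_order=True,
)
print(len(example))  # 8
```

With two words, two forms per word and flexible order, the sketch yields 2 × 2 × 2 = 8 keywords, which shows how quickly the list grows and why the resulting hits need the manual screening described in the next section.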

ASSOCIATION OF PROTEINS

We searched the InterPro database using the search keywords and retrieved 400 protein domains. Through careful manual screening we identified 109 domains as being involved in synaptic activities and assigned them to the most appropriate SynO terms. We retrieved over 5000 proteins using the mapping between InterPro and UniProt (22) and associated these proteins with SynO terms.

We then searched UniProt to retrieve additional protein entries that contain the search keywords. While domain-based searches tend to have a high false-negative rate (as not all domains can be modeled), keyword-based searches tend to have a high false-positive rate, requiring that we impose both automated and manual quality control. For example, entries containing ‘immune’ or ‘immunological’ were removed, because ‘immunological synapse’, a term for a process in the immune system, occurs in hundreds of protein entries. In another example, thousands of false-positive entries were removed because they were annotated as being submitted by a company named Synapse. After manual review of thousands of entries, we retrieved over 10 000 proteins and assigned them SynO terms.

We combined the two sets of proteins and removed redundant entries following the strategy of the International Protein Index (23). We considered two UniProt proteins in a species redundant if they were ≥ 95% identical over ≥ 95% of the length of the shorter sequence, based on pair-wise BLASTP of all sequences in the species. Among redundant proteins we selected Swiss-Prot sequences over TrEMBL sequences. Among sequences from the same data source, we selected longer sequences over shorter ones. The resulting SynDB contains 14 000 non-redundant proteins, including 3000 in human, and is the most comprehensive collection of synapse-related proteins to date.
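
The redundancy rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: the `Protein` record, the accession strings and the `(acc_a, acc_b, percent_identity, aligned_length)` hit tuples are hypothetical stand-ins for parsed BLASTP output, and the greedy "best-ranked first" selection is one simple way to apply the stated preferences (Swiss-Prot over TrEMBL, then longer over shorter).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    acc: str       # hypothetical accession
    species: str
    source: str    # "Swiss-Prot" or "TrEMBL"
    length: int    # sequence length in residues


def is_redundant(a, b, identity, aligned_len):
    """>= 95% identity over >= 95% of the shorter sequence, within one species."""
    if a.species != b.species:
        return False
    shorter = min(a.length, b.length)
    return identity >= 95.0 and aligned_len >= 0.95 * shorter


def remove_redundant(proteins, hits):
    """Greedy selection: visit proteins from most to least preferred and keep one
    only if it is not redundant with any protein that has already been kept."""
    by_acc = {p.acc: p for p in proteins}
    redundant_with = {p.acc: set() for p in proteins}
    for acc_a, acc_b, identity, aligned_len in hits:
        if is_redundant(by_acc[acc_a], by_acc[acc_b], identity, aligned_len):
            redundant_with[acc_a].add(acc_b)
            redundant_with[acc_b].add(acc_a)

    # Swiss-Prot entries first, then longer sequences, then accession for stability.
    ranked = sorted(proteins, key=lambda p: (p.source != "Swiss-Prot", -p.length, p.acc))
    kept, kept_accs = [], set()
    for protein in ranked:
        if redundant_with[protein.acc].isdisjoint(kept_accs):
            kept.append(protein)
            kept_accs.add(protein.acc)
    return kept


proteins = [
    Protein("SP_A", "human", "Swiss-Prot", 500),
    Protein("TR_B", "human", "TrEMBL", 520),
    Protein("TR_C", "human", "TrEMBL", 300),
    Protein("TR_D", "human", "TrEMBL", 310),
    Protein("TR_E", "mouse", "TrEMBL", 500),
    Protein("TR_F", "human", "TrEMBL", 400),
]
hits = [
    ("SP_A", "TR_B", 99.0, 498),  # redundant: Swiss-Prot entry wins despite being shorter
    ("TR_C", "TR_D", 97.0, 295),  # redundant: the longer TrEMBL entry wins
    ("SP_A", "TR_E", 98.0, 500),  # different species: not compared
    ("TR_F", "SP_A", 96.0, 200),  # alignment covers too little of the shorter sequence
]
kept = remove_redundant(proteins, hits)
print(sorted(p.acc for p in kept))  # ['SP_A', 'TR_D', 'TR_E', 'TR_F']
```

The greedy pass avoids building explicit clusters; a union-find grouping followed by picking one representative per group would be an alternative design with slightly different behavior on chains of partially overlapping sequences.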

ANNOTATIONS AND WEB INTERFACE DESIGN

To enhance SynDB's utility as an information resource, we developed parsers in Perl to retrieve extensive information on protein sequences, expression, protein–protein and protein–small molecule interactions, disease associations and literature references. Known 3D structures or potential structure templates were retrieved by pair-wise BLASTP comparison between SynDB proteins and non-redundant proteins with known structures from PDB_SELECT_25 (24). Cross-references to ModBase (25) are also provided. Potential metabolic pathways were identified by running the KOBAS system against the KEGG database (26,27). Table 1 shows the protein features and related external databases. The information for each protein is integrated and presented in a single graphical web page. For example, the SynDB entry page for Huntingtin Interacting Protein 1 (HIP1) (Figure 2) shows that HIP1 is located on chromosome 7, is highly expressed in brain and is involved in the Huntington disease pathway. It is included in the Online Mendelian Inheritance in Man database (28) as providing an important molecular link between huntingtin and the neuronal cytoskeleton, and its sequence is available for download.
Table 1

SynDB protein annotations and cross-referenced molecular databases

Protein annotation                  Cross-referenced databases
Gene name                           NCBI Entrez Gene
Species                             NCBI Taxonomy DB
Sequences                           GenBank
Chromosomal location                GoldenPath
GO functional category              Gene Ontology (GO)
Protein domain                      InterPro
Structure                           Protein Data Bank (PDB); ModBase
Gene expression                     GEO; BodyMap-Xs
Antisense transcripts               NATsDB
Post-translational modification     dbPTM
Protein family                      Ensembl Family
Pathway                             KEGG
Protein–protein interaction         PPID; DIP
Disease association                 OMIM
References                          PubMed

Figure 2

Part of the protein entry page for human Huntingtin Interacting Protein 1 (HIP1). See ‘Annotations and Web Interface Design’ for a brief description.

We implemented six interactive browsing options in SynDB. Users can browse synapse proteins by ‘SynO’ or ‘GO’, displayed as hierarchical trees. They can zoom in on a particular branch of the ontology by clicking on the ‘+’ sign to expand the branch. For example, a user interested in ‘transmitters release and endocytosis’ may expand this category and focus on ‘synaptic vesicle cycling’ (Figure 3). ‘Protein Domains’ are grouped into InterPro domain family groups, which can be expanded by clicking on the group name. For each domain, the numbers of total, human, mouse and rat proteins are shown. To facilitate study of the evolution of a domain, an ‘Expand’ link shows all species that contain the domain and its prevalence in each. To further facilitate study of the evolution of the synapse proteome across different species, we clustered all SynDB proteins by sequence similarity (BLAST E-value cutoff 1e−10) using Tribe-MCL (29) and made the clusters available in the ‘MCL Cluster’ browser. A separate ‘Species’ browser lists all species represented in SynDB, in order of decreasing number of proteins.

Finally, the ‘Chromosomal Location’ browser (Figure 4), available for human, mouse and rat, allows users to mouse over or click on chromosomal locations to see gene details. Because a number of neural gene families, such as olfactory receptors, are known to form gene clusters along chromosomes (30), we implemented a ‘Locus number’ field in the ‘Chromosomal Location’ browser that allows the user to enter a cutoff number and view gene clusters of at least that size along a chromosome. Two loci are considered to belong to a cluster if their intergenic distance is less than 500 kb (30,31).
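
The gene-cluster rule above (intergenic distance below 500 kb, minimum number of loci per cluster) can be sketched in Python as shown here. This is not SynDB's implementation; the function name `find_gene_clusters`, the `(gene, start, end)` tuples and the coordinates are hypothetical, and the sketch assumes all loci lie on one chromosome and that intergenic distance is measured from the furthest end seen so far to the next start.

```python
def find_gene_clusters(loci, min_size, max_gap=500_000):
    """Group loci on one chromosome into clusters.

    loci     -- iterable of (gene, start, end) tuples, coordinates in base pairs
    min_size -- the 'Locus number' cutoff: report clusters with at least this many loci
    max_gap  -- neighbouring loci closer than this (in bp) join the same cluster
    """
    clusters, current = [], []
    prev_end = None  # furthest end coordinate within the current cluster
    for gene, start, end in sorted(loci, key=lambda locus: locus[1]):
        if current and start - prev_end < max_gap:
            current.append(gene)
            prev_end = max(prev_end, end)
        else:
            if len(current) >= min_size:
                clusters.append(current)
            current = [gene]
            prev_end = end
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


# Hypothetical loci on a single chromosome.
loci = [
    ("g1", 1_000_000, 1_050_000),
    ("g2", 1_200_000, 1_260_000),  # 150 kb after g1: same cluster
    ("g3", 1_700_000, 1_720_000),  # 440 kb after g2: same cluster
    ("g4", 5_000_000, 5_010_000),  # far away: starts a new group
    ("g5", 5_600_000, 5_650_000),  # 590 kb after g4: not clustered with g4
]
print(find_gene_clusters(loci, min_size=2))  # [['g1', 'g2', 'g3']]
```

Raising `min_size` hides small groups, which mirrors how entering a larger ‘Locus number’ would leave only the larger clusters visible.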
Figure 3

The Synapse Ontology browser. ‘+’ indicates that the term can be expanded to list its child terms.

Figure 4

Chromosomal browser of SynDB: The chromosomal browser is available for human, mouse and rat. The x-axis shows the human chromosomes. Users can mouse over or click on a ‘+’ to view the protein encoded by the gene at that chromosomal locus. Users can also enter a number in ‘Locus number’ to view gene clusters with at least that many members. The figure shows which chromosomes, and which regions of each chromosome, harbor more synapse-related genes.

SynDB supports searching by text with Boolean operators. It also supports searching by amino acid or nucleotide sequence similarity with BLAST. Information in SynDB is stored in a MySQL relational database comprising over 100 tables. Sequences can be downloaded directly from the web and the complete database is available from the authors. We will keep SynO up-to-date by regular review of the latest literature as well as users' and collaborators' comments. Perl scripts automatically update the sequences in SynDB, followed by manual review.

DISCUSSION

The brain is a complex and subtle network of neurons that communicate with each other via synapses. Chemical synapses are asymmetric contacts and play key roles in information processing and storage, behavior and disease. In order to better organize the wealth of synapse-related information and facilitate understanding of synapses, we developed SynDB, an online database for the synapse proteome. SynDB aims to enable systematic studies of synaptic functions and structures at the proteomic level. A focused ontology is essential for the development of such a database because of the numerous and diverse proteins involved. Beyond general-purpose ontologies such as MeSH and GO (18,19), focused ontologies such as SynO are important because they can provide more specific, complete and resolved information to scientists, such as neuroscientists interested in synaptic function. In fact, of the 177 SynO terms, only 24 were derived from MeSH and GO. In its first year online, SynDB has had over 600 000 external hits (excluding search engine crawlers). SynDB's objective is to serve as a repository for current knowledge and a potential starting point for experimental design or in silico data mining.

REFERENCES

1.  Medical Subject Headings (MeSH).

Authors:  C E Lipscomb
Journal:  Bull Med Libr Assoc       Date:  2000-07

2.  An efficient algorithm for large-scale detection of protein families.

Authors:  A J Enright; S Van Dongen; C A Ouzounis
Journal:  Nucleic Acids Res       Date:  2002-04-01       Impact factor: 16.971

Review 3.  IMGT-ONTOLOGY and IMGT databases, tools and Web resources for immunogenetics and immunoinformatics.

Authors:  Marie-Paule Lefranc
Journal:  Mol Immunol       Date:  2004-01       Impact factor: 4.407

Review 4.  Systems biology in neuroscience: bridging genes to cognition.

Authors:  Seth G N Grant
Journal:  Curr Opin Neurobiol       Date:  2003-10       Impact factor: 6.627

5.  UniProt: the Universal Protein knowledgebase.

Authors:  Rolf Apweiler; Amos Bairoch; Cathy H Wu; Winona C Barker; Brigitte Boeckmann; Serenella Ferro; Elisabeth Gasteiger; Hongzhan Huang; Rodrigo Lopez; Michele Magrane; Maria J Martin; Darren A Natale; Claire O'Donovan; Nicole Redaschi; Lai-Su L Yeh
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

6.  MODBASE, a database of annotated comparative protein structure models, and associated resources.

Authors:  Ursula Pieper; Narayanan Eswar; Hannes Braberg; M S Madhusudhan; Fred P Davis; Ashley C Stuart; Nebojsa Mirkovic; Andrea Rossi; Marc A Marti-Renom; Andras Fiser; Ben Webb; Daniel Greenblatt; Conrad C Huang; Thomas E Ferrin; Andrej Sali
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

7.  Evolution of olfactory receptor genes in the human genome.

Authors:  Yoshihito Niimura; Masatoshi Nei
Journal:  Proc Natl Acad Sci U S A       Date:  2003-09-24       Impact factor: 11.205

8.  Automated genome annotation and pathway identification using the KEGG Orthology (KO) as a controlled vocabulary.

Authors:  Xizeng Mao; Tao Cai; John G Olyarchuk; Liping Wei
Journal:  Bioinformatics       Date:  2005-04-07       Impact factor: 6.937

9.  Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders.

Authors:  Ada Hamosh; Alan F Scott; Joanna Amberger; Carol Bocchini; David Valle; Victor A McKusick
Journal:  Nucleic Acids Res       Date:  2002-01-01       Impact factor: 16.971

10.  KOBAS server: a web-based platform for automated annotation and pathway identification.

Authors:  Jianmin Wu; Xizeng Mao; Tao Cai; Jingchu Luo; Liping Wei
Journal:  Nucleic Acids Res       Date:  2006-07-01       Impact factor: 16.971
