Fifteen years SIB Swiss Institute of Bioinformatics: life science databases, tools and support.

Heinz Stockinger¹, Adrian M Altenhoff², Konstantin Arnold³, Amos Bairoch⁴, Frederic Bastian⁵, Sven Bergmann⁵, Lydie Bougueleret⁶, Philipp Bucher⁷, Mauro Delorenzi⁵, Lydie Lane⁴, Philippe Le Mercier⁶, Frédérique Lisacek⁴, Olivier Michielin⁸, Patricia M Palagi⁹, Jacques Rougemont⁷, Torsten Schwede³, Christian von Mering¹⁰, Erik van Nimwegen³, Daniel Walther⁶, Ioannis Xenarios¹¹, Mihaela Zavolan³, Evgeny M Zdobnov⁴, Vincent Zoete⁶, Ron D Appel¹².

Abstract

The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) was created in 1998 as an institution to foster excellence in bioinformatics. It is renowned worldwide for its databases and software tools, such as UniProtKB/Swiss-Prot, PROSITE, SWISS-MODEL, STRING, etc, that are all accessible on ExPASy.org, SIB's Bioinformatics Resource Portal. This article provides an overview of the scientific and training resources SIB has consistently been offering to the life science community for more than 15 years.

Year: 2014 PMID: 24792157 PMCID: PMC4086091 DOI: 10.1093/nar/gku380

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The SIB Swiss Institute of Bioinformatics was formally founded in 1998, but prior bioinformatics services initiated in the 1980s, such as Swiss-Prot (1) (now UniProtKB/Swiss-Prot), ExPASy (2), Melanie (3), PROSITE (4), SWISS-2DPAGE (5) and SWISS-MODEL (6), were already provided by groups that are now part of SIB. In fact, some of the leaders of these early projects are among the founding members of SIB. From 1998 to 2013 (the year of SIB's 15th anniversary), SIB grew from its original 5 groups and 30 scientists to 46 groups and more than 600 members and employees. While the abovementioned resources are still provided (and continuously developed and enhanced), the inclusion of new groups has significantly broadened the competence of SIB as well as the tools and databases it provides. SIB guarantees long-term support of scientific resources while adding new services to its resource portfolio. For a detailed list of SIB resources, refer to ExPASy.org, SIB's Bioinformatics Resource Portal. Moreover, SIB acts as the Swiss node within ELIXIR (http://www.elixir-europe.org, a European initiative to provide a sustainable infrastructure for biological information), and therefore has an essential role for Switzerland, Europe and beyond. SIB is a foundation and therefore a legal entity in its own right, and it works closely with bioinformatics researchers. Most SIB groups are co-affiliated with a university; in fact, SIB group leaders are usually appointed university professors. In addition to performing academic research, the groups can be supported by SIB, which uses its own funds to help ensure the sustainability of key bioinformatics services. Several of these resources have gained wide acceptance and are used by the life science research community worldwide. In this article, we focus on SIB-funded resources, projects and services that have been developed within the last 15 years.

OVERVIEW OF SIB-FUNDED RESOURCES

In the last 15 years, SIB and its current 46 research and services groups have created more than 100 bioinformatics services, databases and software tools. Listing them all is beyond the scope of this article, so we focus on a set of representative resources that have been funded by grants from the Swiss State Secretariat for Education, Research and Innovation (SERI) to SIB. Several of these resources also received funding from other sources and funding agencies (Swiss National Science Foundation, European Commission, National Institutes of Health, etc). In the remainder of this article, the resources are organized according to scientific categories, as is done on ExPASy.org. A brief overview is given in Figure 1.

Figure 1.

Overview of SIB resources mentioned in the remainder of the article. The resources are clustered according to scientific category (proteomics, genomics, etc, on the horizontal axis) and roughly according to the year they were created (time is depicted vertically). This shows the historical perspective with respect to the creation of SIB in 1998. Most of the resources provide databases or biological knowledge bases (except certain drug design tools such as SwissDock and systems biology tools such as QuickTest, PPA, ExpressionView and ISA). Note that most of the resources have been designed, developed and released independently of each other, but some of them have direct dependencies (e.g. ViralZone and PROSITE rely on UniProtKB/Swiss-Prot database versions). Several of the resources have cross-references to each other (such as SWISS-MODEL and STRING to UniProtKB/Swiss-Prot), but a more detailed data dependency or interaction graph is beyond the scope of the article.

Proteomics

UniProtKB/Swiss-Prot (http://uniprot.org) is the manually reviewed component of UniProtKB (7), the most widely used knowledge base on proteins. It provides expert curation with information extracted from literature and curator-evaluated computational analysis, mainly focusing on functional data. The data are constantly reviewed and updated and can be used as a corpus of reference annotations to cope with the avalanche of newly sequenced genomes. neXtProt (http://nextprot.org) (8) is an innovative knowledge platform dedicated to human proteins. This resource complements UniProtKB/Swiss-Prot by adding genomic, transcriptomic and proteomic information relative to human proteins, carefully selected from high-throughput experiments. Recently, expression and subcellular location data from the Human Protein Atlas (http://www.proteinatlas.org/) and peptide identifications from PeptideAtlas (http://www.peptideatlas.org/) were integrated, as well as a huge number of protein post-translational modifications from literature, and single amino acid variants from COSMIC (http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/) and dbSNP (http://www.ncbi.nlm.nih.gov/SNP/). Since 2013, neXtProt has been the reference knowledge base for the chromosome-centric part of the HUPO Human Proteome Projects (http://www.hupo.org/initiatives/human-proteome-project/). STRING (http://string-db.org) (9) is a database of known and predicted protein–protein interactions. The database contains information from numerous sources, including experimental repositories, computational prediction methods and public text collections. STRING is regularly updated and gives a comprehensive overview of the protein–protein interactions that are currently available. The most recent update covers more than 300 million confidence-scored protein–protein interactions in 1133 model organisms. Among the predicted interactions in STRING, a large part stems from automated text-mining, whereby collections of scientific texts are mined for statistical co-occurrences of protein names and for semantically parsed interaction statements (natural language processing). Recently, this interaction text-mining has been expanded from abstracts to full-text publications, updating more than 1.8 million published articles to full-text coverage. Further recent changes include updated procedures for transferring interaction information from one model organism to another (‘interolog transfer’), and user-interface improvements that provide statistical annotations to user-provided gene lists (functional enrichments and interaction enrichments). PROSITE (http://prosite.expasy.org) consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them. High performance tools are provided to efficiently use this information at genome-scale level (10,11). Melanie (http://world-2dpage.expasy.org/melanie) offers a unique and flexible interface for the comprehensive visualization, exploration and analysis of 2D gel data. It provides solutions to shorten the path from data acquisition to protein information, both for conventional 2-DE and DIGE (Fluorescence Difference Gel Electrophoresis) gels. Among other features, a new integrated workflow reduces the time taken to analyse gels and enhances cross-lab reproducibility. MSight (http://web.expasy.org/MSight/) (12) extends Melanie to large-scale multidimensional mass spectrometry (for example LC-MS) by allowing visualization and analysis in a fashion similar to classical 2D-gel processing. SugarBind (http://sugarbind.expasy.org) (13) is a database that provides information on the binding of pathogenic lectins or adhesins to specific human glycans. The data were compiled through an exhaustive search of literature published over the past decades by glycobiologists, microbiologists and medical histologists. The database was developed and maintained by the MITRE Corporation until 2010, then transferred to SIB, where it was substantially enriched in content and connectivity. A new interface was released in late 2013 to match the UniCarbKB environment (see next resource). UniCarbKB (http://unicarbkb.org) (14) is a curated and annotated glycan database, which contains information from the scientific literature on glycoprotein-derived glycan structures. It includes data previously available from GlycoSuiteDB (15); that is, UniCarbKB replaces GlycoSuiteDB, which is no longer maintained. The database can be queried with a (sub)structure, monosaccharide composition, glycan mass, taxonomy, tissue, disease, glycoprotein (UniProt accession number or name) and published reference. This initiative is undertaken jointly with N.H. Packer's group (http://www.bmfrc.mq.edu.au) within an international consortium of glycobiologists and bioinformaticians. ViralZone (http://viralzone.expasy.org) (16) is a web resource for all viral genera and families, providing general molecular and epidemiological information, along with viral structure and genome information. Each virus or family page gives easy access to UniProtKB/Swiss-Prot viral protein entries. Recently, the resource has been complemented with descriptions of viral molecular processes, linked to UniProt keywords and GO (http://www.geneontology.org/) terms. A new e-learning section provides basic bioinformatics courses for virologists.
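To make the notion of a PROSITE pattern more concrete, the following minimal sketch converts a pattern written in PROSITE's pattern syntax into a Python regular expression and scans a sequence with it. It is an illustration only, not the high performance tooling referred to above: the function name `prosite_to_regex` and the toy sequence are hypothetical, and the rarely used form where `>` appears inside square brackets is not handled.

```python
import re

# One pattern element: optional '<' anchor, a residue specification,
# an optional repeat count "(n)" or "(n,m)", and an optional '>' anchor.
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".")  # patterns may end with a period
    parts = []
    for element in pattern.split("-"):     # elements are separated by '-'
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = match.groups()
        if core == "x":                    # 'x' stands for any residue
            regex = "."
        elif core.startswith("{"):         # {ABC}: any residue except A, B, C
            regex = "[^" + core[1:-1] + "]"
        else:                              # single residue or [ABC] choice
            regex = core
        if low is not None:                # (n) or (n,m) repetition
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + regex + ("$" if end else ""))
    return "".join(parts)


# Usage: a two-cysteine, two-histidine motif scanned against a toy sequence.
regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
toy_sequence = "MKCAACAAALAAAAAAAAHAAAHGG"  # made-up sequence for illustration
hit = re.search(regex, toy_sequence)
print(regex, hit.span() if hit else None)
```

Translating to a regular expression keeps the sketch short; profile-based entries carry position-specific scores and cannot be expressed this way.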

Genomics

EPD (Eukaryotic Promoter Database, http://epd.vital-it.ch) (17) is an annotated non-redundant collection of eukaryotic POL II promoters based on scientific literature. In 2011, a new section called EPDNew was introduced, providing comprehensive promoter collections based on NGS data for important model organisms. In 2013, EPDNew was extended to zebrafish in response to the public release of CAGE data for this organism. MirZ (http://www.mirz.unibas.ch) (18) is a resource that integrates miRNA expression data for human, mouse, rat, zebrafish, worm and fruitfly small RNAs and miRNA target predictions obtained through the ElMMo algorithm (19). The resource has been extended to include additional experimental evidence for miRNA binding sites, obtained through crosslinking and immuno-precipitation (CLIP) of Argonaute proteins. Several tools for exploring the binding sites of Argonaute proteins are available through the ClipZ server (http://www.clipz.unibas.ch) (20), whose content is expanding continuously as users can upload and analyze their own CLIP data. SwissRegulon (http://swissregulon.unibas.ch) (21) is a database of genome-wide annotations of promoters, curated regulatory motifs, and predicted regulatory sites for these motifs across a wide range of model organisms. Currently, SwissRegulon contains annotations for 17 prokaryotes and 3 eukaryotes. All data are accessible both through an easily navigable genome browser with search functions and as flat files that can be downloaded for further analysis. Through the SwissRegulon portal we also provide a number of related web services. In particular, the Integrated System for Motif Activity Response Analysis (ISMARA, http://ismara.unibas.ch) (22) allows users to automatically model their gene expression (microarray/RNA-seq) or chromatin state data (ChIP-seq) in terms of the predicted regulatory sites.

Structural bioinformatics and drug design

SWISS-MODEL (http://swissmodel.expasy.org) is a fully automated web-based protein structure homology-modeling expert system. An interactive web-based workspace assists and guides the user in building protein structure homology models and evaluating their expected accuracy (23). The SWISS-MODEL Repository is a database of annotated protein structure models for selected model organism proteomes of common interest, which are generated and regularly kept up to date by a fully automated modeling pipeline (24). Recent additions to the service include the automated prediction of quaternary structure and inclusion of essential ligands and cofactors in the models. Models are made available within the Protein Model Portal (http://www.proteinmodelportal.org/), which aims to provide a comprehensive interface to protein structure information by combining experimental structures from the PDB with computational predictions by various established computational modeling services. SIB develops and provides a variety of resources for molecular modeling and drug design, including SwissDock (25), a small-molecule docking web service (http://www.swissdock.ch), SwissBioisostere (26), the first free and comprehensive database of millions of molecular replacements systematically mined from literature (http://www.swissbioisostere.ch), SwissParam (27), which provides topology and parameters for the molecular modeling of small organic molecules (http://www.swissparam.ch), and SwissSidechain (28), a database gathering information about hundreds of commercially available non-natural amino acids that can be used for in silico peptide design (http://www.swisssidechain.ch).

Systems biology tools

The Iterative Signature Algorithm (ISA) (29) was designed to reduce the complexity of very large sets of data by decomposing them into so-called ‘modules’. In the context of gene expression data, these modules consist of subsets of genes that exhibit a coherent expression profile only over a subset of microarray experiments. Genes and arrays may be attributed to multiple modules, and the level of required coherence can be varied, resulting in different ‘resolutions’ of the modular mapping. ExpressionView (30) is an R package that provides an interactive environment to explore such modules in their biological context. The Ping-pong Algorithm (PPA) (31) extends modularization to multiple datasets from which it extracts ‘co-modules’. The latest software tool is QuickTest, which implements a number of statistical methods for the rapid association of measured and imputed genotypes with phenotypes measured in large cohorts. The tools and ample documentation are available at http://www2.unil.ch/cbg/index.php?title=Software.
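The idea of a module as a mutually consistent set of genes and experiments can be illustrated with a simplified sketch. The code below is not the published ISA implementation: it uses unweighted mean scores, considers only up-regulation, and starts from a single seed, and the names `isa_module`, `t_gene` and `t_cond` as well as the synthetic data are assumptions made for this example. The two thresholds play the role of the required level of coherence.

```python
import numpy as np


def zscore(values, axis=0):
    """Standardize to zero mean and unit variance along the given axis."""
    mean = values.mean(axis=axis, keepdims=True)
    std = values.std(axis=axis, keepdims=True)
    return (values - mean) / std


def isa_module(expr, seed_genes, t_gene, t_cond, max_iter=100):
    """Refine a seed gene set into a module (simplified ISA-style iteration).

    expr is a genes x conditions matrix. Returns boolean masks for the
    genes and conditions of the module reached from the seed.
    """
    e_g = zscore(expr, axis=0)  # each condition standardized across genes
    e_c = zscore(expr, axis=1)  # each gene standardized across conditions
    genes = np.asarray(seed_genes, dtype=bool)
    conds = np.zeros(expr.shape[1], dtype=bool)
    for _ in range(max_iter):
        if not genes.any():
            break
        # Score each condition by the average level of the current genes,
        # then keep the conditions that stand out.
        cond_scores = e_g[genes].mean(axis=0)
        conds = zscore(cond_scores) > t_cond
        if not conds.any():
            genes = np.zeros_like(genes)
            break
        # Score each gene over the selected conditions and re-select genes.
        gene_scores = e_c[:, conds].mean(axis=1)
        new_genes = zscore(gene_scores) > t_gene
        if np.array_equal(new_genes, genes):  # fixed point reached
            break
        genes = new_genes
    if not genes.any():
        conds = np.zeros_like(conds)
    return genes, conds


# Synthetic demonstration: 200 genes x 60 conditions of noise with one
# planted module (first 20 genes, first 15 conditions).
rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 60))
expr[:20, :15] += 10.0
seed = np.zeros(200, dtype=bool)
seed[:10] = True  # start from half of the planted genes
genes, conds = isa_module(expr, seed, t_gene=2.0, t_cond=1.0)
print(np.flatnonzero(genes), np.flatnonzero(conds))
```

Running the iteration from many random seeds and at several threshold values would yield overlapping modules at different resolutions, which is the behaviour described in the text.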

Evolution

Bgee (http://bgee.unil.ch/) (32) is a database to compare expression patterns between animal species. Bgee addresses difficulties such as complex anatomies and diverse sources of data by the use of ontologies and the explicit representation of homology. Homology relationships are defined both between genes and between anatomical features. The main efforts are the annotation of anatomical and developmental terms and their homology relationships, and the annotation and statistical treatment of transcriptome data. In 2013, RNA-Seq data were added. The Bgee team has also been involved in the development of new resources to annotate and to compare data among any animal species. Bgee will thus be capable of integrating and analyzing the wealth of transcriptomics data being generated nowadays. OMA (http://omabrowser.org) (33) provides orthology predictions among publicly available proteomes from all domains of life. Started in 2004, it has undergone 16 releases and now elucidates orthology among 7.94 million genes from 1613 species, making it one of the largest resources of its kind. The resource includes a web interface (‘OMA Browser’), DAS and SOAP programmatic interfaces, and downloadable data and meta-data in various standard formats. OMA now also provides an efficient stand-alone version that makes it easy to combine custom user data with pre-existing reference genomes (http://omabrowser.org/standalone). OrthoDB (http://orthodb.org) (34) provides a hierarchical catalog of orthologs across vertebrates, arthropods, fungi, basal metazoans and bacteria. Since orthology refers to the last common ancestor, OrthoDB explicitly delineates orthologs at different radiations along the species phylogeny. Functional annotations are provided through InterPro (https://www.ebi.ac.uk/interpro/), GO, OMIM (http://www.ncbi.nlm.nih.gov/omim/) and model organism phenotypes. Uniquely, OrthoDB provides computed evolutionary traits of orthologs, such as gene duplicability and loss profiles, divergence rates, sibling groups and exon–intron architectures. We now also provide BUSCOs (Benchmarking sets of Universal Single-Copy Orthologs) for quality assessment of genome assemblies and annotation.

Biostatistics services

SIB has two core facilities in universities that provide special bioinformatics and biostatistics services. BCF (Bioinformatics Core Facility, http://bcf.isb-sib.ch) has competence and activities at the interface between biomedical sciences, statistics and computation. The BCF is a partner in several national and international trans-disciplinary research groups, and its bioinformatics know-how helps apply genomics technologies to discoveries in medical research (35–37). BBCF (EPFL Bioinformatics and Biostatistics Core Facility, http://bbcf.epfl.ch) provides support and consulting with regard to data management and statistical data analysis in genomics and genetics. Innovative tools have been developed within collaborations between the core facility and local research groups, see http://bbcftools.epfl.ch (38,39).

IT infrastructure

Several SIB groups provide hardware (high-throughput computational clusters and storage systems) and bioinformatics software (web-based and command line tools) to local as well as international biomedical users. Additionally, they act as centers of excellence with respect to bioinformatics knowledge. [BC]² (http://www.bc2.ch/center) supports the life science research community in Basel by providing high performance computing infrastructure, including software and databases, as well as training and consulting in the field of computational biology and bioinformatics. Vital-IT (http://www.vital-it.ch) is an innovative life and medical science informatics competency center providing computational resources, consultancy and training to connect fundamental and applied research. It operates a distributed computing infrastructure for life and medical science users. It serves many of the SIB resources for the national and international community (from webservers to APIs and innovative technology such as UniProt RDF services).

Training

SIB coordinates and provides training on different bioinformatics-related domains to the Swiss and international communities alike. Current and new bioinformatics techniques, computational biology methods, statistical and NGS analysis and training on SIB resources are a few of the topics offered in our portfolio. Most SIB courses are face-to-face, with an emphasis on practical learning, but combinations of different learning techniques have been tested recently. For instance, an e-learning module on ‘Unix fundamentals’ developed in-house is a prerequisite for the on-site course on high performance computing. SIB also maintains the SIB PhD Training Network to foster interaction and the exchange of ideas among PhD students, and to train them in the most up-to-date methods necessary for their doctoral research. To reach out and explain the role of bioinformatics to a larger audience, SIB has created ChromosomeWalk.ch, a virtual exhibition on the human genome and bioinformatics. It is available in French, English and, most recently, German. A complete list of training and outreach activities at SIB is available at http://www.isb-sib.ch/training.html.

CONCLUSION

Today, SIB acts as a model organization in Europe to build a sustainable European infrastructure via ELIXIR. The institute includes the leading bioinformatics groups of Switzerland and pioneered bioinformatics research and development in Switzerland and beyond. Several new biomedical applications are currently being developed, and SIB is further extending its scope (e.g. clinical bioinformatics) to advance biomedical knowledge and ultimately to contribute to public health directly.

38 in total

1. PROSITE: a documented database using patterns and profiles as motif descriptors.

Authors: Christian J A Sigrist; Lorenzo Cerutti; Nicolas Hulo; Alexandre Gattiker; Laurent Falquet; Marco Pagni; Amos Bairoch; Philipp Bucher
Journal: Brief Bioinform Date: 2002-09 Impact factor: 11.622

2. The SWISS-PROT protein sequence data bank.

Authors: A Bairoch; B Boeckmann
Journal: Nucleic Acids Res Date: 1991-04-25 Impact factor: 16.971

3. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling.

Authors: Konstantin Arnold; Lorenza Bordoli; Jürgen Kopp; Torsten Schwede
Journal: Bioinformatics Date: 2005-11-13 Impact factor: 6.937

4. MSight: an image analysis software for liquid chromatography-mass spectrometry.

Authors: Patricia M Palagi; Daniel Walther; Manfredo Quadroni; Sébastien Catherinet; Jennifer Burgess; Catherine G Zimmermann-Ivol; Jean-Charles Sanchez; Pierre-Alain Binz; Denis F Hochstrasser; Ron D Appel
Journal: Proteomics Date: 2005-06 Impact factor: 3.984

5. A modular approach for integrative analysis of large-scale gene-expression and drug-response data.

Authors: Zoltán Kutalik; Jacques S Beckmann; Sven Bergmann
Journal: Nat Biotechnol Date: 2008-05 Impact factor: 54.908

6. The SWISS-2DPAGE database: what has changed during the last year.

Authors: C Hoogland; J C Sanchez; L Tonella; A Bairoch; D F Hochstrasser; R D Appel
Journal: Nucleic Acids Res Date: 1999-01-01 Impact factor: 16.971

7. Melanie II--a third-generation software package for analysis of two-dimensional electrophoresis images: I. Features and user interface.

Authors: R D Appel; P M Palagi; D Walther; J R Vargas; J C Sanchez; F Ravier; C Pasquali; D F Hochstrasser
Journal: Electrophoresis Date: 1997-12 Impact factor: 3.535

8. Iterative signature algorithm for the analysis of large-scale gene expression data.

Authors: Sven Bergmann; Jan Ihmels; Naama Barkai
Journal: Phys Rev E Stat Nonlin Soft Matter Phys Date: 2003-03-11

9. Inference of miRNA targets using evolutionary conservation and pathway analysis.

Authors: Dimos Gaidatzis; Erik van Nimwegen; Jean Hausser; Mihaela Zavolan
Journal: BMC Bioinformatics Date: 2007-03-01 Impact factor: 3.169

10. SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information.

Authors: Marco Biasini; Stefan Bienert; Andrew Waterhouse; Konstantin Arnold; Gabriel Studer; Tobias Schmidt; Florian Kiefer; Tiziano Gallo Cassarino; Martino Bertoni; Lorenza Bordoli; Torsten Schwede
Journal: Nucleic Acids Res Date: 2014-04-29 Impact factor: 16.971
