Chiara Gabella, Severine Duvaud, Christine Durinx.
Abstract
Data resources are essential for the long-term preservation of scientific data and the reproducibility of science. The SIB Swiss Institute of Bioinformatics provides the life science community with a portfolio of openly accessible, high-quality databases and software platforms, ranging from expert-curated knowledgebases, such as UniProtKB/Swiss-Prot (part of the UniProt consortium) and STRING, to online platforms such as SWISS-MODEL and SwissDrugDesign. SIB's mission is to ensure that these resources remain available in the long term, as long as their return on investment and their scientific impact are high. To this end, SIB provides its resources, in addition to stable financial support, with a range of high-quality, innovative services that are, to our knowledge, unique in the field. Through this first-class management framework with central services, such as user-centric consulting activities, legal support, open-science guidance, knowledge sharing and training efforts, SIB promotes excellence in resource development and operation. This review presents the ecosystem of data resources at SIB; the process used for the identification, evaluation and development of resources; and the support activities that SIB provides. A set of indicators has been put in place to select the resources and establish quality standards, reflecting their multifaceted nature and complexity. Through this paper, the reader will discover how the institute fosters SIB's leading tools and databases, making them best-in-class resources able to tackle the burning matters that society faces, from disease outbreaks and cancer to biodiversity and open science.
Keywords: User Experience (UX); best-in-class resources; bioinformatics; open science; research infrastructure; sustainability
Year: 2022 PMID: 34850820 PMCID: PMC8769900 DOI: 10.1093/bib/bbab478
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1. Map of SIB partner institutions as of July 2021.
Figure 2. The life cycle of a bioinformatics resource.
Figure 3. SIB Resources and resources from SIB groups: how they are embedded in the global bioinformatics community (A) and the services that SIB offers to them (B).
SIB Resources as of 2021
| Name of SIB Resource | Type | Description | Highlights |
| Portal for single-cell data analysis | Software tool | Web-based, collaborative portal aimed at democratizing single-cell omics data analyses. Provides a full modular single-cell ribonucleic acid (RNA)-seq analysis pipeline | Enables standardized analyses that can be run in minutes by any user without requiring significant computing power. Joined the SIB portfolio in 2021 |
| Gene expression expertise | Knowledgebase with expert curation and software tools | Gene expression data (including all types of transcriptomes), allowing retrieval and comparison of expression patterns between animals, humans, model organisms and diverse species of evolutionary or agronomical relevance | Only resource to provide homologous gene expression between species |
| Knowledge resource on cell lines | Knowledgebase with expert curation and software tools | Information on all cell lines used in biomedical research, including immortalized cell lines, naturally immortal cell lines (stem cell lines), finite life cell lines, vertebrate cell lines with an emphasis on human, mouse and rat cell lines and invertebrate (insect and tick) cell lines | An ELIXIR Core Data Resource. Cellosaurus contains 128 000 entries with about 50 different types of information items, ~21 000 literature references and cross-references to 93 resources. Joined the SIB portfolio in 2021 |
| Eukaryotic Promoter Database | Knowledgebase with expert curation and software tools | Quality-controlled information on experimentally defined promoters of higher organisms, as well as web-based tools for promoter analysis | Over 180 000 downloadable promoters that can be analyzed over a web interface and viewed in the UCSC genome browser |
| Zooming in on web-based glycoinformatics resources | Knowledgebase with expert curation and software tools | Centralized web-based glycoinformatics resources developed within an international network of glycoscientists. Aims at (i) popularizing the use of bioinformatics in glycobiology and (ii) emphasizing the relationship between glycobiology and protein-oriented bioinformatics resources | Completely redesigned with glycoscientists in 2021 to meet the growing needs of the community. Joined the SIB portfolio in 2021 |
| Human protein knowledgebase | Knowledgebase with expert curation and associated tools | Information on human proteins, such as function, involvement in diseases, messenger ribonucleic acid/protein expression, protein/protein interactions, post-translational modifications, protein variations and their phenotypic effects | High data coverage through integration of multiple sources. Advanced semantic search functionalities. Tools specifically designed for the proteomics community |
| Impact of pathogen genome data on science and public health | Software tool | Open-source project to harness the scientific and public health potential of pathogen genome data. Provides a continually updated view of publicly available data alongside powerful analytic and visualization tools for use by the community | Started as an influenza-specific project, then evolved to Ebola, MERS, Zika and SARS-CoV-2. Joined the SIB portfolio in 2021 |
| Expert-curated database on biochemical reactions | Knowledgebase with expert curation | Knowledgebase of chemical and transport reactions of biological interest and the standard for enzyme and transporter annotation in UniProtKB | An ELIXIR Core Data Resource. Rhea contains >13 000 reactions with ~12 000 unique compounds |
| Protein–protein interaction networks and functional enrichment analysis | Knowledgebase and software tool | Resource for known and predicted protein–protein interactions, including direct (physical) and indirect (functional) associations derived from various sources, such as genomic context, high-throughput experiments, (conserved) co-expression and the literature | An ELIXIR Core Data Resource. STRING networks cover >5000 different organisms, with >25 million high-confidence links between proteins |
| Widening access to computer-aided drug design | Software tools | Web-based computer-aided drug design tools ranging from molecular docking (SwissDock) to pharmacokinetics and druglikeness (SwissADME), through virtual screening (SwissSimilarity), lead optimization (SwissBioisostere) and target prediction of small molecules (SwissTargetPrediction) | Comprehensive and integrated web-based drug design environment |
| Protein structure homology-modeling | Software tool and repository | Automated protein structure homology-modeling platform for generating 3D models of a protein using a comparative approach and database of annotated models for key reference proteomes based on UniProtKB | Easy-to-use web-based platform processing >2 million model requests per year, providing model information for experts and non-specialists |
| One-stop shop for orthologs | Phylogenomic databases and software tools | Web portal of resources to infer orthologs, i.e. corresponding genes across different species, a key aspect of predicting gene function or reconstructing species trees. It includes OrthoDB and BUSCO, as well as OMA and the Quest for Orthologs benchmark service | World-leading orthology and comparative genomic resources |
| Tools and data for regulatory genomics | Software tools and knowledgebases | Web portal for regulatory genomics, including genome-wide annotations of regulatory sites and motifs, the webserver ISMARA for automated inference of regulatory networks, CRUNCH for automated analysis of ChIP-seq data and REALPHY for reconstructing phylogenies from raw sequence data | ISMARA and CRUNCH web servers allow users to upload raw microarray, RNA-seq or ChIP-seq data to automatically infer the core regulatory networks acting in their system of interest |
| Protein knowledgebase | Knowledgebase with expert curation | Hundreds of thousands of protein descriptions, including function, domain structure, subcellular location, post-translational modifications and functionally characterized variants | Expert-curated part of UniProt, the most widely used protein information resource in the world, with >6 million pageviews per month. An ELIXIR Core Data Resource |
| A knowledge resource for lipids | Knowledgebase with expert curation | Information about known lipids, including knowledge of lipid structures, metabolism and interactions, providing a framework for the integration of lipid and lipidomics data with biological knowledge and models | Contains information on >590 000 lipid structures from >640 lipid classes |
| Viral genomics pipeline | Software tool | Pipeline integrating various open-source software packages for assessing viral genetic diversity from NGS data | Enabling reliable and comparable viral genomics and epidemiological studies and facilitating clinical diagnostics of viruses |
The six categories covering the 27 quantitative and qualitative indicators (see [16] for details)
| | Category | Description | Number of indicators included |
| I | Scientific focus and quality of science | Demonstrate high quality of data and metadata, respond to a clear scientific need and be unique. This implies benchmarking against other resources and being an authority in its field compared to the major competitors | Four |
| II | Community | Know the community to whom it is addressed, its size and usage: web statistics, user reach and community size. Candidate Resources with a valid track record of usage, responding to a clear need within the scientific community are more likely to be included as SIB Resources. However, emerging resources are encouraged to submit as well. The scientific context in which the resource operates should be taken into account. A resource that serves a small scientific community may not have as many users as a resource serving a broader interest, and yet, it may reach 90% of the community it supports (coverage) and be crucial for the scientific work of that community | Five |
| III | Quality of service | Demonstrate a high level of service and reliability with the integration of features, such as persistent and unique identifiers, community-recognized standards, user support and training and integration of user feedback | Five |
| IV | Legal and funding infrastructure and governance | Have a sound legal framework supporting open science and seek complementary funds from other sources in order to ensure sustainable long-term funding | Four |
| V | Impact and translational stories | Have a significant impact on the life science community and be impact-driven | Three |
| VI | SIB | Contribute to SIB in terms of scientific credibility and visibility and demonstrate alignment and synergies with the current portfolio of SIB Resources | Three |
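The coverage notion mentioned under category II (Community) can be illustrated with a small calculation. The following is only a minimal sketch: the function name `coverage` and all numbers are hypothetical and chosen for illustration; they are not SIB's indicator definition or real usage data.

```python
def coverage(active_users: int, community_size: int) -> float:
    """Fraction of a target community reached by a resource (hypothetical helper)."""
    if community_size <= 0:
        raise ValueError("community_size must be positive")
    return active_users / community_size


# Illustrative, made-up figures: a niche resource versus a broad one.
niche = coverage(active_users=900, community_size=1_000)
broad = coverage(active_users=50_000, community_size=500_000)

# The niche resource has far fewer users in absolute terms,
# yet it reaches a much larger share of the community it serves.
print(f"niche: {niche:.0%}, broad: {broad:.0%}")
```

Under these assumed numbers, the smaller resource would rank lower on raw user counts but higher on coverage, which is why the scientific context is considered alongside absolute usage.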
Figure 4. Change in the usage and impact of the data resources in the SIB portfolio. (A) Bottom: the bar chart represents the aggregated values of unique users per year (orange bars) and sessions per year (blue bars) across all SIB Resources (data from Google Analytics). Top: the blue line indicates the upward trend of cumulative unique users per month accessing SIB Resources. (B) Impact and usage in research of SIB data resources as measured through aggregated citations in the literature (source: Europe PMC). The datasets used to generate the graphs are available on GitHub: https://github.com/sib-swiss/managing-life-cycle-portfolio-sib-resources/blob/main/Resources-statistics.xlsx.
Figure 5. Data resources in the SIB ecosystem are highly interconnected: chord diagram showing the data flows between the resources (each flow has the same color as its resource of origin). The image was produced using Circos (circos.ca). The datasets used to generate the diagram are available on GitHub: https://github.com/sib-swiss/managing-life-cycle-portfolio-sib-resources/blob/main/matrices-data-flow.xlsx.
Figure 6. The SIB Resource selection process. Stakeholders (A) and procedure (B).
Figure 7. The services offered to assist SIB Resources in becoming and remaining best in class.