
SARSCOVIDB-A New Platform for the Analysis of the Molecular Impact of SARS-CoV-2 Viral Infection.

Rafael Lopes da Rosa1, Tung Sheng Yang2, Emanuela Fernanda Tureta1,2, Laura Rascovetzki Saciloto de Oliveira2, Amanda Naiara Silva Moraes1, Juliana Miranda Tatara1, Renata Pereira Costa2, Júlia Spier Borges2, Camila Innocente Alves2, Markus Berger3, Jorge Almeida Guimarães3, Lucélia Santi1,2, Walter Orlando Beys-da-Silva1,2.   

Abstract

The COVID-19 pandemic caused by the new coronavirus (SARS-CoV-2) has become a global emergency issue for public health. This threat has led to an acceleration in related research and, consequently, an unprecedented volume of clinical and experimental data that include changes in gene expression resulting from infection. The SARS-CoV-2 infection database (SARSCOVIDB: https://sarscovidb.org/) was created to mitigate the difficulties related to this scenario. The SARSCOVIDB is an online platform that aims to integrate all differential gene expression data, at messenger RNA and protein levels, helping to speed up analysis and research on the molecular impact of COVID-19. The database can be searched from different experimental perspectives and presents all related information from published data, such as viral strains, hosts, methodological approaches (proteomics or transcriptomics), genes/proteins, and samples (clinical or experimental). All information was taken from 24 articles related to analyses of differential gene expression out of 5,554 COVID-19/SARS-CoV-2-related articles published so far. The database features 12,535 genes whose expression has been identified as altered due to SARS-CoV-2 infection. Thus, the SARSCOVIDB is a new resource to support the health workers and the scientific community in understanding the pathogenesis and molecular impact caused by SARS-CoV-2.
© 2021 The Authors. Published by American Chemical Society.


Year:  2021        PMID: 33553941      PMCID: PMC7839156          DOI: 10.1021/acsomega.0c05701

Source DB:  PubMed          Journal:  ACS Omega        ISSN: 2470-1343


Introduction

In December 2019, a new strain of coronavirus associated with severe acute respiratory syndrome was identified as SARS-CoV-2.[1] This is an enveloped, single-stranded RNA virus from the Coronaviridae family, presenting high virulence and generating a significant global impact.[2] The COVID-19 pandemic caused by SARS-CoV-2 had its epicenter in the city of Wuhan, China, and in a short time became a serious public health problem worldwide.[3] As of the end of October 2020, it is present in more than 200 countries, accounting for 45 million cases.[4] In the USA, the country with the highest number of deaths, more than 8 million cases and 220,000 deaths have been recorded.[4] The same dramatic outcome of the pandemic can be seen in Brazil, the country with the second highest number of COVID-19 deaths, with more than 5 million cases and 150,000 deaths.[5] The increasing number of SARS-CoV-2 infections and the lack of specific therapeutics and vaccines have caused concern among public authorities and international agencies, mobilizing the scientific community to understand the disease and its clinical outcomes, with the aim of improving treatments and finding ways to prevent COVID-19 cases. This scenario has generated significant changes in the field, such as a rapid and unprecedented increase in the number of scientific articles being published.
Facilitated submission processes and preprint publication without peer review have made it possible to disseminate articles with very speculative results, such as computational predictions.[6] Efforts to understand the molecular aspects of COVID-19 and the complex response of the host after viral infection have been gaining ground.[7−9] Approaches that evaluate viral infections from the perspective of the molecular impact on the host contribute to the understanding of the disease, which is important for outlining potential antiviral strategies.[10] Analysis of differentially expressed genes (DEGs), which allows in silico characterization of the molecular response and the impact of infection, can be applied. Some databases contain a collection of expression data, but their results are mostly obtained by text mining with automatic and semiautomatic approaches, which may lead to inaccurate data being deposited.[11] Furthermore, the results may present redundancy and ambiguity, and a postanalysis is necessary to verify the data.[12] Likewise, many of the available databases require bioinformaticians to extract the raw data, which is another important limitation on their use; this can be especially limiting for medical workers without bioinformatics skills who are facing the COVID-19 impact in real time and need information promptly. This scenario hampers access for the medical community and for other nonspecialized scientists who may need this information in their research.
Recently, our group developed the ZIKAVID, a database based on gathering all up-to-date gene expression data generated after Zika virus (ZIKV) infection, containing different experimental approaches, hosts and strains, and other related information.[12] In this work, we present the SARS-CoV-2 virus infection database (SARSCOVIDB: https://sarscovidb.org/), a public database containing all DEGs identified in SARS-CoV-2 infection and COVID-19 samples, manually developed and checked, with a friendly interface that is easy to navigate. This database will help researchers worldwide, and general users, to speed up the research and understanding of the molecular impact of COVID-19 and possible clinical outcomes.

Results and Discussion

The outbreak of COVID-19 worldwide, linked to the lack of efficient treatments and approved vaccines, triggered a great effort by the scientific community and governments toward research involving SARS-CoV-2 and the potential associated comorbidities in humans.[13] For this reason, the SARSCOVIDB was created, comprising all data from DEGs identified after SARS-CoV-2 infection to date. The database was initially built by searching the specific terms “COVID” and “SARS-CoV-2”, with a manual double-check for differential expression of genes or proteins after SARS-CoV-2 infection, regardless of the host. To improve searching and the user interface, all data were categorized according to experimental approach, as described in Materials and Methods, including the reference article. The SARSCOVIDB is a database that exclusively gathers DEGs after SARS-CoV-2 infection and thus is an important resource to be explored by the scientific community and COVID-19 medical workers in this topic of urgent need. So far, the SARSCOVIDB contains 12,535 DEG entries and 9,283 unique genes. These data comprise different experimental approaches with distinct objectives, the majority being obtained from clinical samples (Figure 1A). Thus, users can easily consult the most frequently reported host models and compare specific gene sets and other information from all published studies with differential expression after SARS-CoV-2 infection. This can facilitate the planning of new experiments and accelerate understanding of the available data and of the molecular impact of COVID-19. These substantial numbers, reached in a time span of only a few months, also highlight the great efforts made by the scientific community to study SARS-CoV-2 infection and COVID-19.
The ZIKAVID covered the same kind of data for ZIKV; however, ZIKV research generated a smaller number of clinical samples from ZIKV-infected patients,[12] even though that epidemic also reached global threat status[14] and caused great concern worldwide in 2015–2017.
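The difference between the number of DEG entries and the number of unique genes reported above can be illustrated with a minimal sketch. The rows below are toy values invented for illustration and are not taken from the SARSCOVIDB; the tuple layout (gene, study) is also an assumption.

```python
from collections import Counter

# Toy (gene, study) rows invented for illustration; not SARSCOVIDB data.
entries = [
    ("IL6", "study_A"),
    ("IL6", "study_B"),
    ("CXCL10", "study_A"),
    ("ACE2", "study_C"),
    ("CXCL10", "study_C"),
]

# Every row is one DEG entry; a gene appearing in several rows
# is counted only once among the unique genes.
rows_per_gene = Counter(gene for gene, _study in entries)
total_entries = len(entries)
unique_genes = len(rows_per_gene)

# Genes that appear in more than one entry.
recurrent = sorted(g for g, n in rows_per_gene.items() if n > 1)

print(total_entries, unique_genes, recurrent)
```

In this toy set, five entries collapse to three unique genes, which is the same relationship as between the 12,535 entries and 9,283 unique genes in the database.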
Figure 1

SARSCOVIDB summary of contents. (A) Number of DEGs identified in clinical and experimental samples; (B) number of articles containing clinical samples; (C) number of articles containing experimental samples; (D) number of articles according to the reported origin of SARS-CoV-2 strain; (E) number of articles according to the methodological approach.

Regarding the clinical samples used, most are peripheral blood samples (five serum, four blood, and two plasma samples), with one lung sample (Figure 1B). On the other hand, the experimental samples used different cell lines, most of them from humans (Figure 1C). As observed, a large amount of data has been generated using cell lines and other models, but there are still no robust animal models that faithfully replicate the pathogenesis of SARS-CoV-2.[15] Thus, given the lack of comprehensive studies comparing different hosts and their responses to infection, the data gathered in the SARSCOVIDB are an important aid for planning future experiments and choosing a more meaningful experimental model. The SARSCOVIDB comprises data from SARS-CoV-2 isolates from different geographic regions (Figure 1D). However, studies have still given little importance to the impact that mutations acquired by strains in different regions may have on disease dynamics and virulence, as occurred with ZIKV, for instance.[16] Interestingly, almost 50% of the SARS-CoV-2 isolates or clinical samples studied are from China, followed by America (around 16%) and Europe (around 16%).
It is important to highlight that, depending on their origin, viral isolates of SARS-CoV-2 can lead to different pathological impacts and mortality rates, as previously suggested.[17] The most recent example of a virus with a differential clinical impact depending on its origin was ZIKV, for which Brazilian isolates were strongly associated with severe neurological damage.[18] Thus, the SARSCOVIDB contributes to the study of pathology in relation to the origin of the virus, since it is possible to promptly cross-reference expression data in similar models using viruses from different sources. Currently, there are various databases gathering genomic, transcriptomic, and proteomic data on viruses and their impact on the host.[19,20] Several databases were developed to obtain host–pathogen interaction data (gene expression and protein interaction), such as VirHostNet 2.0,[21] Virhostome,[22] the HIV-1 human interaction database,[23] the Gene Expression Omnibus,[24] and the Virus Pathogen Resource (ViPR).[19] For SARS-CoV-2, there are already two databases: Coronascape,[25] available in the Metascape database, and The COVID-19 Drug and Gene Set Library,[26] which contains a collection of drug and gene sets related to COVID-19. Although not the first SARS-CoV-2 database, the SARSCOVIDB is the first devoted exclusively to differential expression after SARS-CoV-2 infection. Most of the data in other databases are gathered automatically by aggregating generic information, which can result in inaccurate information.[27] The SARSCOVIDB proposes to fill this gap in a simple, organized, and objective way by curating the data manually, making it reliable. Furthermore, the user, who does not need a bioinformatics background or skills, can query data from different experimental perspectives, such as the methodology used, host, virus strain, and so forth.
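Cross-referencing expression data between viruses of different origins, as mentioned above, could look like the following minimal sketch. The entries, origin labels, and status strings are hypothetical, and the rule used to flag a gene is an assumption rather than a feature of the SARSCOVIDB.

```python
from collections import defaultdict

# Hypothetical entries: (gene, viral origin, expression status). Values are invented.
entries = [
    ("IL6", "China", "up"),
    ("IL6", "Europe", "up"),
    ("ACE2", "China", "down"),
    ("ACE2", "America", "up"),
    ("CXCL10", "China", "up"),
]

def status_by_origin(rows):
    """Map each gene to {origin: set of reported expression statuses}."""
    table = defaultdict(lambda: defaultdict(set))
    for gene, origin, status in rows:
        table[gene][origin].add(status)
    return table

def discordant_genes(rows):
    """Genes reported for at least two origins whose pooled statuses differ."""
    flagged = []
    for gene, origins in status_by_origin(rows).items():
        pooled = set().union(*origins.values())
        if len(origins) > 1 and len(pooled) > 1:
            flagged.append(gene)
    return sorted(flagged)

print(discordant_genes(entries))
```

Genes flagged this way would only be candidates for closer inspection, since differences in host model or method could also explain a discordant status.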

Conclusions

The SARSCOVIDB is the first database to gather all data to date from differential expression analyses after SARS-CoV-2 infection, with manual checking of the data to give more accurate and faster analysis. Users do not need to have a background in bioinformatics because the user-friendly and simple interface enables all search possibilities to be explored. This allows the user to cross-reference data for the best understanding and analysis of the deposited data. Finally, the SARSCOVIDB is a promising tool for supporting scientists and medical professionals carrying out research and analysis on the molecular mechanisms of SARS-CoV-2 infection.

Materials and Methods

Original Article Selection

The SARSCOVIDB (available at: https://sarscovidb.org/) comprises differential gene expression measurements, at mRNA and protein levels, and was built through four main steps (Figure 2). The first step was a search by manual text mining to find all articles available in the PubMed, Web of Science, Google Scholar, and ScienceDirect databases containing the terms “SARS-CoV-2” and “COVID-19”. Only accepted/published manuscripts were used as the source of DEGs in SARS-CoV-2 infection. This search retrieved 5,554 articles, from December 2019 to date. The second step comprised a manual check of abstracts to select only articles containing differential gene expression measurements after SARS-CoV-2 infection. All references were double-checked by two independent individuals, resulting in the selection of 24 articles.
Figure 2

Database construction steps.
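The double-check by two independent individuals can be sketched as a set comparison. The text does not describe how disagreements were resolved, so the consensus rule below (keep what both reviewers selected, list the rest for discussion) and the article IDs are assumptions made for illustration only.

```python
# Hypothetical article IDs flagged as relevant by each of two reviewers.
reviewer_1 = {"a01", "a02", "a03", "a05"}
reviewer_2 = {"a02", "a03", "a04", "a05"}

def consensus(first, second):
    """Return (agreed, disputed): IDs both selected, and IDs only one selected."""
    return first & second, first ^ second

agreed, disputed = consensus(reviewer_1, reviewer_2)

# Selection rate reported in the text: 24 of 5,554 retrieved articles.
selection_rate = 24 / 5554

print(sorted(agreed), sorted(disputed), f"{selection_rate:.2%}")
```

The last value shows how narrow the screening funnel was: fewer than half a percent of the retrieved articles reported differential gene expression after infection.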

Data Collection and Related Information

The data collected from the selected papers were checked and organized into a list of DEGs identified at the mRNA and/or protein level after SARS-CoV-2 infection. Other information was also collected, such as the type of study (in vivo, in vitro, and clinical), methodological approach (transcriptomic, proteomic, qRT-PCR, and immunoblotting), viral strain, hosts (clinical or animal model and cell culture), and expression status (see Table 1). The SARSCOVIDB also contains information from data deposited in a repository or database. The database will be updated at least monthly as new data from published articles become available.
Table 1

General Information Stored in the SARSCOVIDB

SARSCOVIDB ID: reference identification of the entry in the SARSCOVIDB.
gene name: official gene name for each stored entry.
protein name: official protein name for each stored entry.
host/sample: host model used to generate SARS-CoV-2 infection data.
sample: description of the sample, such as if it is a human clinical sample (blood, urine, serum, etc.) or experimental sample (cell line or animal tissue tested in vitro).
virus reference: SARS-CoV-2 strain reference or geographic origin of the SARS-CoV-2 viral isolate or clinical sample.
methods: methods used to measure gene expression at the level of RNA and/or protein after SARS-CoV-2 infection.
expression: indicates the identified gene expression status (up- and/or downregulated) in response to SARS-CoV-2 infection.
article: article title used as a data source.
DOI: direct link to the original reference where the data came from.
type of sample: indicates if the data were generated using clinical or experimental samples.
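The fields in Table 1 could be represented as a simple record type, as in the sketch below. The attribute names, the allowed values for the expression and sample-type fields, and the example values are all assumptions made for illustration; the actual storage format of the SARSCOVIDB is not described at this level of detail.

```python
from dataclasses import dataclass, asdict

# Assumed vocabularies; the source only says "up- and/or downregulated"
# and "clinical or experimental".
EXPRESSION_VALUES = {"up", "down", "up/down"}
SAMPLE_TYPES = {"clinical", "experimental"}

@dataclass(frozen=True)
class DegEntry:
    """One database entry, with one attribute per field of Table 1."""
    sarscovidb_id: str
    gene_name: str
    protein_name: str
    host: str
    sample: str
    virus_reference: str
    methods: str
    expression: str
    article: str
    doi: str
    type_of_sample: str

    def __post_init__(self):
        # Reject values outside the assumed vocabularies.
        if self.expression not in EXPRESSION_VALUES:
            raise ValueError(f"unexpected expression status: {self.expression!r}")
        if self.type_of_sample not in SAMPLE_TYPES:
            raise ValueError(f"unexpected sample type: {self.type_of_sample!r}")

# Placeholder values only; this is not a real SARSCOVIDB record.
example = DegEntry(
    sarscovidb_id="EXAMPLE-0001",
    gene_name="IL6",
    protein_name="Interleukin-6",
    host="Homo sapiens",
    sample="serum",
    virus_reference="placeholder origin",
    methods="proteomics",
    expression="up",
    article="placeholder title",
    doi="placeholder DOI",
    type_of_sample="clinical",
)

print(asdict(example)["gene_name"])
```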

Webpage Construction and User Interface

The last step in building the SARSCOVIDB was webpage development (Figure 2). MySQL v5.0, PHP v.5.2.99, and HTML10 were used to build the database and the graphical user interface. The data were stored and maintained by the relational database management system (RDBMS) server, with the SQL language used for data management. The user can easily customize a search by combining specific proteins with hosts or by selecting a specific viral strain or geographic origin. The database also provides step-by-step links that show the user how to search. A list containing all genes deposited in the SARSCOVIDB is available through direct download (https://sarscovidb.org/download/).
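A customizable search of the kind described above can be sketched with a parameterized SQL query. The example uses Python's built-in sqlite3 module as a stand-in for the MySQL server, and the table name, column names, and rows are hypothetical; none of this reflects the actual SARSCOVIDB schema or code.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE deg_entry (
           id INTEGER PRIMARY KEY,
           gene_name TEXT NOT NULL,
           host TEXT NOT NULL,
           virus_reference TEXT NOT NULL,
           expression TEXT NOT NULL
       )"""
)
# Invented rows for illustration only.
conn.executemany(
    "INSERT INTO deg_entry (gene_name, host, virus_reference, expression) "
    "VALUES (?, ?, ?, ?)",
    [
        ("IL6", "Homo sapiens", "China", "up"),
        ("IL6", "Calu-3 cells", "Europe", "up"),
        ("ACE2", "Homo sapiens", "China", "down"),
    ],
)

def search(connection, gene=None, host=None, virus_reference=None):
    """Return rows matching every filter that was supplied (AND semantics)."""
    clauses, params = [], []
    # Column names are fixed here; only the values come from the caller.
    for column, value in (
        ("gene_name", gene),
        ("host", host),
        ("virus_reference", virus_reference),
    ):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = (
        "SELECT gene_name, host, virus_reference, expression "
        f"FROM deg_entry{where} ORDER BY id"
    )
    return connection.execute(sql, params).fetchall()

print(search(conn, gene="IL6", host="Homo sapiens"))
print(search(conn, virus_reference="China"))
```

Passing user-supplied values as query parameters rather than interpolating them into the SQL string keeps such a search safe against injection, whichever RDBMS sits behind it.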
  20 in total

Review 1.  Protein-protein interaction predictions using text mining methods.

Authors:  Nikolas Papanikolaou; Georgios A Pavlopoulos; Theodosios Theodosiou; Ioannis Iliopoulos
Journal:  Methods       Date:  2014-10-28       Impact factor: 3.608

2.  Zika Virus Infection of Human Mesenchymal Stem Cells Promotes Differential Expression of Proteins Linked to Several Neurological Diseases.

Authors:  Walter O Beys-da-Silva; Rafael L Rosa; Lucélia Santi; Markus Berger; Sung Kyu Park; Alexandre R Campos; Paula Terraciano; Ana Paula M Varela; Thais F Teixeira; Paulo M Roehe; André Quincozes-Santos; John R Yates; Diogo O Souza; Elizabeth O Cirne-Lima; Jorge A Guimarães
Journal:  Mol Neurobiol       Date:  2018-10-30       Impact factor: 5.590

3.  Interpreting cancer genomes using systematic host network perturbations by tumour virus proteins.

Authors:  Orit Rozenblatt-Rosen; Rahul C Deo; Megha Padi; Guillaume Adelmant; Michael A Calderwood; Thomas Rolland; Miranda Grace; Amélie Dricot; Manor Askenazi; Maria Tavares; Samuel J Pevzner; Fieda Abderazzaq; Danielle Byrdsong; Anne-Ruxandra Carvunis; Alyce A Chen; Jingwei Cheng; Mick Correll; Melissa Duarte; Changyu Fan; Mariet C Feltkamp; Scott B Ficarro; Rachel Franchi; Brijesh K Garg; Natali Gulbahce; Tong Hao; Amy M Holthaus; Robert James; Anna Korkhin; Larisa Litovchick; Jessica C Mar; Theodore R Pak; Sabrina Rabello; Renee Rubio; Yun Shen; Saurav Singh; Jennifer M Spangle; Murat Tasan; Shelly Wanamaker; James T Webber; Jennifer Roecklein-Canfield; Eric Johannsen; Albert-László Barabási; Rameen Beroukhim; Elliott Kieff; Michael E Cusick; David E Hill; Karl Münger; Jarrod A Marto; John Quackenbush; Frederick P Roth; James A DeCaprio; Marc Vidal
Journal:  Nature       Date:  2012-07-26       Impact factor: 49.962

4.  VirHostNet 2.0: surfing on the web of virus/host molecular interactions data.

Authors:  Thibaut Guirimand; Stéphane Delmotte; Vincent Navratil
Journal:  Nucleic Acids Res       Date:  2014-11-11       Impact factor: 16.971

Review 5.  Text mining resources for the life sciences.

Authors:  Piotr Przybyła; Matthew Shardlow; Sophie Aubin; Robert Bossy; Richard Eckart de Castilho; Stelios Piperidis; John McNaught; Sophia Ananiadou
Journal:  Database (Oxford)       Date:  2016-11-25       Impact factor: 3.451

6.  Unprecedented surge in publications related to COVID-19 in the first three months of pandemic: A bibliometric analytic report.

Authors:  Srinivas B S Kambhampati; Raju Vaishya; Abhishek Vaish
Journal:  J Clin Orthop Trauma       Date:  2020-05-13

7.  Metascape provides a biologist-oriented resource for the analysis of systems-level datasets.

Authors:  Yingyao Zhou; Bin Zhou; Lars Pache; Max Chang; Alireza Hadj Khodabakhshi; Olga Tanaseichuk; Christopher Benner; Sumit K Chanda
Journal:  Nat Commun       Date:  2019-04-03       Impact factor: 14.919

8.  Downregulated Gene Expression Spectrum and Immune Responses Changed During the Disease Progression in Patients With COVID-19.

Authors:  Yabo Ouyang; Jiming Yin; Wenjing Wang; Hongbo Shi; Ying Shi; Bin Xu; Luxin Qiao; Yingmei Feng; Lijun Pang; Feili Wei; Xianghua Guo; Ronghua Jin; Dexi Chen
Journal:  Clin Infect Dis       Date:  2020-11-19       Impact factor: 9.079

9.  Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation.

Authors:  Daniel Wrapp; Nianshuang Wang; Kizzmekia S Corbett; Jory A Goldsmith; Ching-Lin Hsieh; Olubukola Abiona; Barney S Graham; Jason S McLellan
Journal:  Science       Date:  2020-02-19       Impact factor: 47.728

Review 10.  A Review on SARS-CoV-2 Virology, Pathophysiology, Animal Models, and Anti-Viral Interventions.

Authors:  Sabari Nath Neerukonda; Upendra Katneni
Journal:  Pathogens       Date:  2020-05-29