SCovid: single-cell atlases for exposing molecular characteristics of COVID-19 across 10 human tissues.

Changlu Qi¹, Chao Wang¹, Lingling Zhao², Zijun Zhu¹, Ping Wang¹, Sainan Zhang¹, Liang Cheng¹,³, Xue Zhang³,⁴.

Abstract

SCovid (http://bio-annotation.cn/scovid) aims to provide a comprehensive resource of single-cell data that exposes the molecular characteristics of coronavirus disease 2019 (COVID-19) across 10 human tissues. COVID-19, an epidemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has been found to be accompanied by multiple-organ failure since it was first reported in December 2019. To reveal tissue-specific molecular characteristics, COVID-19 has been studied widely, especially at single-cell resolution. However, these studies remain relatively independent and scattered, which limits a comprehensive understanding of the impact of the virus on diverse tissues. To this end, we developed a single-cell atlas of COVID-19. First, we collected 21 single-cell datasets of COVID-19 across 10 human tissues, paired with control datasets. Then we constructed a pipeline to analyse these datasets and reveal the molecular characteristics of COVID-19 based on manually annotated cell types. The current version of SCovid documents 1 042 227 single cells from 21 single-cell datasets across 10 human tissues, 11 713 stably expressed genes and 3778 significant differentially expressed genes (DEGs). SCovid provides a user-friendly interface for browsing, searching, visualizing and downloading all detailed information.

Year: 2022 PMID: 34634820 PMCID: PMC8524591 DOI: 10.1093/nar/gkab881

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has been a global health threat since the outbreak began in late 2019 and had infected more than 190 million people worldwide as of 21 July 2021 (1). Research on isolating, sequencing and cloning the virus, developing diagnostic kits and testing candidate vaccines is proceeding rapidly (2–6). However, key questions remain about the pathophysiology of COVID-19 (7). As in-depth case studies have accumulated, growing evidence indicates that COVID-19 can result not only in acute respiratory distress syndrome but also in multiorgan involvement. SARS-CoV-2 binds to angiotensin-converting enzyme 2 (ACE2) receptors present in vascular endothelial cells, lungs, heart, brain, kidneys, intestine, liver and other tissues, which directly injures these organs (8). For example, emerging data from autopsy studies demonstrated that COVID-19 is accompanied by acute interstitial pneumonia (AIP), diffuse alveolar damage (DAD) and microvasculature involvement with pulmonary vessel hyaline thrombosis, haemorrhage, vessel wall oedema, intravascular neutrophil trapping and immune cell infiltration (9–11). In addition, gastrointestinal symptoms associated with COVID-19 vary widely but can include loss of appetite, nausea, vomiting, diarrhoea and generalized abdominal pain (12). ACE2 expression in cardiac tissue is also significantly elevated, which may facilitate myocarditis caused by viral infection (13–15).

To reveal tissue-specific molecular characteristics, COVID-19 has been studied widely, especially at single-cell resolution. Using single-cell RNA sequencing of SARS-CoV-2-infected colon and ileum organoids, Triana et al. identified a subgroup of enterocytes as the prime target of SARS-CoV-2 and found no positive correlation between infection susceptibility and ACE2 expression, which indicates that SARS-CoV-2 suppresses the immune response (16). Moreover, by analysing the peripheral blood mononuclear cells (PBMCs) of COVID-19 patients, Arunachalam et al. revealed that various cell types exhibit unique pro- and anti-inflammatory responses (17).

The rapid spread of COVID-19 has created an urgent need for research, and numerous COVID-19-related databases have emerged. GISAID (18), Nextstrain (19), GESS (20) and the European Nucleotide Archive (21) collect SARS-CoV-2 strains from patients all around the world and provide tools to analyse the sequences. CORD-19 (22), LitCovid (23) and BioRxiv & MedRxiv summarize the literature on the latest progress in COVID-19 research. DrugBank (24), DockCoV2 (25) and COVID19 Drug Repository (26) predict drugs with potential therapeutic effects and are well cross-linked to external databases, which may speed up the discovery of therapeutic drugs. Coronavirus3D (27), CoV3D (28) and RCSB PDB (29) annotate and visualize high-resolution structures of coronavirus proteins and their complexes. In addition, various single-cell databases such as CancerSEA (30), CellMarker (31) and TISCH (32) continue to appear. However, none of these databases focuses on the molecular characteristics of COVID-19 patients. Therefore, we developed SCovid, a single-cell atlas for exposing molecular characteristics of COVID-19 across 10 human tissues. The database is freely available at http://bio-annotation.cn/scovid.

DATA COLLECTION AND DATABASE CONTENT

We manually searched for COVID-19-related single-cell RNA-seq (scRNA-seq) datasets in electronic databases, including the Sequence Read Archive (SRA) (33) and Gene Expression Omnibus (GEO) (34), using the keywords: (‘COVID-19’ OR ‘SARS-CoV-2’) AND (‘single cell’ OR ‘single-cell’) AND (‘transcriptomics’ OR ‘transcriptome’ OR ‘scRNA-seq’ OR ‘scRNA seq’). We also systematically searched the literature, including PubMed (National Library of Medicine of the National Institutes of Health) and the BioRxiv and MedRxiv preprint services operated by Cold Spring Harbor Laboratory, using keywords such as ‘single cell sequencing’, ‘scRNA-seq & COVID-19’ and ‘transcriptomics & COVID-19’. Literature and host data on COVID-19 were manually extracted from publications. In total, 21 COVID-19-related scRNA-seq datasets involving 10 tissue types were obtained. All datasets were collected before July 2021.

To account for the technical noise of the assay, we removed low-quality cells and lowly expressed genes from each COVID-19-related scRNA-seq dataset before further analysis. Specifically, we removed: (i) cells with fewer than 200 detected genes, as well as genes expressed in fewer than three cells; and (ii) liver cells with a mitochondrial gene content above 50%, as well as cells from other tissues with a mitochondrial gene content above 20%. For each dataset, we used the R package ‘Seurat’ (v3.2.3) (35) for data integration, clustering, dimensionality reduction and visualization. In these analyses, the function ‘SCTransform’ was used to integrate and scale the data. Principal component analysis (PCA) was then performed on the variable feature genes, and the principal components (PCs) identified with the function ‘ElbowPlot’ were used to cluster the dataset. Next, the annotation of each cluster was confirmed against known cell type-specific marker genes found among the differentially expressed (DE) genes of that cluster, which were obtained with the ‘FindMarkers’ function.
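The quality-control thresholds above can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which was run with the R package Seurat. The sketch assumes that counts are held in a nested dictionary, that mitochondrial genes are those whose symbol starts with ‘MT-’, that mitochondrial content means the share of a cell's total counts, and that cells are filtered before genes; the names `qc_filter` and `mito_fraction` are hypothetical.

```python
from typing import Dict

# cell id -> {gene symbol: raw count}; a toy stand-in for a count matrix
Counts = Dict[str, Dict[str, int]]


def mito_fraction(cell: Dict[str, int]) -> float:
    """Share of a cell's counts that come from mitochondrial genes.

    Assumption: mitochondrial genes are recognised by an 'MT-' prefix.
    """
    total = sum(cell.values())
    if total == 0:
        return 0.0
    mito = sum(c for g, c in cell.items() if g.upper().startswith("MT-"))
    return mito / total


def qc_filter(counts: Counts, tissue: str,
              min_genes: int = 200, min_cells: int = 3) -> Counts:
    """Apply the cell- and gene-level thresholds described in the text."""
    # Liver cells tolerate up to 50% mitochondrial content, other tissues 20%.
    max_mito = 0.50 if tissue.lower() == "liver" else 0.20

    # Step 1 (assumed order): drop cells with too few detected genes
    # or with mitochondrial content above the tissue threshold.
    kept = {
        cid: genes for cid, genes in counts.items()
        if sum(1 for c in genes.values() if c > 0) >= min_genes
        and mito_fraction(genes) <= max_mito
    }

    # Step 2: count, for every gene, the remaining cells that express it.
    n_cells: Dict[str, int] = {}
    for genes in kept.values():
        for g, c in genes.items():
            if c > 0:
                n_cells[g] = n_cells.get(g, 0) + 1

    # Step 3: drop genes expressed in fewer than min_cells cells.
    return {
        cid: {g: c for g, c in genes.items()
              if c > 0 and n_cells[g] >= min_cells}
        for cid, genes in kept.items()
    }
```

For a liver dataset one would call `qc_filter(counts, "liver")`, and any other tissue name selects the 20% threshold.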
Subsequently, we performed UMAP to reduce each dataset to two dimensions, and the cells were finally visualized on the website.

We also analysed scRNA-seq expression, including DE genes and pathways. First, for each cell type, MAST (v1.16.0) (36) was used to calculate differentially expressed genes (DEGs) between cells from samples with COVID-19 and cells from controls. Then, up/down-regulated genes with the top 5% |Log2FC| and a P value <0.05 were regarded as significant DEGs, which were visualized in a volcano plot. Next, GO enrichment was performed for each cell type on these up/down-regulated significant DEGs with the R package clusterProfiler (37).

An overview of the SCovid database is shown in Figure 1. The current version of SCovid documents 1 042 227 single cells from 21 single-cell datasets across 10 human tissues (intestine, blood, pancreas, lung, brain, airway, heart, kidney, liver and lymph node), 11 713 stably expressed genes (217 495 associations) and 3778 significant DEGs (8898 associations). Each dataset in SCovid contains detailed information on data source, sample source, grouping information, single-cell number and cell types. Each DEG entry contains Log2FC, P value and visual information. Figure 2 shows the number of genes in each dataset. Figure 3 shows the most frequently occurring significant DEGs in these 21 datasets, which might be potential cell type-specific markers.
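The DEG selection rule described above (top 5% |Log2FC| and P value <0.05) can be sketched as follows. This is an illustrative Python sketch rather than the authors' code. It assumes that the top 5% is taken over all genes tested within one cell type and that ties at the cutoff are kept; `GeneStat` and `classify_degs` are hypothetical names.

```python
import math
from typing import Dict, List, NamedTuple


class GeneStat(NamedTuple):
    symbol: str
    log2fc: float   # COVID-19 versus control, for one cell type
    p_value: float


def classify_degs(stats: List[GeneStat],
                  top_fraction: float = 0.05,
                  p_cutoff: float = 0.05) -> Dict[str, str]:
    """Label each gene as 'up', 'down' or 'not significant'."""
    if not stats:
        return {}

    # Assumption: the |Log2FC| cutoff is the k-th largest absolute fold
    # change, where k is 5% of the tested genes (at least one gene).
    k = max(1, math.ceil(len(stats) * top_fraction))
    ranked = sorted((abs(s.log2fc) for s in stats), reverse=True)
    fc_cutoff = ranked[k - 1]

    status: Dict[str, str] = {}
    for s in stats:
        significant = (
            abs(s.log2fc) >= fc_cutoff
            and abs(s.log2fc) > 0
            and s.p_value < p_cutoff
        )
        if not significant:
            status[s.symbol] = "not significant"
        elif s.log2fc > 0:
            status[s.symbol] = "up"
        else:
            status[s.symbol] = "down"
    return status
```

The returned labels correspond to the change status shown for each point of a volcano plot; plotting Log2FC against −Log10(P value) is left to any plotting library.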

Figure 1.

Overview of SCovid database.

Figure 2.

Number of genes in each dataset. Each color represents a different cell type. (A) Number of significant DEGs in each dataset. (B) Number of stably expressed genes in each dataset.

Figure 3.

The most frequently occurring significant DEGs in cell types of these 21 datasets. The area of each sector represents the proportion of cell types in the dataset in which the gene is a significant DEG.

USER INTERFACE

We provide a user-friendly web interface that visualizes the datasets in a few steps, as shown in Figures 4 and 5. All datasets are organized by tissue type. Users can browse datasets by clicking the corresponding tissue icon or ‘Tissue’ hyperlink on the ‘Home’ page, or by clicking a specific tissue name in the navigation menu of the ‘Browse’ page (Figure 4A and B). After a dataset is selected, for example ‘Delorey TM. (Liver)’, all detailed and visual information is retrieved, including ‘Detailed description’, ‘UMAP’, ‘Cell proportion’, ‘DEGs in cell types’ and ‘Expression profile’.

Figure 4.

Browse page and results of SCovid. (A) Home page of SCovid. (B) The tree browser of SCovid in the Browse page. (C) Detailed description of the selected dataset. (D) Two-dimensional UMAP plot. The color of each point represents the cell type to which the cell belongs. (E) Cell proportion plot that displays the proportion of each cell type per sample in the selected dataset. (F) The heatmap that shows the expression profile of high-variance genes in different cell types. (G) The volcano plot that shows the statistically significant DEGs between COVID-19 and control, and GO enrichment bar plots of up/down-regulated genes. In the GO enrichment bar plots, the vertical axis shows the names of clusters of GO terms, and the horizontal axis displays −Log10(P value). A P value <0.05 was used as the threshold to select significant GO terms. (H) The table that shows statistically significant DEGs between COVID-19 and control. (I) The violin plot of a specific gene in COVID-19 and control, and the UMAP projection for that gene.

Figure 5.

Search page and results of SCovid. (A) Search page of SCovid. (B) The table that shows statistically significant DEGs between COVID-19 and control in different tissues. (C) The violin plot of a specific gene in COVID-19 and control, and the UMAP projection for that gene. (D) The table that shows statistically significant DEGs between COVID-19 and control. (E) The volcano plot that shows the statistically significant DEGs between COVID-19 and control, and GO enrichment bar plots of up/down-regulated genes. In the GO enrichment bar plots, the vertical axis shows the names of clusters of GO terms, and the horizontal axis displays −Log10(P value). A P value <0.05 was used as the threshold to select significant GO terms.

Detailed description. The ‘Detailed description’ section contains the dataset name, tissue type, accession number, number of cells, cell types, sample source and relevant publication information (Figure 4C). The accession number and publication title are hyperlinks that users can follow.

UMAP. The ‘UMAP’ section displays a UMAP visualization of the selected dataset, with colored points representing different cell types (Figure 4D).

Cell proportion. The ‘Cell proportion’ section displays a bar plot of the cell-type proportions across samples (Figure 4E). Each bar represents a sample and different colors represent different cell types.

DEGs in cell types. In the ‘DEGs in cell types’ section, users can select a cell type of interest to browse interactive information including a volcano plot, a table and Gene Ontology (GO) (38) enrichment bar plots (Figure 4G and H). The volcano plot shows all stably expressed genes; hovering over any point pops up detailed information including gene symbol, Log2FC, P value and change status. The result table displays the statistically significant DEGs between COVID-19 and control in the selected cell type of this dataset. Clicking the ‘detail’ link of a row in the result table leads to detailed plots for that gene, including a violin plot and a UMAP projection plot (Figure 4I). The GO enrichment bar plots display GO classifications of up/down-regulated genes; hovering over any bar pops up detailed information including ontology aspect, term ID, term description, P value and gene symbols.

Expression profile. The ‘Expression profile’ section provides a heatmap that shows the expression profile of high-variance genes in different cell types (Figure 4F). The individual tiles in the heatmap are colored in proportion to gene expression values. The genes correspond to the rows of the matrix and the cells correspond to the columns.

Data search. The ‘Search’ page offers two sections, ‘Search DEG in all tissues’ and ‘Search cell type’ (Figure 5A). For a gene, users can enter its symbol to query its DEG information in all tissues and cell types; a table is returned as described above for the Browse page (Figure 5B and C). By selecting a cell type, users can query its detailed DEGs and enriched GO terms in a tissue based on one dataset (Figure 5D and E).

Data download. In addition, all data in SCovid can be downloaded from the ‘Download’ page, including the expression profile of DEGs and the variation information of all stably expressed genes and DEGs.
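The quantity plotted in the ‘Cell proportion’ section, the share of each cell type within each sample, can be computed as in the following minimal Python sketch. It is an illustration only and not the code behind the website; the input format (one sample and cell-type pair per cell) and the name `cell_type_proportions` are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def cell_type_proportions(
    cells: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, float]]:
    """Return, per sample, the fraction of its cells in each cell type.

    `cells` yields one (sample id, annotated cell type) pair per cell.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sample, cell_type in cells:
        counts[sample][cell_type] += 1

    proportions: Dict[str, Dict[str, float]] = {}
    for sample, type_counts in counts.items():
        total = sum(type_counts.values())
        # Each sample's fractions sum to 1, matching one stacked bar.
        proportions[sample] = {t: n / total for t, n in type_counts.items()}
    return proportions
```

Each inner dictionary corresponds to one bar of the plot, with one colored segment per cell type.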

SUMMARY AND FUTURE PERSPECTIVES

Since the outbreak of COVID-19 in December 2019, databases covering literature collection, SARS-CoV-2 genome sequences, protein structures and drug prediction have appeared, but none of them focuses on the molecular characteristics of COVID-19 patients. Given the high accuracy and cellular specificity of single-cell sequencing, we collected 21 single-cell datasets of COVID-19 across 10 human tissues, paired with control datasets, to reveal molecular characteristics of COVID-19 based on manually annotated cell types. We further developed the database system SCovid to provide a user-friendly interface for browsing, searching, visualizing and downloading stably expressed genes, significant DEGs and functional analyses of these significant DEGs by cell type across tissues. The current version of SCovid documents 1 042 227 single cells from 21 single-cell datasets across 10 human tissues, 11 713 stably expressed genes and 3778 significant DEGs. Each dataset in SCovid contains detailed information on data source, sample source, grouping information, single-cell number and cell types. Each DEG entry contains Log2FC, P value and visual information.

SCovid is a powerful and high-quality database for the molecular characteristics of COVID-19. Biologists can access the variation information of genes of interest, and the enriched pathways of differential genes, in specific cell types of different tissues. Bioinformaticians can use machine learning methods to predict tissue-specific driver genes and therapeutic drugs for COVID-19. Although single-cell data on COVID-19 are currently limited, research on COVID-19 is expected to grow substantially, since there is at present no effective way to completely stop the spread of the virus. Meanwhile, the research focus has gradually shifted from virus strains to the molecular characteristics of COVID-19 patients, which means that genomics, epigenomics and proteomics data on COVID-19 will continue to emerge. We will therefore keep following the latest data and construct unified analysis pipelines in order to update the database continuously.

DATA AVAILABILITY

The database is freely available at http://bio-annotation.cn/scovid. The code is available at https://github.com/ChangluQi/scovid.
