
rPanglaoDB: an R package to download and merge labeled single-cell RNA-seq data from the PanglaoDB database.

Daniel Osorio, Marieke L Kuijjer, James J Cai.

Abstract

MOTIVATION: Characterizing cells with rare molecular phenotypes is one of the promises of high-throughput single-cell RNA sequencing (scRNA-seq) techniques. However, collecting enough cells with the desired molecular phenotype in a single experiment is challenging, as it requires several sample preprocessing steps to filter and collect the desired cells experimentally before sequencing. Integrating data from multiple public single-cell experiments offers a solution to this problem, allowing the collection of enough cells exhibiting the desired molecular signatures. By increasing the sample size of the desired cell type, this approach enables a robust characterization of the cell type's transcriptome.
RESULTS: Here, we introduce rPanglaoDB, an R package to download and merge the uniformly processed and annotated scRNA-seq data provided by the PanglaoDB database. To show the potential of rPanglaoDB for collecting rare cell types by integrating multiple public datasets, we present a biological application collecting and characterizing a set of 157 fibrocytes. Fibrocytes are a rare monocyte-derived cell type that exhibits both the inflammatory features of macrophages and the tissue remodeling properties of fibroblasts. This constitutes the first report of an unbiased transcriptome profile of fibrocytes. We compared the transcriptomic profile of the fibrocytes against that of the fibroblasts collected from the same tissue samples and confirmed their association with healing processes in tissue damage and infection through the activation of the prostaglandin biosynthesis and regulation pathway.
AVAILABILITY: rPanglaoDB is implemented as an R package available through the CRAN repositories https://CRAN.R-project.org/package=rPanglaoDB.
© The Author(s) 2021. Published by Oxford University Press.

Year:  2021        PMID: 34320637      PMCID: PMC8723139          DOI: 10.1093/bioinformatics/btab549

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Merging and integrating the count matrices derived from multiple independent public single-cell RNA sequencing (scRNA-seq) experiments allows for better evaluation of the biological patterns in the cell composition of tissues, as well as the identification of patterns of gene expression and gene regulation that are consistent across cells of the same cell type obtained from independent samples (Swamy ). A good-quality integration of multiple datasets, one that allows the data to be compared and contrasted across different projects, begins with consistent preprocessing of the samples, using the same reference genome and the same quantification method to generate the count matrices (Lachmann ). These steps are then followed by batch effect removal, normalization, cell type annotation and characterization of samples and cell types (Butler ; Hie ; Korsunsky ; Luecken ; Stuart ).
PanglaoDB is a secondary scRNA-seq database that reports the annotated count matrices for thousands of human and mouse scRNA-seq experiments deposited in the Sequence Read Archive (SRA) database of the National Center for Biotechnology Information. Samples available in the PanglaoDB database are uniformly processed with the ‘alona’ package and made available in a web-based unified framework at https://panglaodb.se/ (Franzen ; Franzen and Bjorkegren, 2020). However, the PanglaoDB database reports each sample on a single web page and does not offer options to automatically download or merge multiple available datasets based on molecular phenotypes or the specific cell-type composition of the samples. For that reason, we introduce here rPanglaoDB, an R package to download and merge the uniformly processed and annotated scRNA-seq data provided by the PanglaoDB database.
The package contains a comprehensive set of functions for filtering samples by organism and tissue from which the cells were collected, for cell type and signature marker genes expressed by the cells, as well as for the quality control and merging of the downloaded datasets. The final output of rPanglaoDB is a Seurat object, facilitating the downstream analysis and characterization of the data as well as the integration of datasets not available in PanglaoDB after being processed through ‘alona’ (Franzen ).

2 Material and methods

rPanglaoDB includes four main functions: two for querying the list of samples available in the database, one for querying the cell-type composition of the samples and one for downloading the samples’ count matrices and associated annotations.
Querying samples in the database: Currently available samples can be accessed through two functions. ‘getSampleList’ returns the list of all samples included in the database together with their associated annotations, such as the SRA database identifiers, the species and tissue from which the cells were collected, the protocol used and the number of cells included in the sample. ‘getMarkers’ returns the list of clusters of cells in which a user-defined set of markers is expressed.
Querying samples’ composition information from the database: The cell types, and the number of cells per cell type, for the samples included in the database can be accessed through the ‘getSampleComposition’ function. It allows the results to be filtered by the sample metadata returned by the ‘getSampleList’ function.
Downloading samples’ count matrices and associated annotations: Once the samples have been identified, their count matrices and the associated annotations can be downloaded using the ‘downloadSamples’ function. This function includes an option to return each sample as an independent Seurat object or to merge all samples into one Seurat object.
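The option to merge all samples into one object can be pictured with a small, language-neutral sketch. The Python snippet below is not the package's implementation (rPanglaoDB is written in R and returns Seurat objects); it only illustrates, under simplifying assumptions, what merging count matrices involves: taking the union of genes, filling missing genes with zeros, and prefixing cell barcodes with the sample identifier so they remain unique. The function name `merge_samples`, the sample identifiers and all counts are hypothetical.

```python
def merge_samples(samples):
    """Merge per-sample count tables into one table over a shared gene list.

    `samples` maps a sample ID to {cell barcode: {gene: count}}.
    Returns the sorted union of genes and a dict mapping
    "<sample ID>_<barcode>" to a list of counts in that gene order.
    """
    # Union of all genes observed in any cell of any sample
    genes = sorted({gene
                    for cells in samples.values()
                    for counts in cells.values()
                    for gene in counts})
    merged = {}
    for sample_id, cells in samples.items():
        for barcode, counts in cells.items():
            # Prefix the barcode so identical barcodes from different
            # samples do not collide; absent genes get a zero count
            merged[f"{sample_id}_{barcode}"] = [counts.get(g, 0) for g in genes]
    return genes, merged


# Hypothetical toy input: two samples sharing the same barcode
toy = {
    "sampleA": {"AAAC": {"Cd34": 3, "Fn1": 7}},
    "sampleB": {"AAAC": {"Fn1": 2, "Acta2": 5}},
}
genes, merged = merge_samples(toy)
print(genes)
print(merged)
```

Keeping the sample identifier in the cell name also preserves the information needed later for batch effect removal, since each cell can still be traced back to its sample of origin.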

3 Results

Using the ‘getMarkers’ function, we identified a cluster of cells in the SRS3121028 sample, derived from skin wound tissues (3 days after scab detachment), that expresses CD34, ACTA2, FN1, Collagen V, FAP and SIRPA and lacks expression of CSF1R (Fig. 1A). This combinatorial expression pattern differentiates fibrocytes from macrophages and fibroblasts (Lim ; Pilling ; Reilkoff ).
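The selection logic described here, requiring one set of markers to be expressed and another to be absent, can be sketched as a simple set filter. The Python snippet below is an illustration only, not how ‘getMarkers’ is implemented; the function name `clusters_matching`, the cluster labels and the per-cluster gene sets are hypothetical, and the Collagen V genes are left out for brevity.

```python
def clusters_matching(cluster_markers, include, exclude):
    """Return the IDs of clusters expressing every gene in `include`
    and none of the genes in `exclude`."""
    include, exclude = set(include), set(exclude)
    return sorted(
        cluster_id
        for cluster_id, expressed in cluster_markers.items()
        if include <= set(expressed) and not (exclude & set(expressed))
    )


# Hypothetical sets of genes expressed in each cluster
toy_clusters = {
    "cluster_a": {"CD34", "ACTA2", "FN1", "FAP", "SIRPA"},
    "cluster_b": {"CD34", "ACTA2", "FN1", "FAP", "SIRPA", "CSF1R"},
    "cluster_c": {"ACTA2", "FN1", "FAP"},
}
hits = clusters_matching(
    toy_clusters,
    include=["CD34", "ACTA2", "FN1", "FAP", "SIRPA"],
    exclude=["CSF1R"],
)
print(hits)  # ['cluster_a']
```

In this toy input, `cluster_b` is rejected because it expresses the excluded marker and `cluster_c` because it lacks two of the required ones, which mirrors how the absence of CSF1R separates the fibrocyte-like cluster from macrophages.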
Fig. 1.

Characterization of the fibrocytes transcriptome. (A) Identification of cells expressing marker genes that differentiate fibrocytes’ identity from macrophages and fibroblasts (CD34, ACTA2, FN1, Collagen V, FAP and SIRPA). (B) Cross validation of the identified cells expressing CD34, ACTA2, COL5A1, COL5A2, COL5A3, FN1, FAP, SIRPA, PTPRC, MME and SEMA7A by kernel density estimation through the Nebulosa package. (C) Volcano plot displaying the differential expression between fibrocytes and the fibroblasts collected in the same merged samples. (D) Enrichment of the prostaglandin biosynthesis and regulation pathway using GSEA through the fgsea package. (E) Enrichment of the prostaglandin biosynthesis and regulation pathway using ssGSEA through the GSVA package

Fibrocytes are associated with fibrosis, autoimmunity, cardiovascular disease and asthma, among other pathologies (Reilkoff ). Since fibrocytes are marrow-derived cells that differentiate into fibroblast-like phenotypes, they are often mislabeled as fibroblasts. Thus, using the ‘getSamples’ function in rPanglaoDB, we downloaded all fibroblasts available from dermis samples in the database (SRA accessions: SRS3121028 and SRS3121030) (Lim ). We merged a total of 2172 cells and processed the associated scRNA-seq data using the pipeline recommended by the Seurat package (Stuart ). Datasets were integrated using Harmony, and the marker genes defining the cells' identity as fibrocytes (CD34, ACTA2, COL5A1, COL5A2, COL5A3, FN1, FAP, SIRPA, PTPRC, MME and SEMA7A) were further corroborated using the Nebulosa package (Fig. 1B) (Alquicira-Hernandez and Powell, 2021; Korsunsky ; Reilkoff ). Differential expression analysis of cluster 8, which contained 157 fibrocytes (Fig. 1A), against all the fibroblasts in the samples was performed using the MAST package (Fig. 1C), returning 50 upregulated genes in fibrocytes associated with the TGF-beta regulation of extracellular matrix, ECM-receptor interaction, Prostaglandin biosynthesis and regulation, Notch signaling pathway, Interleukin-5 regulation of apoptosis, Integrin beta-5 pathway, Oncostatin M, Hematopoietic cell lineage and Inflammatory response pathway (FDR < 0.05 using the hypergeometric test through the enrichR package) (Finak ; Xie ). We cross-validated the enrichment of the genes associated with the Prostaglandin biosynthesis and regulation pathway (ANXA3, S100A10, ANXA5, ANXA2, PTGIS, ANXA1, S100A6, PTGS1 and HPGD) using the Gene Set Enrichment Analysis (GSEA) approach included in the fgsea package (FDR , Fig. 1D) and the single-sample Gene Set Enrichment Analysis (ssGSEA) included in the GSVA package (FDR = , Fig. 1E) (Hanzelmann ; Korotkevich ). None of the other associations passed the FDR threshold in the other approaches applied (GSEA and ssGSEA).
Our results show the potential of rPanglaoDB as a tool to collect cells with rare molecular phenotypes from the PanglaoDB database. We anticipate its use in the construction of atlases to characterize the different molecular phenotypes exhibited by different cell types in different tissues and organisms. We also provide the first unbiased, highly specific characterization of the fibrocyte transcriptome in skin wound tissues and confirm the association of fibrocytes with healing processes in tissue damage and infection through the activation of the Prostaglandin Biosynthesis and Regulation Pathway (Grieb ; Zhang ).
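The pathway over-representation step reported above (a hypergeometric test with an FDR threshold of 0.05) can be illustrated with a short, self-contained sketch. This is not the enrichR implementation; it is a minimal Python example assuming a Benjamini-Hochberg correction, and the function names, pathway labels and all numbers are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): probability of seeing at least k pathway genes when
    n genes are drawn from a background of N genes, K of which belong
    to the pathway."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical numbers: 50 upregulated genes drawn from a background of
# 20000 genes; each tuple is (pathway size, overlap with the 50 genes)
toy_pathways = {"pathway_1": (30, 9), "pathway_2": (200, 1), "pathway_3": (80, 4)}
names = list(toy_pathways)
pvals = [hypergeom_sf(k, 20000, K, 50) for K, k in toy_pathways.values()]
for name, p, q in zip(names, pvals, benjamini_hochberg(pvals)):
    print(name, p, q, "significant" if q < 0.05 else "not significant")
```

Testing many pathways at once is what makes the correction necessary: without it, a threshold of 0.05 on raw p-values would let through a growing number of false positives as the pathway library gets larger.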

Funding

D.O. was supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement [801133]. M.L.K. was supported by the Norwegian Research Council, Helse Sør-Øst, and University of Oslo through the Centre for Molecular Medicine Norway (NCMM, 187615).

Conflict of Interest: none declared.
References

1.  Comprehensive Integration of Single-Cell Data.

Authors:  Tim Stuart; Andrew Butler; Paul Hoffman; Christoph Hafemeister; Efthymia Papalexi; William M Mauck; Yuhan Hao; Marlon Stoeckius; Peter Smibert; Rahul Satija
Journal:  Cell       Date:  2019-06-06       Impact factor: 41.582

2.  Nebulosa recovers single cell gene expression signals by kernel density estimation.

Authors:  Jose Alquicira-Hernandez; Joseph E Powell
Journal:  Bioinformatics       Date:  2021-01-18       Impact factor: 6.937

3.  PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data.

Authors:  Oscar Franzén; Li-Ming Gan; Johan L M Björkegren
Journal:  Database (Oxford)       Date:  2019-01-01       Impact factor: 3.451

4.  Integrating single-cell transcriptomic data across different conditions, technologies, and species.

Authors:  Andrew Butler; Paul Hoffman; Peter Smibert; Efthymia Papalexi; Rahul Satija
Journal:  Nat Biotechnol       Date:  2018-04-02       Impact factor: 54.908

5.  Gene Set Knowledge Discovery with Enrichr.

Authors:  Zhuorui Xie; Allison Bailey; Maxim V Kuleshov; Daniel J B Clarke; John E Evangelista; Sherry L Jenkins; Alexander Lachmann; Megan L Wojciechowicz; Eryk Kropiwnicki; Kathleen M Jagodnik; Minji Jeon; Avi Ma'ayan
Journal:  Curr Protoc       Date:  2021-03

6.  Hedgehog stimulates hair follicle neogenesis by creating inductive dermis during murine skin wound healing.

Authors:  Chae Ho Lim; Qi Sun; Karan Ratti; Soung-Hoon Lee; Ying Zheng; Makoto Takeo; Wendy Lee; Piul Rabbani; Maksim V Plikus; Jason E Cain; David H Wang; D Neil Watkins; Sarah Millar; M Mark Taketo; Peggy Myung; George Cotsarelis; Mayumi Ito
Journal:  Nat Commun       Date:  2018-11-21       Impact factor: 14.919

7.  Prostaglandin E2 hydrogel improves cutaneous wound healing via M2 macrophages polarization.

Authors:  Shuaiqiang Zhang; Yuanyuan Liu; Xin Zhang; Dashuai Zhu; Xin Qi; Xiaocang Cao; Yihu Fang; Yongzhe Che; Zhong-Chao Han; Zuo-Xiang He; Zhibo Han; Zongjin Li
Journal:  Theranostics       Date:  2018-10-22       Impact factor: 11.556

8.  Fast, sensitive and accurate integration of single-cell data with Harmony.

Authors:  Ilya Korsunsky; Nghia Millard; Jean Fan; Kamil Slowikowski; Fan Zhang; Kevin Wei; Yuriy Baglaenko; Michael Brenner; Po-Ru Loh; Soumya Raychaudhuri
Journal:  Nat Methods       Date:  2019-11-18       Impact factor: 28.547

9.  MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data.

Authors:  Greg Finak; Andrew McDavid; Masanao Yajima; Jingyuan Deng; Vivian Gersuk; Alex K Shalek; Chloe K Slichter; Hannah W Miller; M Juliana McElrath; Martin Prlic; Peter S Linsley; Raphael Gottardo
Journal:  Genome Biol       Date:  2015-12-10       Impact factor: 13.583

10.  Massive mining of publicly available RNA-seq data from human and mouse.

Authors:  Alexander Lachmann; Denis Torre; Alexandra B Keenan; Kathleen M Jagodnik; Hoyjin J Lee; Lily Wang; Moshe C Silverstein; Avi Ma'ayan
Journal:  Nat Commun       Date:  2018-04-10       Impact factor: 17.694
