Suluxan Mohanraj1, J Javier Díaz-Mejía1, Martin D Pham2, Hillary Elrick2, Mia Husić2, Shaikh Rashid2, Ping Luo1, Prabnur Bal2, Kevin Lu2, Samarth Patel2, Alaina Mahalanabis2, Alaine Naidas3, Erik Christensen3, Danielle Croucher1, Laura M Richards1, Parisa Shooshtari3,4,5, Michael Brudno2,6,7, Arun K Ramani2, Trevor J Pugh1,5,8.
Abstract
CReSCENT: CanceR Single Cell ExpressioN Toolkit (https://crescent.cloud) is an intuitive and scalable web portal incorporating a containerized pipeline execution engine for standardized analysis of single-cell RNA sequencing (scRNA-seq) data. While scRNA-seq data for tumour specimens are readily generated, subsequent analysis requires high-performance computing infrastructure and user expertise to build analysis pipelines and tailor interpretation for cancer biology. CReSCENT uses public data sets and preconfigured pipelines that are accessible to computational biology non-experts and are user-editable to allow optimization, comparison, and reanalysis for specific experiments. Users can also upload their own scRNA-seq data for analysis, and results can be kept private or shared with other users.
Year: 2020 PMID: 32479601 PMCID: PMC7319570 DOI: 10.1093/nar/gkaa437
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. CReSCENT’s web portal run control menu. (A) Control menu of a CReSCENT project called Human Tumour-Associated T cells. Users can see details of the project and controls to share the project with other CReSCENT users, upload cell metadata, delete the project, and view the number and status of runs in the project. Runs within a project use the same input dataset (i.e. the same scRNA-seq measurements), but the run parameters differ between runs. For example, this panel shows three runs using different clustering resolutions: 1.0 (left), 0.7 (middle) and 0.3 (right). Each run box has three controls: visualization of results, display of the run parameters and deletion of the run. (B) QC results from the run with a resolution of 0.3. Each dot represents a cell and the violins represent the distribution of one of four QC metrics. Distributions are shown for cell populations before and after filtering cells by each QC metric. In this run, no cells were filtered out. (C) UMAP plot showing one of the four QC metrics (percentage of mitochondrial genes). The other three metrics can be visualized by selecting them from a drop-down menu in the results control panel (shown in Figure 2). Interactive visualization tools allow the user to zoom in and out and to select data from the plots.
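To make the mitochondrial QC metric in panels B and C concrete, the sketch below computes one common form of it, the percentage of a cell's counts that come from mitochondrial genes, and keeps cells at or below a cutoff. It is a minimal illustration and not CReSCENT's own code: the function names, the `MT-` gene-name prefix, the toy counts and the 10% cutoff are all assumptions made for the example.

```python
from typing import Dict, List


def percent_mito(counts: Dict[str, int], mito_prefix: str = "MT-") -> float:
    """Percentage of one cell's total counts assigned to mitochondrial genes.

    Assumes mitochondrial genes are recognisable by a name prefix
    (hypothetical default "MT-", as in human gene symbols).
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty cell has no meaningful percentage
    mito = sum(c for gene, c in counts.items() if gene.upper().startswith(mito_prefix))
    return 100.0 * mito / total


def filter_cells(cells: Dict[str, Dict[str, int]], max_percent_mito: float) -> List[str]:
    """Return the barcodes of cells whose mitochondrial percentage is within the cutoff."""
    return [
        barcode
        for barcode, counts in cells.items()
        if percent_mito(counts) <= max_percent_mito
    ]


# Toy counts invented for illustration only (not data from the paper).
toy_cells = {
    "cell_a": {"MT-CO1": 5, "CD3E": 95},
    "cell_b": {"MT-ND1": 40, "CCL5": 60},
}
kept = filter_cells(toy_cells, max_percent_mito=10.0)
```

With a cutoff that every cell passes, `filter_cells` returns all barcodes, which corresponds to the situation described for the run in panel B, where no cells were filtered out.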
Figure 2. CReSCENT’s web portal visualizations menu. (A) Control menu of a CReSCENT project called Human Tumour-Associated T cells, showing a UMAP with the five cell clusters detected by the CReSCENT pipeline. The right-hand menu allows the user to switch between plot types (UMAP, t-SNE or violin) and between the types of data to be plotted (e.g. cell clusters, expression of differentially expressed genes (DEGs) or metadata). (B) UMAP of CMC1, which is a DEG from cluster 3. Each dot denotes an individual cell and the opacity of the dot corresponds to the expression of CMC1 in that cell. (C) Violin plot of CMC1 expression across the five cell clusters detected by CReSCENT, showing higher expression of the gene in cluster 3. (D) UMAP coordinates obtained by CReSCENT’s pipeline are used to show user-provided cell metadata for cell types (‘group’ discrete categories). (E) Similar to D, but showing another type of user-provided metadata: expression of the gene CCL5 measured by an orthogonal method, with ‘numeric’ continuous values.
Figure 3. CReSCENT’s web portal technologies and their interactions. Overview of the technology components utilized in developing CReSCENT. Blue lines indicate Docker containers. Files are uploaded through the React front end (A and B) and saved into MinIO and MongoDB (C), parameters are sent to the Express server, and the CWL job is sent to compute (D). Once results are available, project and run documents are updated in MongoDB, results are visualized in the front end with gene expression queries processed by a Python back end, and results are served as a zipped download through MinIO. Interactions are facilitated by GraphQL.
Figure 4. Computing time benchmark. Computing times of runs of the CReSCENT pipeline using either a scRNA-seq dataset of 68 579 PBMCs or random stepwise subsamples of 1000 to 60 000 cells from the full dataset. The full dataset and each subsample were run through the pipeline three independent times. The number of cores used for each run is indicated on the x-axis and the run time is plotted on the y-axis. Each line represents the average computing time used to process either the full dataset or one of the subsamples across the three replicates. The ribbon around each line represents one standard deviation across the same three replicates.
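The line-and-ribbon summary described in this caption can be sketched as follows: for each core count, take the mean of the three replicate run times and draw a band one standard deviation either side of it. This is a minimal sketch, not the authors' analysis code. The function name and the timing values are placeholders invented for illustration, and the use of the sample standard deviation is an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def summarize_replicates(
    run_times: Dict[int, List[float]],
) -> Dict[int, Tuple[float, float, float]]:
    """Map each core count to (mean, ribbon_lower, ribbon_upper).

    The ribbon spans one sample standard deviation either side of the mean
    (assumption: sample rather than population standard deviation).
    """
    summary = {}
    for cores, times in sorted(run_times.items()):
        avg = mean(times)
        sd = stdev(times) if len(times) > 1 else 0.0
        summary[cores] = (avg, avg - sd, avg + sd)
    return summary


# Placeholder run times in minutes, three replicates per core count.
# These numbers are made up for the example and are NOT the paper's results.
placeholder_times = {
    1: [10.0, 12.0, 14.0],
    4: [4.0, 5.0, 6.0],
}
ribbons = summarize_replicates(placeholder_times)
```

Calling the function once per dataset size (the full dataset and each subsample) would give one line and one ribbon per size, matching the layout the caption describes.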