Imad Abugessaisa1, Shuhei Noguchi1, Michael Böttcher1, Akira Hasegawa1, Tsukasa Kouno1, Sachi Kato1, Yuhki Tada2, Hiroki Ura2, Kuniya Abe2, Jay W Shin1, Charles Plessy1, Piero Carninci1, Takeya Kasukawa1.
Abstract
Published single-cell datasets are rich resources for investigators who want to address questions not originally asked by the creators of the datasets. These datasets may have been obtained with different protocols and analysed with diverse strategies. The main challenge in using such single-cell data is making the various large-scale datasets comparable and reusable in a different context. To address this issue, we developed the single-cell centric database 'SCPortalen' (http://single-cell.clst.riken.jp/). The current version of the database covers human and mouse single-cell transcriptomics datasets that are publicly available from the INSDC sites. The original metadata was manually curated and single-cell samples were annotated with standard ontology terms. Common quality assessment procedures were then conducted to check the quality of the raw sequences. Furthermore, primary processing of the raw data, followed by advanced analyses and interpretation, was performed from scratch using our pipeline. In addition to the transcriptomics data, SCPortalen provides access to single-cell image files whenever available. The target users of SCPortalen are all researchers interested in specific cell types or population heterogeneity. Through the web interface of SCPortalen, users can easily search, explore and download the single-cell datasets of interest.
Year: 2018 PMID: 29045713 PMCID: PMC5753281 DOI: 10.1093/nar/gkx949
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Workflow for data processing. (A) General workflow for acquiring, processing and publishing single-cell datasets. The workflow consists of six processes. The main input to the workflow is the study accession number. Acquisition of the raw sequence files (FASTQ/SRA) and the study metadata is followed by quality assessment procedures, metadata construction and ontology annotation. All outputs are integrated into the SCPortalen database. (B) Workflow for integrating single-cell images. Two main microscopic platforms have been used to capture single-cell images.
Figure 2. Example of a computed gene expression correlation matrix, as implemented in SCPortalen. It shows the gene expression correlation for the dataset titled 'Identification of novel regulators of Th17 cell pathogenicity by single-cell genomics' (18). Both the x- and y-axes show the cell_id. The color of each dot in the right panel represents the level of gene expression. When users hover over a dot, its Gencode ID is displayed.
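As an illustration of what a cell-by-cell correlation matrix like the one in Figure 2 contains, the following is a minimal sketch. It assumes Pearson correlation between per-cell expression vectors. The record above does not state which coefficient SCPortalen uses, and the cell identifiers and values are made-up toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return cov / sqrt(var_x * var_y)


def correlation_matrix(expression):
    """Build a square matrix indexed by cell_id on both axes.

    `expression` maps cell_id -> list of per-gene values, with the
    genes in the same order for every cell.
    """
    cell_ids = sorted(expression)
    matrix = [
        [pearson(expression[a], expression[b]) for b in cell_ids]
        for a in cell_ids
    ]
    return cell_ids, matrix


# Hypothetical toy data: three cells, four genes (not taken from SCPortalen).
toy = {
    "cell_A": [1.0, 2.0, 3.0, 4.0],
    "cell_B": [2.0, 4.0, 6.0, 8.0],
    "cell_C": [4.0, 3.0, 2.0, 1.0],
}
cell_ids, matrix = correlation_matrix(toy)
```

In this toy input, `cell_A` and `cell_B` have proportional profiles and `cell_C` has the reversed profile, so the off-diagonal entries sit at the two extremes of the correlation range.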
Count statistics of SCPortalen database content
| Attribute | Homo sapiens | Mus musculus |
|---|---|---|
| Number of single-cells | 20 761 | 46 385 |
| Number of datasets | 23 | 47 |
| Number of cell types | 79 | 119 |
| Number of ontology terms | 67 | 85 |
| Number of FASTQ files | 61 938 | |
| Number of BAM files | 60 217 | |
| Number of cell images | 32 256 | 0 |
| Number of z-stack movies | 5412 | 0 |
This table shows general statistics of the content and coverage of SCPortalen.
Figure 3. Single-cell transcriptomics dataset view. The figure shows the main elements of the single-cell transcriptomics dataset view. Labels (1)–(6) mark the items in the main menu bar. The single-cell studies view is the active tab (7). For each study, a summary of the total number of samples is shown at the top left corner of the rectangle. Several attributes and links are provided in addition to the PCA and t-SNE plots. We also provide the computed PCA matrix (principal components computed from FPKM values) for each dataset as a downloadable table in this view.
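For readers unfamiliar with the FPKM unit mentioned in the Figure 3 caption, the sketch below applies the standard definition: fragments per kilobase of transcript per million mapped fragments. It is not a description of the SCPortalen pipeline. The function name, gene names, counts and lengths are hypothetical, and the sum of the supplied counts stands in for the total number of mapped fragments.

```python
def fpkm(fragment_counts, gene_lengths_bp):
    """Convert raw fragment counts for one cell into FPKM values.

    FPKM = count * 1e9 / (gene length in bp * total mapped fragments).
    Assumption: the total is taken as the sum of the supplied counts.
    """
    total = sum(fragment_counts.values())
    if total == 0:
        raise ValueError("no mapped fragments in this cell")
    return {
        gene: count * 1e9 / (gene_lengths_bp[gene] * total)
        for gene, count in fragment_counts.items()
    }


# Hypothetical counts and gene lengths for a single cell.
counts = {"gene_1": 100, "gene_2": 300}
lengths = {"gene_1": 1000, "gene_2": 2000}
values = fpkm(counts, lengths)
```

Here `gene_2` has three times the fragments of `gene_1` but is twice as long, so its FPKM value is only 1.5 times higher.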
Basic metadata attributes of each single-cell in SCPortalen
| Attribute | Description |
|---|---|
| Cell ID | A unique cell identification number based on the run accession number or the Fluidigm C1 chip ID plus the position information of a cell on the cDNA harvest plate |
| Accession number | A unique study identifier; this accession number is provided by INSDC |
| Sample accession | A unique sample identifier; the sample accession number is provided by INSDC |
| Organism | The name of the organism from which the single cell originated |
| Cell type | The cell type information as provided by the study authors |
| Sequencer | The sequencer attribute refers to the platform used to perform the scRNA-seq |
| Assay type | This field holds the type of assay used for single-cell sample preparation |
| Library | The library field defines the library protocol used to generate the single-cell library for RNA sequencing |
| Library layout | This attribute provides layout information of the sequence library, either Single-End or Paired-End |
| Cell-cycle phase | The predicted cell-cycle phase, based on the transcriptomic profile of the cell. The phases are [G2, G2.M, M.G1, G1.S, S] |
| Ontology term | The ontology term used to annotate single-cells e.g. CL:0002322 [embryonic stem cell] |
This table lists the basic metadata attributes of each single cell as implemented in the SCPortalen database (under the single-cell sample list).
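The attributes in the table above can be pictured as one record per cell. The following is a minimal sketch of such a record, not the SCPortalen schema. The class and field names are assumptions, and only the allowed cell-cycle phases, the two library layouts and the example ontology term come from the table. All other values in the example are placeholders.

```python
from dataclasses import dataclass

# Allowed values as listed in the attribute table.
CELL_CYCLE_PHASES = {"G2", "G2.M", "M.G1", "G1.S", "S"}
LIBRARY_LAYOUTS = {"Single-End", "Paired-End"}


@dataclass(frozen=True)
class SingleCellRecord:
    """Hypothetical container mirroring the metadata attributes."""
    cell_id: str
    accession_number: str
    sample_accession: str
    organism: str
    cell_type: str
    sequencer: str
    assay_type: str
    library: str
    library_layout: str
    cell_cycle_phase: str
    ontology_term: str

    def __post_init__(self):
        # Reject values outside the sets given in the table.
        if self.library_layout not in LIBRARY_LAYOUTS:
            raise ValueError(f"unknown library layout: {self.library_layout}")
        if self.cell_cycle_phase not in CELL_CYCLE_PHASES:
            raise ValueError(f"unknown cell-cycle phase: {self.cell_cycle_phase}")


# Placeholder values for illustration only; none are real accessions.
example = SingleCellRecord(
    cell_id="EXAMPLE_CELL_1",
    accession_number="EXAMPLE_STUDY",
    sample_accession="EXAMPLE_SAMPLE",
    organism="Mus musculus",
    cell_type="embryonic stem cell",
    sequencer="example sequencer",
    assay_type="example assay",
    library="example library protocol",
    library_layout="Paired-End",
    cell_cycle_phase="G1.S",
    ontology_term="CL:0002322",
)
```

Validating the two enumerated fields at construction time is a design choice for the sketch. It keeps malformed records from propagating into later filtering or export steps.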
Figure 4. Single-cell samples view. This figure shows the elements of the single-cell sample view. Each sample has a set of attributes and links. The links lead to (1) a cell ontology tree and (2) SRA/FASTQ files on external sites. The ‘Cell ontology tree’ link directs to the EBI Ontology Lookup Service (OLS) via a web service. The ‘SRA/FASTQ file’ link opens a page for downloading sequence read files. Other types of links enable direct file downloads or lead to reports from SCPortalen (e.g. (4) FastQC report, (6) BAM files). Finally, (3) mapping QC and (5) library preparation information are listed in the table view. Using the check-boxes on the left side of the menu, users can select any number of cells and export the selected ones in CSV or Excel format.
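The select-and-export step described in the Figure 4 caption can be sketched offline as follows. This is an illustration using Python's standard `csv` module, not the code behind the SCPortalen web interface. The function name, column subset and cell records are hypothetical.

```python
import csv
import io


def export_selected(cells, selected_ids, fieldnames):
    """Return CSV text containing only the cells whose ID was selected.

    `cells` is a list of dicts keyed by attribute name. Attributes not
    listed in `fieldnames` are left out of the output.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        extrasaction="ignore",  # silently skip attributes we do not export
        lineterminator="\n",
    )
    writer.writeheader()
    for cell in cells:
        if cell["Cell ID"] in selected_ids:
            writer.writerow(cell)
    return buffer.getvalue()


# Hypothetical sample list; the IDs and sequencer value are placeholders.
cells = [
    {"Cell ID": "cell_1", "Organism": "Mus musculus",
     "Cell type": "embryonic stem cell", "Sequencer": "example"},
    {"Cell ID": "cell_2", "Organism": "Homo sapiens",
     "Cell type": "T cell", "Sequencer": "example"},
]
csv_text = export_selected(
    cells, {"cell_1"}, ["Cell ID", "Organism", "Cell type"]
)
```

Writing to an in-memory buffer keeps the sketch free of file-system side effects. The returned text can be saved to disk or passed on as needed.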