Shintaro Katayama, Virpi Töhönen, Sten Linnarsson, Juha Kere.
Abstract
MOTIVATION: Recent transcriptome studies have revealed that total transcript numbers vary by cell type and condition; therefore, the statistical assumptions for single-cell transcriptome studies must be revisited. SAMstrt is an extension to SAMseq, a statistical method for differential expression analysis, that enables spike-in normalization and statistical testing based on the estimated absolute number of transcripts per cell for single-cell RNA-seq methods.
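The idea of spike-in normalization can be illustrated with a minimal Python sketch. It is not SAMstrt's own code: the function name `spikein_normalize`, the toy count matrix and the figure of 1000 spike-in molecules per cell are all hypothetical, and the sketch assumes that the same known amount of spike-in was added to every cell. Each cell's reads are rescaled so that its spike-in reads sum to the number of molecules added, which turns read counts into estimated transcript numbers per cell.

```python
import numpy as np

def spikein_normalize(counts, spike_rows, spike_molecules_per_cell):
    """Rescale raw read counts to estimated transcripts per cell.

    counts: features x cells array of raw read counts.
    spike_rows: row indices of the spike-in features.
    spike_molecules_per_cell: spike-in molecules added to each cell
        (assumed known and identical across cells; hypothetical value).
    """
    counts = np.asarray(counts, dtype=float)
    # Total spike-in reads observed in each cell (one value per column).
    spike_reads = counts[spike_rows, :].sum(axis=0)
    if np.any(spike_reads == 0):
        raise ValueError("a cell has no spike-in reads; cannot normalize")
    # Molecules represented by one read, per cell.
    scale = spike_molecules_per_cell / spike_reads
    return counts * scale

# Toy data: rows 0-1 are genes, rows 2-3 are spike-ins; columns are two cells.
toy_counts = np.array([[100, 400],
                       [50, 100],
                       [10, 20],
                       [10, 20]])
estimated = spikein_normalize(toy_counts, spike_rows=[2, 3],
                              spike_molecules_per_cell=1000)
```

In this toy case the second cell has twice the spike-in reads of the first, so its reads are scaled by half as much; the first gene still comes out at twice the estimated transcript number in the second cell, a difference that normalizing to equal totals per cell would reduce.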
Year: 2013 PMID: 23995393 PMCID: PMC3810855 DOI: 10.1093/bioinformatics/btt511
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. (A) Aligned reads and detected features in 45 mES cells, 44 MEF cells and 24 samples of 50 pg human brain total RNA (50 pg), by STRT. The features are known genes, repeat elements and spike-in molecules; there were 25 286 features for mouse and 25 665 features for human. (B) Falsely discovered features in 100 trials, compared between SAMseq and SAMstrt. In each trial, the 24 samples of 50 pg human brain total RNA were separated randomly into two groups, and each method then compared the two groups. Although the number of differentially expressed features should be zero in all trials in the absence of technical variation, the falsely discovered features showed statistically significant differences (FDR < 1%). (C) Comparison of transcripts per 50 pg human brain total RNA per feature, between the former 12 samples and the latter 12 samples. This is one representative comparison from the trials in panel B; SAMstrt found no significantly different features (FDR < 0.01%). Gray shading denotes the density of features in the scatter plot, and the expression level of each feature is represented by its median. The dashed diagonal line denotes equal expression between the two groups. (D) Comparison of normalized expression levels per cell per feature, between 45 mES and 44 MEF cells, by SAMseq. The gray shading and the dashed diagonal line are used as in panel C. Points are differentially expressed features (FDR < 0.01%). (E) Sum of the normalized expression values of all features per sample, compared between the methods. (F) Comparison of transcripts per cell per feature, between 45 mES and 44 MEF cells, by SAMstrt. The gray shading, the dashed diagonal line and the points are used as in panel D. SAMseq added uniform random numbers between 0 and 0.1 to all values to avoid ties (Li and Tibshirani, 2011); therefore, features at the lowest expression level in panels C, D and F are either not expressed or below the detection limit.
Moreover, the normalized expression values from SAMstrt in panels C and F are estimated numbers of transcripts per cell, based on the initial concentration of the spike-in molecules.
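The random grouping used for the trials in panel B can be sketched as follows. This is an illustrative Python sketch, not the authors' procedure as implemented: the function name `random_half_splits` and the fixed seed are hypothetical, and the sketch assumes that "separated randomly into two groups" means two equal groups of 12. It only generates the sample groupings; the differential expression test itself (SAMseq or SAMstrt) is not reproduced here.

```python
import numpy as np

def random_half_splits(n_samples, n_trials, seed=0):
    """Yield (group_a, group_b) index arrays for each trial.

    Each trial shuffles the sample indices and cuts them in half, so
    both groups come from the same population and any feature called
    differentially expressed between them is a false discovery.
    """
    rng = np.random.default_rng(seed)
    half = n_samples // 2
    for _ in range(n_trials):
        order = rng.permutation(n_samples)
        yield np.sort(order[:half]), np.sort(order[half:])

# 24 technical replicates, 100 trials, as described for panel B.
splits = list(random_half_splits(n_samples=24, n_trials=100))
```

Each pair of index arrays would then select the columns of the expression matrix passed to the method under evaluation, and the number of features reported as significant would be recorded per trial.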