Momeneh Foroutan, Dharmesh D Bhuva, Ruqian Lyu, Kristy Horan, Joseph Cursons, Melissa J Davis.
Abstract
BACKGROUND: Gene set scoring provides a useful approach for quantifying concordance between sample transcriptomes and selected molecular signatures. Most methods use information from all samples to score an individual sample, leading to unstable scores in small data sets and introducing biases from sample composition (e.g. varying numbers of samples for different cancer subtypes). To address these issues, we have developed a truly single sample scoring method, and associated R/Bioconductor package singscore (https://bioconductor.org/packages/singscore).
Keywords: Dimensional reduction; Gene set enrichment; Gene set score; Gene signature; Molecular features; Molecular phenotypes; Personalised medicine; Single sample; Singscore; Transcriptome
Year: 2018 PMID: 30400809 PMCID: PMC6219008 DOI: 10.1186/s12859-018-2435-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
List of data sets used in the current study
| Data | Source | Date accessed | Reference |
|---|---|---|---|
| TCGA RNA-seq | The UCSC Cancer Genomics Browser | February 2016 | PMID: 23000897 |
| TCGA microarray | The UCSC Cancer Genomics Browser | October 2015 | PMID: 23000897 |
| CCLE RNA-seq | Cancer Cell Line Data Repository | April 2017 | PMID: 22460905 |
| Daemen et al. RNA-seq | Gene Expression Omnibus | July 2016 | PMID: 24176112 |
| TGFβ-EMT | Data | September 2017 | PMID: 28119430 |
| GSE79235 | Gene Expression Omnibus | April 2018 | PMID: 27154822 |
Fig. 1 (a) Comparing the stability of scoring methods to changes in the number of samples and genes within transcriptomic data. For both Spearman’s correlation coefficient and the concordance index, a higher value indicates better performance; values near 0 (correlation) and 0.5 (concordance index) indicate poor performance. Similar results were observed when other signatures were used for scoring (Additional file 1: Figures S4 and S5); (b) Comparing the power of methods to distinguish groups with distinct biology; (c) Comparing the type 1 error for different methods when distinguishing groups with distinct biology; (d) Comparing the ability of methods to call true differential gene sets between two conditions
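The two stability metrics named in the Fig. 1 caption can be illustrated with a short Python sketch. It is not taken from the study: the function names are hypothetical, and the concordance index is assumed here to be the fraction of sample pairs that keep the same ordering in two score vectors (for example, scores from the full data versus scores from a subsample), with ties counted as half-concordant. Under that assumption, unrelated score vectors give a Spearman correlation near 0 and a concordance index near 0.5, matching the "poor performance" values in the caption.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def concordance_index(reference, perturbed):
    """Fraction of sample pairs ordered the same way in both score vectors.

    Assumed definition: pairs tied in the reference are skipped, and pairs
    tied only in the perturbed scores count as 0.5.
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(reference)), 2):
        d_ref = reference[i] - reference[j]
        if d_ref == 0:
            continue
        usable += 1
        d_new = perturbed[i] - perturbed[j]
        if d_new == 0:
            concordant += 0.5
        elif (d_ref > 0) == (d_new > 0):
            concordant += 1.0
    return concordant / usable


# Hypothetical scores for five samples: full data vs. a subsampled data set.
full_scores = [0.10, 0.25, 0.40, 0.55, 0.70]
subsampled_scores = [0.12, 0.22, 0.45, 0.50, 0.80]
print(spearman(full_scores, subsampled_scores))
print(concordance_index(full_scores, subsampled_scores))
```

A scoring method that is stable to subsampling keeps the sample ordering intact, so both metrics stay close to 1 for it.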
Fig. 2 (a) Epithelial and mesenchymal scores obtained from singscore for the TCGA breast cancer samples (hexbin density plot) and a collection of breast cancer cell lines (circle markers, coloured by subtype). Note that as per the original study by Tan et al., the epithelial and mesenchymal signatures are distinct (but overlapping) for tumours and cell lines; (b) Differences in epithelial and mesenchymal scores for 32 overlapping breast cancer cell lines between Daemen et al. and the CCLE datasets. The majority of cell lines show relatively consistent scores in these two data sets (circled in the lower left corner); (c) The HCC1428 cell line has very similar scores in each dataset, while the MDA-MB-231 cell line has a large shift in epithelial score, and the HCC202 cell line has a large shift in mesenchymal score; (d) Three microarray samples from the TGFβ-EMT data set [8] with low, medium and high scores for the TGFβ-EMT signature; (e) Scatter plots demonstrating the relationship between rank dispersions (MAD) and scores obtained by singscore, for: total score (combined up- and down-set scores), distinct expected up-regulated gene set scores, and distinct expected down-regulated gene set scores
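To make the idea of a single-sample score and its rank dispersion concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the singscore implementation: the score is assumed to be the mean within-sample rank of the gene-set genes divided by the number of genes, the dispersion is the median absolute deviation (MAD) of those ranks, and the function names and gene identifiers (`G1` to `G6`) are hypothetical. Any further normalisation or centring the package applies is not described in this record and is omitted.

```python
from statistics import mean, median


def rank_genes(expression, descending=False):
    """Map each gene to its rank (1..N) within ONE sample.

    Ties are broken by gene name so the result is deterministic.
    """
    ordered = sorted(
        expression, key=lambda g: (expression[g], g), reverse=descending
    )
    return {gene: position for position, gene in enumerate(ordered, start=1)}


def score_gene_set(expression, gene_set, descending=False):
    """Score one sample against one gene set using only that sample's ranks.

    Use descending=True for an expected down-regulated set, so that low
    expression yields a high rank (an assumption made for this sketch).
    """
    ranks = rank_genes(expression, descending)
    set_ranks = [ranks[g] for g in gene_set if g in ranks]
    if not set_ranks:
        raise ValueError("no gene-set genes found in this sample")
    centre = median(set_ranks)
    return {
        # Mean rank scaled by the number of genes in the sample.
        "score": mean(set_ranks) / len(ranks),
        # MAD of the gene-set ranks: small when the set's genes rank together.
        "dispersion": median([abs(r - centre) for r in set_ranks]),
    }


# Hypothetical expression values for a single sample.
sample = {"G1": 9.1, "G2": 8.7, "G3": 0.4, "G4": 5.0, "G5": 7.9, "G6": 1.2}
up_result = score_gene_set(sample, ["G1", "G2", "G5"])
down_result = score_gene_set(sample, ["G3", "G6"], descending=True)
print(up_result)
print(down_result)
```

Because every quantity is computed from the ranks inside one sample, adding or removing other samples cannot change the result, which is the property the abstract contrasts with methods that score a sample using information from all samples.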