Jitao David Zhang, Klas Hatje, Gregor Sturm, Clemens Broger, Martin Ebeling, Martine Burtin, Fabiola Terzi, Silvia Ines Pomposiello, Laura Badi.
Abstract
BACKGROUND: Gene expression data can be compromised by cells originating from tissues other than the target tissue of profiling. Failure to detect such tissue heterogeneity has profound implications for data interpretation and reproducibility. A computational tool explicitly addressing the issue is warranted.
Entities:
Keywords: Gene expression; Gene-set enrichment analysis; Quality control; Wilcoxon-Mann-Whitney test
Year: 2017 PMID: 28376718 PMCID: PMC5379536 DOI: 10.1186/s12864-017-3661-2
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Results of simulation studies. a Speed benchmark. Left panel: running time of BioQC and the R-native Wilcoxon test with simulated datasets of increasing sample sizes. Right panel: ratio of running time between the two implementations. b Sensitivity of BioQC revealed by simulations with model-generated data. Left panel: whisker-box-plot of BioQC enrichment scores of the selected gene set (Y-axis) against the average expression difference between genes in the set and genes not in the set (X-axis). Right panel: whisker-box-plot of ranks of enrichment scores. c Sensitivity of BioQC revealed by simulations with real-world data. Left panel: enrichment scores of cardiac-muscle- and small-intestine-enriched genes as canine heart and jejunum samples are mixed with varying weights. Right panel: ranks of enrichment scores plotted against varying weights.
Fig. 2 BioQC detects pancreas contamination of mouse kidney samples. a Enrichment scores (ES) of kidney and pancreas signatures. b Normalised microarray signals of pancreas-enriched genes (scaled to zero mean and unit standard deviation per gene). c Expression of amylase and elastase detected by qRT-PCR, with indices of contaminated samples labeled. AU: Arbitrary Unit
Fig. 3 BioQC reveals sample clustering and tissue heterogeneity of small-intestine samples in GTEx. a Tissue enrichment scores reported by BioQC when applied to small-intestine samples in GTEx. Samples are shown in columns and clustered by correlation-based hierarchical clustering. The ten tissue signatures with the highest average scores are shown in rows. Expression profiles of selected tissue signatures (with bold row names) in representative samples (in yellow boxes) are visualized below. The representative samples are labeled by the last five digits/letters of the respective GTEx sample identifiers. b–e Whisker-box-plots of genes enriched in small intestine, lymphocytes, and cardiac muscle in representative samples. Each dot represents one signature gene. Dashed lines indicate an RPKM of one, an arbitrary threshold for low gene expression. RPKM: Reads Per Kilobase per Million mapped reads