David Gomez-Cabrero, Sonia Tarazona, Isabel Ferreirós-Vidal, Ricardo N Ramirez, Carlos Company, Andreas Schmidt, Theo Reijmers, Veronica von Saint Paul, Francesco Marabita, Javier Rodríguez-Ubreva, Antonio Garcia-Gomez, Thomas Carroll, Lee Cooper, Ziwei Liang, Gopuraja Dharmalingam, Frans van der Kloet, Amy C Harms, Leandro Balzano-Nogueira, Vincenzo Lagani, Ioannis Tsamardinos, Michael Lappe, Dieter Maier, Johan A Westerhuis, Thomas Hankemeier, Axel Imhof, Esteban Ballestar, Ali Mortazavi, Matthias Merkenschlager, Jesper Tegner, Ana Conesa.
Abstract
Multi-omics approaches use a diversity of high-throughput technologies to profile the different molecular layers of living cells. Ideally, the integration of this information should result in comprehensive systems models of cellular physiology and regulation. However, most multi-omics projects still include a limited number of molecular assays and there have been very few multi-omics studies that evaluate dynamic processes such as cellular growth, development and adaptation. Hence, we lack formal analysis methods and comprehensive multi-omics datasets that can be leveraged to develop true multi-layered models for dynamic cellular systems. Here we present the STATegra multi-omics dataset that combines measurements from up to 10 different omics technologies applied to the same biological system, namely the well-studied mouse pre-B-cell differentiation. STATegra includes high-throughput measurements of chromatin structure, gene expression, proteomics and metabolomics, and it is complemented with single-cell data. To our knowledge, the STATegra collection is the most diverse multi-omics dataset describing a dynamic biological system.
Year: 2019 PMID: 31672995 PMCID: PMC6823427 DOI: 10.1038/s41597-019-0202-7
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 STATegra data generation. (a) Inducible Ikaros B3 cell system. The time-course experiment collects samples at 6 time points after tamoxifen induction of Ikaros expression; control cells carry an empty vector. (b) Diversity of omics platforms, number of biological replicates, batch distribution and lab assignment for B3 cell culture and omics library preparation. Data in each row correspond to the omics type on the left. + Previous data from [34].
Fig. 2 Experimental design for RNA-seq.
Fig. 3 Experimental design for small RNA-seq. Two sequencing batches were run. Samples with red filling were repeated in both batches to allow estimation of batch effects.
Fig. 4 Preprocessing pipelines for 8 omics technologies. See Methods for details.
Public repositories hosting STATegra multi-omics data.
| Data set | Database and accession |
|---|---|
| mRNA-seq | GEO, GSE75417 |
| miRNA-seq | GEO, GSE75394 |
| RRBS | GEO, GSE75393 |
| DNase-seq | GEO, GSE75390 |
| ATAC-seq | GEO, GSE89362 |
| scRNA-seq | GEO, GSE89280 |
| scATAC-seq | GEO, GSE89362 |
| ChIP-seq | GEO, GSE38200 |
| Proteomics | ProteomeXchange, PXD003263 |
| Metabolomics | MetaboLights, MTBLS283 |
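The accessions in the table above can be turned into repository landing-page links with a small lookup. The Python sketch below is only illustrative: the accession identifiers are copied from the table, whereas the URL templates and the names `ACCESSIONS`, `URL_TEMPLATES` and `landing_page` are assumptions of this example rather than part of the STATegra resource, and the templates should be checked against each repository's current documentation.

```python
# Accessions as listed in the repository table (dataset -> (database, accession)).
ACCESSIONS = {
    "mRNA-seq": ("GEO", "GSE75417"),
    "miRNA-seq": ("GEO", "GSE75394"),
    "RRBS": ("GEO", "GSE75393"),
    "DNase-seq": ("GEO", "GSE75390"),
    "ATAC-seq": ("GEO", "GSE89362"),
    "scRNA-seq": ("GEO", "GSE89280"),
    "scATAC-seq": ("GEO", "GSE89362"),
    "ChIP-seq": ("GEO", "GSE38200"),
    "Proteomics": ("ProteomeXchange", "PXD003263"),
    "Metabolomics": ("MetaboLights", "MTBLS283"),
}

# Assumed landing-page URL patterns; verify before relying on them.
URL_TEMPLATES = {
    "GEO": "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={acc}",
    "ProteomeXchange": "https://proteomecentral.proteomexchange.org/cgi/GetDataset?ID={acc}",
    "MetaboLights": "https://www.ebi.ac.uk/metabolights/{acc}",
}


def landing_page(dataset: str) -> str:
    """Return the assumed landing-page URL for one dataset in the table."""
    database, accession = ACCESSIONS[dataset]
    return URL_TEMPLATES[database].format(acc=accession)


if __name__ == "__main__":
    for name in ACCESSIONS:
        print(f"{name}: {landing_page(name)}")
```

Keeping the database name next to each accession means a new repository only needs one extra template entry.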
Fig. 5 Biomarkers of B3 cell differentiation across three experimental batches.
Fig. 6 Quality control of STATegra multi-omics data. (a) Distribution of pair-wise correlation values for samples belonging to different (Across) or the same (Within) experimental conditions. (b) Principal component analysis (PCA). Only the Ikaros series is shown. Data were preprocessed as described in Methods. Time progression is represented by an increasingly darker red color.
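The Within versus Across comparison in panel (a) can be illustrated with a short, self-contained Python sketch. It is not the authors' quality-control pipeline: the use of Pearson correlation, the helper names `pearson` and `split_correlations`, and the toy profiles are all assumptions made for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def split_correlations(profiles, conditions):
    """Group pair-wise sample correlations into same-condition (within)
    and different-condition (across) lists."""
    within, across = [], []
    for i, j in combinations(range(len(profiles)), 2):
        r = pearson(profiles[i], profiles[j])
        if conditions[i] == conditions[j]:
            within.append(r)
        else:
            across.append(r)
    return within, across


# Toy, made-up feature profiles: two replicates for each of two conditions.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [4.0, 3.0, 2.0, 1.0],
    [4.2, 2.9, 2.1, 1.0],
]
conditions = ["0h", "0h", "24h", "24h"]

within, across = split_correlations(profiles, conditions)
print("within:", within)
print("across:", across)
```

In a well-behaved dataset the within-condition values would be expected to sit above the across-condition values, which is the pattern a plot of the two distributions makes visible.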
Fig. 7 STATegra data for lactate dehydrogenase A. (a) LDHA reaction in glycolysis. (b) Promoter regions of the Ldha gene showing a DHS and an IKZF1 footprint identified by DNase-seq. Only values for the Ikaros-induced time course are shown. The IKZF1 ChIP-seq peak region is shown in red. (c–e) Paintomics[27] representation of Ldha data as heatmaps and line plots of log2FC values between Ikaros and Control. Data points correspond, from left to right, to 0, 2, 6, 12, 18 and 24 hours after Ikaros induction. In the heatmaps, red indicates up-regulation and blue indicates down-regulation. (c) Ldha data for DNase-seq, RNA-seq and proteomics. (d) Data for miRNA-seq, where miRNA-Ldha targets were predicted by at least 5 algorithms in the miRWalk[70] database. (e) STATegra log2FC values for pyruvate (left) and lactate (right). (f) Major gene expression, proteomics and DNase-seq trends for the glycolysis pathway computed by Paintomics[27].
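The log2FC values between Ikaros and Control mentioned in the caption can be sketched as a per-time-point ratio. The Python example below is a minimal illustration under stated assumptions: the time points come from the caption, while the pseudocount, the function name `log2_fold_change` and the input values are hypothetical and do not reproduce the preprocessing used for the published figure.

```python
from math import log2

# Hours after Ikaros induction, as listed in the figure caption.
TIME_POINTS_H = (0, 2, 6, 12, 18, 24)


def log2_fold_change(ikaros, control, pseudocount=1.0):
    """Per-time-point log2 ratio of Ikaros over Control.

    The pseudocount is an assumption of this sketch; it avoids division
    by zero and is not taken from the source.
    """
    if len(ikaros) != len(control):
        raise ValueError("series must have the same number of time points")
    return [
        log2((i + pseudocount) / (c + pseudocount))
        for i, c in zip(ikaros, control)
    ]


# Made-up abundance values, one per time point, for illustration only.
ikaros_series = [7, 15, 31, 63, 31, 15]
control_series = [7, 7, 7, 7, 7, 7]

log2fc = log2_fold_change(ikaros_series, control_series)
for hours, value in zip(TIME_POINTS_H, log2fc):
    print(f"{hours:>2} h: log2FC = {value:+.1f}")
```

Positive values correspond to the red (up-regulated) end of the heatmap scale described in the caption and negative values to the blue (down-regulated) end.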
| Attribute | Value |
|---|---|
| Measurement(s) | messenger RNA • miRNA • methylation • deoxyribonuclease activity • assay for transposase-accessible chromatin using sequencing • scRNA • chromatin immunoprecipitation • protein expression profiling • metabolite • metabolite ( |
| Technology Type(s) | RNA sequencing • RRBS • DNase-Seq • ATAC-seq • scRNA-seq • scATAC-seq (Microfluidics) • ChIP-seq • mass spectrometry • ultra-performance liquid chromatography-mass spectrometry • gas chromatography-mass spectrometry |
| Factor Type(s) | Ikaros level • harvesting time |
| Sample Characteristic - Organism | Mus musculus |