Federico Taverna, Jermaine Goveia, Tobias K Karakach, Shawez Khan, Katerina Rohlenova, Lucas Treps, Abhishek Subramanian, Luc Schoonjans, Mieke Dewerchin, Guy Eelen, Peter Carmeliet.
Abstract
The amount of biological data generated with (single cell) omics technologies is rapidly increasing, thereby exacerbating bottlenecks in the data analysis and interpretation of omics experiments. Data mining platforms that enable non-bioinformatician experimental scientists to analyze a wide range of experimental designs and data types can alleviate such bottlenecks, aiding in the exploration of (newly generated or publicly available) omics datasets. Here, we present BIOMEX, a browser-based software tool designed to facilitate the Biological Interpretation Of Multi-omics EXperiments by bench scientists. BIOMEX integrates state-of-the-art statistical tools and field-tested algorithms into a flexible but well-defined workflow that accommodates metabolomics, transcriptomics, proteomics, mass cytometry and single cell data from different platforms and organisms. The BIOMEX workflow is accompanied by a manual and video tutorials that provide the necessary background to navigate the interface and get acquainted with the employed methods. BIOMEX guides the user through omics-tailored analyses, such as data pretreatment and normalization, dimensionality reduction, differential and enrichment analysis, pathway mapping, clustering, marker analysis, trajectory inference, meta-analysis and others. BIOMEX is fully interactive, allowing users to easily change parameters and generate customized plots exportable as high-quality publication-ready figures. BIOMEX is open source and freely available at https://www.vibcancer.be/software-tools/biomex.
Year: 2020 PMID: 32392297 PMCID: PMC7319461 DOI: 10.1093/nar/gkaa332
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The BIOMEX workflow. The workflow guides the user through distinct analysis steps. The data (and metadata) need to be uploaded, and each uploaded dataset must be annotated with the relevant information (i.e. the omics data type, technology, feature identifier, etc.). In the processing step, the data is cleaned and consistency checks are performed to verify that the data was uploaded in the correct format. After the feature identifiers are mapped to feature names, the data is filtered and normalized in the pretreatment step. Depending on the omics data type, the data can also be imputed or batch corrected. Once the data is pretreated, it is ready for downstream analysis. The different analyses available in BIOMEX can be divided into five categories: (i) quantification analyses quantify the abundance level of the features in the data (and metadata); (ii) exploratory analyses assist in understanding the underlying structure of the data; (iii) pairwise analyses reveal the functional differences between groups; (iv) meta-analysis combines results from different studies into a single, robust result and (v) auxiliary analyses (e.g. machine learning and survival analysis). Ultimately, all the analyses can be saved in a self-contained folder that can be shared between scientists, and results can be customized and exported either as tables or as high-quality publication-ready figures. Abbreviations: TP, true positive; FN, false negative; TN, true negative; FP, false positive. Note: The term ‘features’ is used to indicate genes, metabolites and proteins.
Figure 2. Cholangiocarcinoma TCGA data analysis results. (A) Overall histological type percentage of the TCGA cholangiocarcinoma dataset. (B) PCA of normal and intrahepatic tumor samples. (C) Clustered heatmap based on the correlation of normal and intrahepatic tumor samples. (D) Competitive enrichment analysis of normal versus intrahepatic tumor samples using the gene sets related to the ‘Environmental Information Processing’ and ‘Cellular Process’ KEGG pathway maps. The upregulated gene sets are shown in red, the downregulated gene sets are shown in blue. Note: There are two enriched KEGG gene sets related to Hippo signaling, indicated separately in the figure. Hippo signaling (1): KEGG Hippo signaling pathway; Hippo signaling (2): KEGG Hippo signaling pathway - multiple species. (E) Differential analysis of normal versus intrahepatic tumor samples shown in a volcano plot. The significantly different genes (P < 0.05) are shown in blue, the non-significant genes are shown in grey. (F) Barplot visualization of SPP1 expression in normal and intrahepatic tumor samples. The error bar represents the standard error. (G) Boxplot visualization of SFN expression in normal and intrahepatic tumor samples. The box represents the range between the first quartile (Q1) and the third quartile (Q3), the horizontal line represents the median, and the whiskers extend up to 1.5 × the interquartile range (IQR) below Q1 and 1.5 × IQR above Q3. (H) Meta-analysis of intrahepatic, hilar-perihilar and distal tumor types. Each violin plot represents the differential analysis of normal versus the corresponding tumor type. The top 2 most consistently upregulated genes (CEACAM5 and AFAP1-AS1) are highlighted. (I) Survival analysis based on MMP11 gene expression of intrahepatic tumor samples. All the results shown in the figure can be directly explored in the BIOMEX ‘Case studies’ section. The parameters used to generate these plots can be found in the manual.
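The boxplot convention described for panel G (box from Q1 to Q3, median line, whiskers tied to 1.5 × IQR) can be illustrated with a minimal Python sketch. This is not BIOMEX code: the function name `boxplot_stats`, the choice of the "inclusive" quantile method, the clipping of whiskers to the most extreme observation inside the fences, and the example values are all assumptions made for illustration.

```python
import statistics


def boxplot_stats(values):
    """Summarize a list of numbers the way a Tukey-style boxplot does.

    Assumption: whiskers are clipped to the most extreme data points that
    still lie within 1.5 * IQR of the box; points beyond are outliers.
    """
    # Quartiles via the "inclusive" method (an illustrative choice).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }


if __name__ == "__main__":
    # Hypothetical expression values, not taken from the TCGA dataset.
    print(boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

Different plotting tools compute quartiles slightly differently, so the exact box edges for a given dataset may not match those drawn by BIOMEX.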
Figure 3. Endothelial cell atlas data analysis results. (A) t-SNE plot of ECs from three murine tissues (heart, liver, lung). (B) UMAP of liver tissue showing the endothelial cell clusters as described in the EC atlas. (C) Clustered heatmap showing the top 5 marker genes for each cluster. Colors represent row-wise scaled gene expression with a mean of 0 and a standard deviation of 1 (Z scores). (D) Number of cells for each cluster in liver ECs. (E) Differentiation trajectory of the classic EC phenotypes (arteries, capillaries, veins) in liver. (F) PCA on the pairwise Jaccard similarity coefficients between the top 50 marker genes of the classic EC phenotypes (arteries, capillaries and veins) in heart, lung and liver. (G) Sankey diagram showing the scmap cluster projection of the EC atlas liver data on the Tabula Muris EC liver data. All the results shown in the figure can be directly explored in the BIOMEX ‘Case studies’ section. The parameters used to generate these plots can be found in the manual.
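Two quantities named in this caption, the row-wise Z scores of panel C and the pairwise Jaccard similarity coefficients of panel F, are simple enough to sketch in plain Python. The sketch below is illustrative only and does not reproduce BIOMEX internals: the function names, the use of the sample standard deviation, and the placeholder gene and phenotype labels are assumptions.

```python
import statistics


def row_zscores(row):
    """Scale one gene's expression values to mean 0 and standard deviation 1.

    Assumption: the sample standard deviation is used; a constant row is
    returned as all zeros to avoid division by zero.
    """
    mean = statistics.fmean(row)
    sd = statistics.stdev(row)
    if sd == 0:
        return [0.0 for _ in row]
    return [(x - mean) / sd for x in row]


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(marker_sets):
    """Jaccard coefficient for every ordered pair of named marker gene sets."""
    names = list(marker_sets)
    return {(x, y): jaccard(marker_sets[x], marker_sets[y])
            for x in names for y in names}


if __name__ == "__main__":
    # Placeholder marker lists; real input would be top-50 marker genes.
    markers = {
        "liver_artery": ["geneA", "geneB", "geneC"],
        "heart_artery": ["geneB", "geneC", "geneD"],
    }
    print(row_zscores([1.0, 2.0, 3.0]))
    print(pairwise_jaccard(markers))
```

The resulting similarity matrix (one row per phenotype and tissue combination) is the kind of input on which a PCA such as the one in panel F could then be run.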