Baharak Ahmaderaghi, Raheleh Amirkhah, James Jackson, Tamsin R M Lannagan, Kathryn Gilroy, Sudhir B Malla, Keara L Redmond, Gerard Quinn, Simon S McDade, Tim Maughan, Simon Leedham, Andrew S D Campbell, Owen J Sansom, Mark Lawler, Philip D Dunne.
Abstract
Generation of transcriptional data has dramatically increased in the past decade, driving the development of analytical algorithms that enable interrogation of the biology underpinning the profiled samples. However, these resources require users to have expertise in data wrangling and analytics, reducing opportunities for biological discovery by 'wet-lab' users with a limited programming skillset. Although commercial solutions exist, costs for software access can be prohibitive for academic research groups. To address these challenges, we have developed an open-source and user-friendly data analysis platform for on-the-fly bioinformatic interrogation of transcriptional data derived from human or mouse tissue, called Molecular Subtyping Resource (MouSR). This internet-accessible analytical tool, https://mousr.qub.ac.uk/, enables users to easily interrogate their data using an intuitive 'point-and-click' interface, which offers a suite of molecular characterisation options for RNA sequencing data, including quality control, differential gene expression, gene set enrichment and microenvironmental cell population analyses. The MouSR online tool provides a unique freely available option for users to perform rapid transcriptomic analyses and comprehensive interrogation of the signalling underpinning transcriptional datasets, which alleviates a major bottleneck for biological discovery. This article has an associated First Person interview with the first author of the paper.
Keywords: Bioinformatics; Data analytics; RNA-seq
Year: 2022 PMID: 35112706 PMCID: PMC8990914 DOI: 10.1242/dmm.049257
Source DB: PubMed Journal: Dis Model Mech ISSN: 1754-8403 Impact factor: 5.758
Fig. 1. The MouSR workflow and outputs. (A-F) Utilising transcriptional data derived from human or mouse tissue/cells (A), users are required to have a transcriptional data matrix and sample information as the input for the MouSR pipeline (B), accessible via https://mousr.qub.ac.uk/ (C). From the Introduction page (D), users upload their files and are then presented with a series of point-and-click options for initial data quality control, differential analysis, molecular signalling and microenvironment characterisation (E), the results of which can be saved as high-resolution image files for further use (F). As an example of the adaptable nature of the system at each stage, users have options for bespoke formatting, design and labelling of the resulting plots, which can all be downloaded and saved in a publication-ready format. Figure created using BioRender.
Fig. 2. Data import and exploratory analysis. (A) Two input files are required to begin the analytical pipeline in the app: a gene expression matrix and a metadata file that includes sample group labelling. (B) Following data upload, a data summary of the samples is displayed for quick review. (C,D) Exploratory visualisation of the data is provided in the form of a 2D principal component analysis (PCA) plot (C) and a 3D PCA plot (D) with labelled sample groups.
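As a rough illustration of the exploratory step described in Fig. 2, the sketch below projects samples onto principal components from a genes-by-samples matrix. It is not MouSR's own implementation, which is not described in this record; the toy counts, the group labels attached to them, the `log2(count + 1)` transform and the function name `pca_scores` are all assumptions made for the example.

```python
import numpy as np

def pca_scores(expr, n_components=2):
    """Return per-sample PCA scores and explained-variance ratios.

    expr: genes x samples array of non-negative expression values.
    """
    # Assumption: log2(count + 1) is used to stabilise variance.
    x = np.log2(np.asarray(expr, dtype=float) + 1.0)
    x = x.T                       # samples as rows, genes as columns
    x = x - x.mean(axis=0)        # centre each gene across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    var_ratio = (s ** 2) / np.sum(s ** 2)
    return scores, var_ratio[:n_components]

# Made-up toy matrix: 4 genes (rows) x 4 samples (columns).
expr = np.array([
    [10, 12, 200, 220],
    [50, 55, 52, 49],
    [300, 280, 20, 25],
    [5, 6, 7, 5],
])
groups = ["KP", "KP", "KPN", "KPN"]   # hypothetical sample labels

scores, var_ratio = pca_scores(expr)
for label, (pc1, pc2) in zip(groups, scores):
    print(f"{label}: PC1={pc1:.2f} PC2={pc2:.2f}")
```

Plotting the first two (or three) score columns, coloured by the group labels from the metadata file, gives the kind of 2D or 3D view the caption describes.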
Fig. 3. Differential gene expression analysis and visualisation options. (A) Illustration of the various filtering options in the Heatmap panel for the gene expression in all the samples from the chosen groups. (B) Heatmap depicting a comparison of AP, APN versus KP, KPN genotypes across tumours. (C) The selected gene side bar for various filtering options is visible to the left of the table. The table includes the genes and annotations uploaded by the user in the first two columns, followed by columns of expression values under each sample annotation. (D) Heatmap indicating reproducible results, compared to data from Jackstadt et al. (2019) (Fig. 6I), across the selected genetically engineered mouse model tumours.
Fig. 4. Gene expression levels and volcano plot options. (A) Volcano plot filtering options. The plot is rendered with Plotly, so hovering over a point immediately displays the information related to that gene. (B) Volcano plot displaying differentially expressed genes between KPN and KP organoids, with key genes highlighted in text [reproducible results compared to data from Jackstadt et al. (2019) (Fig. 6A)]. (C) Boxplot displaying normalised counts for Tgfb1 expression compared between groups.
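The filtering behind a volcano plot such as the one in Fig. 4 can be sketched as follows. This is a minimal example under stated assumptions, not the MouSR code: the cutoffs (`lfc_cutoff`, `padj_cutoff`), the gene names and the statistics are invented for illustration, and adjusted P-values are assumed to be strictly greater than zero.

```python
import math

def volcano_points(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Convert (gene, log2 fold change, adjusted P) rows into plot points.

    x is the log2 fold change, y is -log10(adjusted P), and status marks
    whether the gene passes both (assumed) cutoffs.
    """
    points = []
    for gene, lfc, padj in results:
        if padj < padj_cutoff and lfc >= lfc_cutoff:
            status = "up"
        elif padj < padj_cutoff and lfc <= -lfc_cutoff:
            status = "down"
        else:
            status = "ns"   # not significant under these cutoffs
        points.append(
            {"gene": gene, "x": lfc, "y": -math.log10(padj), "status": status}
        )
    return points

# Made-up differential expression results, for illustration only.
results = [
    ("GeneA", 2.5, 0.001),
    ("GeneB", -1.8, 0.01),
    ("GeneC", 0.3, 0.0001),
    ("GeneD", 3.0, 0.2),
]

points = volcano_points(results)
for p in points:
    print(p["gene"], p["status"], round(p["y"], 2))
```

In an interactive renderer, the `gene` field of each point is what would typically be shown on hover, and the `status` field drives the colouring and text highlighting.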
Fig. 5. Gene set enrichment analysis (GSEA) and MCP analysis. (A) Enrichment plot for the TGF_BETA_SIGNALLING Hallmark gene set for KPN versus KP organoid groups, with P-value, FDR value, enrichment score (ES) and normalised enrichment score (NES). The x-axis shows all the genes in the experiment, pre-ranked by the metric, and each black bar marks a gene in this gene set (pathway); the y-axis details the level of enrichment via the ES. (B) Single-sample GSEA for individual samples displayed in a heatmap. (C) Murine microenvironment cell population (mMCP) analysis with infiltrating cell population estimates visualised in a heatmap.
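To make the enrichment plot in Fig. 5A easier to read, the sketch below computes a running enrichment score over a pre-ranked gene list using the commonly described weighted running-sum statistic. Which GSEA implementation MouSR calls is not stated in this record, so this is an illustrative assumption only; the gene names, ranking metric and `weight` parameter are hypothetical, the gene set is assumed to overlap the ranked list without covering it entirely, and the permutation step needed for the P-value, FDR and NES is not shown.

```python
import numpy as np

def enrichment_score(ranked_genes, ranked_metric, gene_set, weight=1.0):
    """Return (ES, running sum) for a gene set over a pre-ranked list.

    ranked_genes and ranked_metric must already be sorted from the most
    positive to the most negative metric value.
    """
    in_set = np.array([g in gene_set for g in ranked_genes])
    metric = np.abs(np.asarray(ranked_metric, dtype=float)) ** weight

    # Steps up at gene-set members, weighted by the ranking metric.
    hit = np.where(in_set, metric, 0.0)
    hit = hit / hit.sum()
    # Equal steps down at genes outside the set.
    miss = np.where(in_set, 0.0, 1.0)
    miss = miss / miss.sum()

    running = np.cumsum(hit - miss)
    es = running[np.argmax(np.abs(running))]   # maximum deviation from zero
    return es, running

# Made-up pre-ranked list, for illustration only.
ranked_genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
ranked_metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
gene_set = {"g1", "g2"}

es, running = enrichment_score(ranked_genes, ranked_metric, gene_set)
print("ES:", es)
print("running sum:", np.round(running, 2))
```

The `running` array corresponds to the curve on the y-axis of an enrichment plot, and the positions where `in_set` is true correspond to the black bars along the x-axis.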