André N A Gonçalves, Melissa Lever, Pedro S T Russo, Bruno Gomes-Correia, Alysson H Urbanski, Gabriele Pollara, Mahdad Noursadeghi, Vinicius Maracaja-Coutinho, Helder I Nakaya.
Abstract
Transcriptome analyses have increased our understanding of the molecular mechanisms underlying human diseases. Most approaches aim to identify significant genes by comparing their expression values between healthy subjects and a group of patients with a certain disease. Given that studies normally contain few samples, the heterogeneity among individuals caused by environmental factors or undetected illnesses can impact gene expression analyses. We present a systematic analysis of sample heterogeneity in a variety of gene expression studies relating to inflammatory and infectious diseases and show that novel immunological insights may arise once heterogeneity is addressed. The perturbation score of samples is quantified using nonperturbed subjects (i.e., healthy subjects) as a reference group. Such a score allows us to detect outlying samples and subgroups of diseased patients and even assess the molecular perturbation of single cells infected with viruses. We also show how removal of outlying samples can improve the "signal" of the disease and impact detection of differentially expressed genes. The method is made available via the mdp Bioconductor R package and as a user-friendly webtool, webMDP, available at http://mdp.sysbio.tools.
Keywords: gene expression profiling; heterogeneity; infectious diseases; inflammatory diseases; transcriptome analysis
Year: 2019 PMID: 31708960 PMCID: PMC6822058 DOI: 10.3389/fgene.2019.00971
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. The molecular degree of perturbation approach to calculating sample heterogeneity. (A) The MDP algorithm scores samples based on their perturbation from user-defined control samples (often healthy subjects). A Z-score normalization is performed using the control samples as a reference. The absolute values of the normalized scores are then taken, and values below 2 are set to 0. The sample scores are the average of these gene scores for each sample. (B) Running the MDP webtool. Expression and phenotypic files are required to run MDP; the results are a simple barplot and boxplot showing molecular perturbation for each submitted sample. An optional feature allows users to run MDP using a specific gene set, provided as a .gmt file.
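The scoring steps in panel A can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the mdp package's implementation: the function name `mdp_scores`, the nested-dictionary input layout, the use of the control mean and sample standard deviation for the Z-score, and the choice to skip genes with zero variance among controls are all assumptions made for the example.

```python
from statistics import mean, stdev


def mdp_scores(expression, control_ids, threshold=2.0):
    """Return one perturbation score per sample.

    expression:  dict mapping gene -> {sample_id: expression value}
                 (hypothetical layout chosen for this sketch)
    control_ids: sample IDs forming the reference group (needs >= 2)
    threshold:   absolute Z-scores below this value are set to 0
    """
    samples = list(next(iter(expression.values())).keys())
    gene_scores = []
    for gene, values in expression.items():
        controls = [values[s] for s in control_ids]
        mu = mean(controls)
        sd = stdev(controls)  # assumption: sample SD of the controls
        if sd == 0:
            continue  # assumption: skip genes that do not vary in controls
        scores = {}
        for s in samples:
            z = abs((values[s] - mu) / sd)  # absolute Z-score vs. controls
            scores[s] = z if z >= threshold else 0.0
        gene_scores.append(scores)
    # Sample score = average of the thresholded gene scores.
    return {s: mean([g[s] for g in gene_scores]) for s in samples}


# Toy data: two genes, three controls (c1-c3), two patients (p1, p2).
toy = {
    "geneA": {"c1": 1.0, "c2": 2.0, "c3": 3.0, "p1": 6.0, "p2": 3.5},
    "geneB": {"c1": 4.0, "c2": 5.0, "c3": 6.0, "p1": 5.0, "p2": 8.0},
}
toy_scores = mdp_scores(toy, control_ids=["c1", "c2", "c3"])
```

In the toy data, control samples stay within one standard deviation of their own mean, so every gene score is zeroed and they receive a score of 0, while each patient is scored only by the genes that deviate strongly from the reference group.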
Figure 2. Removal of potential outlier samples impacts differential expression analyses. (A) Sample MDP scores were calculated for 60 patients with Crohn disease, using 12 healthy subjects (blue) as the reference group. Data were obtained from whole blood and are available under GEO accession GSE112057. Potential outlier samples are shown as red dots. (B) Differential expression analyses between patients with a disease and healthy controls. Numbers of DEGs before and after removal of potential outlier samples are shown as red and black bars, respectively. Random removal of samples followed by differential expression analysis was performed 1,000 times for each comparison, and the number of DEGs was averaged (black vertical line).
Figure 3. Consistency of JIA signatures increases after removal of potential outlier samples. (A) Number of DEGs before and after removal of potential outlier samples in five JIA datasets. The lines show the number of genes (y axis) considered as DEGs in one or more JIA datasets (x axis). (B) Enrichment pathway analysis of genes consistently up- or down-regulated in three or more JIA datasets after removal of potential outlier samples. Bar graph shows the log10 adjusted P value (x axis) of top Gene Ontology gene sets (y axis). (C) Protein–protein interaction network showing the connectivity of up-regulated DEGs in at least three JIA datasets. Genes added to the minimum network are shown as gray nodes. Edges were defined by InnateDB (Breuer et al., 2013).
Figure 4. Molecular degree of perturbation at single-cell level. (A) Sample MDP score (y axis) and DENV count (VL, x axis) for all single cells infected with DENV. The quadrants divide cells according to their MDP score (cutoff = 1) and/or the VL (cutoff = 3). Each circle represents a single cell, and the gradient color represents the different MOI (multiplicity of infection) and time postinfection. (B) Differential expression analyses between cell subsets shown in A. The numbers of DEGs are indicated in the Venn diagram. (C) Protein–protein interaction network using the DEGs from the MDPhighVLhigh versus MDPlowVLlow comparison. Genes related to regulation of cell cycle (blue circles), viral infectious cycle (green circles), and endoplasmic reticulum unfolded protein response (purple circles) are shown.
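The quadrant assignment in panel A can be illustrated with a short Python sketch. The cutoffs (MDP score = 1, VL = 3) and the subset labels come from the caption; everything else is an assumption made for the example, including the function names `classify_cell` and `group_cells`, the input layout, and the choice to treat values equal to a cutoff as "high". The caption does not state the scale on which VL is measured, so the values are used as given.

```python
def classify_cell(mdp_score, viral_load, mdp_cutoff=1.0, vl_cutoff=3.0):
    """Label one cell by quadrant, e.g. 'MDPhighVLlow'.

    Assumption: a value exactly at a cutoff counts as 'high'.
    """
    mdp_label = "MDPhigh" if mdp_score >= mdp_cutoff else "MDPlow"
    vl_label = "VLhigh" if viral_load >= vl_cutoff else "VLlow"
    return mdp_label + vl_label


def group_cells(cells, mdp_cutoff=1.0, vl_cutoff=3.0):
    """Group cell IDs by quadrant.

    cells: dict mapping cell_id -> (mdp_score, viral_load)
           (hypothetical layout chosen for this sketch)
    """
    groups = {}
    for cell_id, (mdp_score, viral_load) in cells.items():
        label = classify_cell(mdp_score, viral_load, mdp_cutoff, vl_cutoff)
        groups.setdefault(label, []).append(cell_id)
    return groups


# Toy values, not taken from the study.
toy_cells = {
    "cell1": (1.8, 4.2),
    "cell2": (0.3, 0.5),
    "cell3": (1.2, 1.0),
    "cell4": (0.4, 3.5),
}
toy_groups = group_cells(toy_cells)
```

The resulting groups are the kind of subsets that would feed the pairwise differential expression comparisons described in panel B.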