Kieran Tebben, Aliou Dia, David Serre.
Abstract
Malaria symptoms are caused by the development of the parasites within the blood of an infected host. Bulk RNA sequencing (RNA-seq) of infected blood can reveal interactions between parasites and the host immune system during an infection, but because multiple developmental stages with distinct transcriptional profiles are concurrently present in infected blood, it is necessary to correct such analyses for differences in cell composition among samples. Gene expression deconvolution is a statistical approach developed for inferring the cell composition of complex tissues characterized by bulk RNA-seq, using gene expression profiles from reference cell types. Here, we describe the evaluation of a species-agnostic reference data set that can be used for efficient and accurate gene expression deconvolution of bulk RNA-seq data generated from any Plasmodium species and for correcting gene expression analyses for biases caused by differences in stage composition among samples.
IMPORTANCE Differences in cell type proportions among samples can introduce artifacts in gene expression analyses and mask genuine differences in gene regulation. Gene expression deconvolution allows estimation of the proportion of each cell type present in a sample directly from bulk RNA sequencing data, but this approach requires a reference data set with the signature profile of each cell type. Here, we evaluate the suitability of a rodent malaria parasite gene expression data set for estimating the proportions of each parasite developmental stage present in bulk RNA sequencing data generated from blood-stage infections with the human parasites Plasmodium falciparum and Plasmodium vivax. These analyses provide a species-agnostic approach for reliably estimating stage proportions in infected human blood and correcting subsequent gene expression analyses for these variations.
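The idea behind reference-based deconvolution can be illustrated with a small sketch. It assumes the simplest linear model, in which bulk expression is approximately the signature matrix multiplied by the stage proportions, and solves it by non-negative least squares using projected gradient descent. This is not the CIBERSORTx algorithm used in the study; the function name `deconvolve`, the three stages chosen, and all expression values are hypothetical toy inputs.

```python
# Minimal sketch of reference-based deconvolution (NOT the CIBERSORTx method).
# Assumed model: bulk ~= signature x proportions, with proportions >= 0.

def deconvolve(signature, bulk, n_iter=2000):
    """Estimate cell-type proportions by non-negative least squares.

    signature: list of rows, one per gene; each row holds the reference
               expression of that gene in every cell type (stage).
    bulk:      bulk expression of each gene, in the same gene order.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # Conservative step size: 1 / trace(S^T S) bounds the gradient's
    # Lipschitz constant, so projected gradient descent converges.
    step = 1.0 / sum(v * v for row in signature for v in row)
    x = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [sum(s * xj for s, xj in zip(row, x)) - b
                    for row, b in zip(signature, bulk)]
        grad = [sum(signature[g][j] * residual[g] for g in range(n_genes))
                for j in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, xj - step * gj) for xj, gj in zip(x, grad)]
    total = sum(x)
    # Rescale so the estimates read as proportions summing to 1.
    return [xj / total for xj in x] if total > 0 else x


stages = ["Ring", "Trophozoite", "Schizont"]
# Toy signature matrix: 4 genes (rows) x 3 stages (columns), invented values.
signature = [
    [10.0, 1.0, 0.0],
    [1.0, 8.0, 2.0],
    [0.0, 2.0, 9.0],
    [5.0, 5.0, 5.0],
]
true_proportions = [0.5, 0.3, 0.2]
# Noise-free toy bulk sample built from the known proportions.
bulk = [sum(s * p for s, p in zip(row, true_proportions)) for row in signature]

estimated = deconvolve(signature, bulk)
for stage, value in zip(stages, estimated):
    print(f"{stage}: {value:.3f}")
```

On this noise-free toy input the estimates match the proportions used to build the bulk sample; real data are noisy and call for a purpose-built tool.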
Keywords: Plasmodium; RNA-seq; gene expression deconvolution; transcriptomics
Year: 2022 PMID: 35862820 PMCID: PMC9426464 DOI: 10.1128/msystems.00258-22
Source DB: PubMed Journal: mSystems ISSN: 2379-5077 Impact factor: 7.324
FIG 1 scRNA-seq data can reliably deconvolute bulk RNA-seq data. Each vertical bar represents one RNA-seq experiment generated from a synchronized P. falciparum in vitro culture (organized chronologically along the x axis) and is colored according to the proportion of transcripts derived from Ring (red), Trophozoite (green), Schizont (blue), Female gametocytes (black), and Male gametocytes (gray), as determined by CIBERSORTx. (A) Deconvolution of P. falciparum bulk RNA-seq data using all 4,923 P. falciparum genes of the scRNA-seq data to generate the signature matrix. (B) Deconvolution of the same P. falciparum bulk data using only the 3,391 P. falciparum genes with 1:1:1 orthologs to generate the signature matrix. (C) Deconvolution of the same P. falciparum bulk data using the 3,509 P. berghei genes with 1:1:1 orthologs from a scRNA-seq data set to generate the signature matrix.
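Restricting a reference profile to 1:1:1 orthologs, as in panels B and C, might look roughly like the sketch below. It assumes that "1:1:1" means exactly one gene per ortholog group in each of P. falciparum, P. vivax, and P. berghei. The ortholog table layout, the helper names, and every gene ID are hypothetical placeholders, not the study's data or pipeline.

```python
# Toy ortholog table: one entry per ortholog group, listing member genes per
# species. All gene IDs are hypothetical placeholders.
SPECIES = ("P. falciparum", "P. vivax", "P. berghei")
ortholog_groups = [
    {"P. falciparum": ["pf_gene_1"], "P. vivax": ["pv_gene_1"], "P. berghei": ["pb_gene_1"]},
    # Two P. falciparum paralogs: not a 1:1:1 group, so it is excluded.
    {"P. falciparum": ["pf_gene_2", "pf_gene_3"], "P. vivax": ["pv_gene_2"], "P. berghei": ["pb_gene_2"]},
    # No P. vivax member: also excluded.
    {"P. falciparum": ["pf_gene_4"], "P. vivax": [], "P. berghei": ["pb_gene_3"]},
    {"P. falciparum": ["pf_gene_5"], "P. vivax": ["pv_gene_3"], "P. berghei": ["pb_gene_4"]},
]


def one_to_one_map(groups, source, target):
    """Map source-species gene IDs to target-species IDs for 1:1:1 groups only."""
    mapping = {}
    for group in groups:
        if all(len(group.get(species, [])) == 1 for species in SPECIES):
            mapping[group[source][0]] = group[target][0]
    return mapping


def translate_profile(profile, mapping):
    """Rename genes in a profile, dropping genes that lack a 1:1:1 ortholog."""
    return {mapping[gene]: value for gene, value in profile.items() if gene in mapping}


pb_to_pf = one_to_one_map(ortholog_groups, "P. berghei", "P. falciparum")
# Hypothetical P. berghei reference profile for a single stage.
pb_profile = {"pb_gene_1": 12.0, "pb_gene_2": 3.0, "pb_gene_3": 7.5, "pb_gene_4": 0.5}
pf_named_profile = translate_profile(pb_profile, pb_to_pf)
print(pf_named_profile)  # {'pf_gene_1': 12.0, 'pf_gene_5': 0.5}
```

Once the reference genes carry the target species' identifiers, the same signature matrix can be matched row by row against bulk data from that species.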
FIG 2 A species-agnostic reference matrix can be used to deconvolute P. vivax bulk mixtures, as shown by this deconvolution of P. vivax mock bulk data with a signature matrix generated from P. berghei scRNA-seq expression profiles. Cells were grouped into equal bins by pseudotime to represent progression from early trophozoites (bin 1) to late schizonts (bin 5) along the x axis.
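One way to build mock bulk samples of this kind is sketched below. The caption does not say whether "equal bins" means equal numbers of cells or equal pseudotime width; the sketch assumes equal numbers of cells. It also assumes that a mock bulk profile is the per-gene sum of counts over the cells in a bin. The cell records, gene names, and function names are invented for illustration.

```python
def bin_by_pseudotime(cells, n_bins=5):
    """Split cells into n_bins groups of (near-)equal size, ordered by pseudotime."""
    ordered = sorted(cells, key=lambda cell: cell["pseudotime"])
    bins = []
    for i in range(n_bins):
        start = i * len(ordered) // n_bins
        stop = (i + 1) * len(ordered) // n_bins
        bins.append(ordered[start:stop])
    return bins


def mock_bulk(cells_in_bin):
    """Sum per-gene counts over all cells in a bin to imitate a bulk sample."""
    totals = {}
    for cell in cells_in_bin:
        for gene, count in cell["counts"].items():
            totals[gene] = totals.get(gene, 0) + count
    return totals


# Ten toy cells: gene_a falls and gene_b rises as pseudotime advances.
cells = [
    {"pseudotime": t / 10, "counts": {"gene_a": 10 - t, "gene_b": t}}
    for t in range(10)
]
bins = bin_by_pseudotime(cells, n_bins=5)
bulk_profiles = [mock_bulk(b) for b in bins]
print(bulk_profiles[0])  # {'gene_a': 19, 'gene_b': 1}
print(bulk_profiles[4])  # {'gene_a': 3, 'gene_b': 17}
```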
FIG 3 Accuracy of gene expression deconvolution using a species-agnostic signature matrix. Each plot displays the estimated proportion of a specific P. berghei stage (y axis) in simulated mixtures compared to the true proportions (x axis).
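Agreement between estimated and true proportions in simulated mixtures can be summarized with, for example, a Pearson correlation and a root-mean-square error. The sketch below is only an illustration of such a comparison: the metrics are an assumed choice rather than the ones reported in the paper, and the proportions are invented.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rmse(x, y):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Invented values: true proportion of one stage in five simulated mixtures
# (x axis) and the corresponding deconvolution estimates (y axis).
true_prop = [0.0, 0.25, 0.5, 0.75, 1.0]
est_prop = [0.05, 0.25, 0.45, 0.80, 0.95]

r = pearson_r(true_prop, est_prop)
error = rmse(true_prop, est_prop)
print(f"Pearson r = {r:.3f}, RMSE = {error:.3f}")
```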