Hyun Jae Lee¹, Athina Georgiadou², Thomas D Otto³, Michael Levin², Lachlan J Coin¹, David J Conway⁴, Aubrey J Cunnington⁵
Abstract
Transcriptomics, the analysis of genome-wide RNA expression, is a common approach to investigate host and pathogen processes in infectious diseases. Technical and bioinformatic advances have permitted increasingly thorough analyses of the association of RNA expression with fundamental biology, immunity, pathogenesis, diagnosis, and prognosis. Transcriptomic approaches can now be used to realize a previously unattainable goal, the simultaneous study of RNA expression in host and pathogen, in order to better understand their interactions. This exciting prospect is not without challenges, especially as focus moves from interactions in vitro under tightly controlled conditions to tissue- and systems-level interactions in animal models and natural and experimental infections in humans. Here we review the contribution of transcriptomic studies to the understanding of malaria, a parasitic disease that has exerted a major influence on human evolution and continues to cause a huge global burden of disease. We consider malaria a paradigm for the transcriptomic assessment of systemic host-pathogen interactions in humans, because much of the direct host-pathogen interaction occurs within the blood, a readily sampled compartment of the body. We illustrate lessons learned from transcriptomic studies of malaria and how these lessons may guide studies of host-pathogen interactions in other infectious diseases. We propose that the potential of transcriptomic studies to improve the understanding of malaria as a disease remains partly untapped because of limitations in study design rather than because of technological constraints. Further advances will require the integration of transcriptomic data with analytical approaches from other scientific disciplines, including epidemiology and mathematical modeling.
Keywords: RNA sequencing; apicomplexan parasites; host-parasite relationship; host-pathogen interactions; immune response; malaria; pathogenesis; transcriptomics
Year: 2018 PMID: 29695497 PMCID: PMC5968457 DOI: 10.1128/MMBR.00071-17
Source DB: PubMed Journal: Microbiol Mol Biol Rev ISSN: 1092-2172 Impact factor: 11.056
FIG 1. Main interactions with human tissues and cells during the Plasmodium falciparum developmental cycle. (1) Infection is initiated by a mosquito bite. Motile sporozoites rapidly find their way past the structural and immune cells in the skin into blood vessels and onwards to the liver. The short transit time limits opportunities for cellular interactions. (2) Sporozoites reach the liver, exit the vasculature through Kupffer cells, and then undergo massive replication in hepatocytes. Immune cells such as CD8 T cells patrol the liver and may detect and kill infected hepatocytes. (3) Parasites burst out of hepatocytes and enter the bloodstream, rapidly infecting erythrocytes. They undergo repeated cycles of asexual replication, interacting with blood leukocytes. Parasite products are carried throughout the systemic circulation, triggering inflammatory responses. (4) Parasitized red cells may be cleared by the spleen, which is a major location for the host immune response to Plasmodium. (5) Parasites may exit the asexual erythrocytic cycle to produce gametocytes, which may be taken up by another mosquito bite to allow onward transmission. Gametocytogenesis may be influenced by the host response, and most gametocyte development occurs in the bone marrow. Mature gametocytes reenter and circulate in the blood, potentially interacting with host leukocytes and the vascular endothelium. (6) Parasites can cause severe disease if they accumulate (sequester) and obstruct the microvasculature of vital organs, such as the brain. There may be both direct and indirect interactions with the vascular endothelium, leukocytes, and parenchymal cells.
FIG 2. Timeline of transcriptomic approaches for infectious diseases. Transcriptomic analysis requires the extraction of RNA from parts of the body such as peripheral blood. Methods of analysis have evolved over time. Serial analysis of gene expression (SAGE) uses Sanger sequencing to generate and sequence short (∼11-nucleotide) tags and thereby quantify transcript abundance; it is expensive and low throughput. Massively parallel signature sequencing (MPSS) generates slightly longer tags (∼17 to 20 nucleotides) and provides a larger library size. Cap analysis of gene expression (CAGE) is similar in principle to SAGE but targets transcription start sites. Microarray analysis is a hybridization approach in which immobilized probes capture fluorescently labeled nucleic acid derived from the transcripts of interest. RNA sequencing (RNA-seq) is a high-throughput sequencing approach capable of novel transcript discovery, noncoding RNA analysis, and alternative splicing analysis. RNA-seq has been developed to allow transcriptomic analysis at single-cell resolution, simultaneous analysis of host and pathogen transcriptomes (dual RNA-seq), and sequencing of full-length transcripts for detailed analysis of transcript isoforms and direct analysis of RNA. In the future, techniques such as laser capture microdissection (LCM) may be coupled with RNA-seq so that host cells and their interacting pathogens (such as parasites adhering to vascular endothelial cells) can be isolated and studied as defined cell groups or by dual single-cell analyses. Massively parallel single-cell analyses and direct RNA-seq are also on the horizon.
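Whether dual RNA-seq is feasible depends largely on how much of the sequenced RNA comes from the pathogen. The following is a back-of-envelope read budget under a strong simplifying assumption: reads are drawn in proportion to each species' share of total RNA, ignoring differences in polyadenylation, rRNA depletion, and library capture. The function names and every numeric input (cell counts, RNA per cell, target read count) are hypothetical placeholders, not values from this review.

```python
"""Back-of-envelope read budget for dual (host + pathogen) RNA-seq.

Simplifying assumption: reads are sampled in proportion to each species' share of
total RNA mass. All inputs are hypothetical placeholders, not measured values.
"""
import math


def pathogen_read_fraction(host_cells, pathogen_cells, host_rna_pg, pathogen_rna_pg):
    """Expected fraction of reads from the pathogen, given cell numbers and RNA per cell (pg)."""
    host_total = host_cells * host_rna_pg
    pathogen_total = pathogen_cells * pathogen_rna_pg
    return pathogen_total / (host_total + pathogen_total)


def reads_needed(target_pathogen_reads, pathogen_fraction):
    """Total sequencing depth needed to expect `target_pathogen_reads` pathogen reads."""
    if not 0 < pathogen_fraction <= 1:
        raise ValueError("pathogen_fraction must be in (0, 1]")
    return math.ceil(target_pathogen_reads / pathogen_fraction)


# Placeholder inputs only: 750 host cells at 1.0 pg RNA each, 500 pathogen cells at 0.5 pg.
fraction = pathogen_read_fraction(750, 500, 1.0, 0.5)
depth = reads_needed(5_000_000, fraction)
print(f"pathogen fraction = {fraction:.2f}; total reads needed = {depth:,}")
```

With these placeholder inputs the pathogen contributes a quarter of the RNA, so about 20 million reads would be needed to expect 5 million pathogen reads. Realistic inputs would have to come from pilot measurements; enriching pathogen cells or pathogen RNA raises the fraction and lowers the required depth.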
Comparison of microarray and RNA sequencing technologies
| Feature | Microarray | RNA sequencing |
|---|---|---|
| Technology | Tens of thousands of DNA probes specific for each transcript of interest within microwells on a chip; capture of cDNA produces a fluorescent signal proportional to target abundance | Next-generation sequencing of many millions of cDNA fragments |
| Reference genome dependent | Yes; probes must be designed for specific targets | No; de novo transcriptome assembly is possible |
| Sample requirement(s) | Good-quality, nondegraded RNA | Good-quality, nondegraded RNA preferred, but small amounts and partially degraded samples can be used; rRNA depletion is usually required |
| Dynamic range for transcript detection | Limited (bounded by background at the low end and signal saturation at the high end) | Wide; limited mainly by sequencing read depth |
| Novel transcript detection | No | Yes |
| Noncoding RNA detection | Yes, if probes for the noncoding RNAs are included | Yes |
| Splice detection | Yes, but limited to known splice sites | Yes |
| Dual host-pathogen transcript detection | Yes, but requires custom arrays designed for both transcriptomes | Yes; relative amounts of host and pathogen RNA determine detection limits |
| Quantification of gene expression | Analogue (continuous fluorescence intensity) | Digital (discrete read counts) |
| Analysis | Widely accessible with user-friendly software packages | More challenging; often requires a bioinformatician |
| Cost | Relatively cheap overall; predictable cost per sample | Generally more expensive; library preparation costs accumulate per sample, and sequencing costs accumulate per lane (into which samples may be multiplexed) |
| Level of technical variability | Higher | Lower |
| Sample size calculation method | Standard tools available | Complex trade-off between sample numbers, read depth, and cost |
| Presence of batch effects | Sometimes problematic between chips and very problematic between different arrays; well-developed software tools can compensate for batch effects to some extent | Arise mainly during library preparation; software tools to compensate for batch effects are still evolving |
| Types of technical bias | Background and cross-hybridization bias; dye bias when samples are labeled with different dyes | Length bias (longer transcripts yield more reads) and GC content bias (uneven coverage in GC-poor or GC-rich regions) |
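The length bias listed for RNA-seq arises because, at equal molar abundance, a longer transcript produces more sequenced fragments than a shorter one. One common way to account for it when comparing genes within a sample is to express digital counts as transcripts per million (TPM). The Python sketch below assumes per-gene read counts and effective transcript lengths are already available (for example, from a quantification tool); the function name `tpm` is illustrative. It does not address GC content bias, which requires separate modeling.

```python
"""Transcripts per million (TPM): a length-aware normalization of RNA-seq read counts."""


def tpm(counts, lengths_bp):
    """Convert one sample's per-gene read counts to TPM.

    counts: read counts per gene.
    lengths_bp: effective transcript length per gene, in base pairs (assumed > 0).
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Step 1: reads per kilobase, so long transcripts are not favored merely
    # because they produce more fragments.
    rates = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    # Step 2: rescale so that all genes in the sample sum to one million.
    total = sum(rates)
    if total == 0:
        raise ValueError("sample has no reads")
    return [r * 1e6 / total for r in rates]


# Two genes with identical counts; the second transcript is four times longer.
print(tpm([100, 100], [1000, 4000]))
```

Here two genes with equal read counts but a fourfold difference in length receive 800,000 and 200,000 TPM respectively, reflecting more copies of the shorter transcript. Because each sample is rescaled to the same total, TPM supports within-sample comparisons; between-sample differential expression analysis usually relies on additional normalization.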
FIG 3. Transcriptomic host response to malaria in humans and mice. Shown is a comparison of selected features of the transcriptional host response to malaria in humans (left) and mice (right). Pie charts indicate how frequently each tissue has been studied for each species. Broad functional groups of genes are presented as upregulated (light red) or downregulated (light blue) in each species for uncomplicated disease and for cerebral malaria (CM) or experimental cerebral malaria (ECM). Functional groups common to humans and mice are marked in boldface type.
Lessons for the design of infectious disease transcriptomic studies
| Phase | Challenge | Problem | Example | Solutions |
|---|---|---|---|---|
| Design | Defining study objective | | Use of principal-component analysis plots to identify the groups that differ most, followed by selection of these groups for further analysis | Prespecify objectives; prespecify hypothesis; prespecify analysis plan |
| | Bias | Systematic differences between comparison groups in sample selection, data availability, assessment, or analysis; cannot be eliminated statistically | Selection of control subjects from a different country than cases | Select all samples from the same representative population; perform blind assessment of samples; perform blind data analysis |
| | Confounding | Factors associated with the outcome of interest may influence the association between gene expression and outcome | Likelihood of symptomatic malaria or asymptomatic parasitemia is influenced by age, prior exposure, and parasite load | Collect data on known confounders to allow statistical adjustment; match for known confounders; perform within-subject comparisons to eliminate unknown confounders |
| | Generalizability | Statistically significant results can be obtained from small sample sizes but may not be representative of the population of interest | Comparison of groups of 3 inbred mice may be representative of infection in those mice, but data from similar-sized studies in humans almost certainly will not be representative | Use sufficient biological replicates to capture the diversity of the population; replicate findings with independent samples, with alternative methods, and in independent populations or experimental models |
| | Causal inference | Observational studies cannot prove that variation in gene expression causes the outcome of interest, but certain features of study design can strengthen evidence for a causal association | Higher type I interferon response gene expression in uncomplicated than in severe malaria (is it protective?) | Compare graded outcomes for gene expression “dose-response” relationships; use Mendelian randomization designs; assess during an experimental study or interventional clinical trial |
| Sampling | Isolation of host and pathogen RNAs | Both species may be present in the sample, but the host/pathogen RNA ratio can determine the success of the experiment | Human and | Estimate likely amounts of host and pathogen RNAs to assess feasibility prior to sampling; consider focused sampling to increase the ratio of pathogen to host cells; consider enrichment of pathogen RNA |
| Analysis | Cellular heterogeneity | Differences in the cellular composition of samples can dominate transcriptional profiles | Infection often changes the proportions of different leukocyte populations in blood | Apply deconvolution algorithms to adjust for differences in cell mixture in the bulk transcriptome; use flow cytometry to determine cellular composition directly; purify cell types of interest before transcriptome analysis; use single-cell RNA sequencing |
| | Adjustment for confounders | Statistical adjustment is needed if matching for confounders cannot be achieved | Adjustment for parasite load when comparing cases of severe and uncomplicated malaria | Logistic and linear regression models can be implemented in many transcriptomic analysis tools |
| | Discovery of optimal biomarkers | Expression of many genes may be associated with outcome, but it is difficult to select the smallest combination with the best out-of-sample prediction | Which combination of transcripts best predicts clinical deterioration in a child with uncomplicated malaria? | Apply variable selection algorithms to a training data set; confirm with a separate test data set; validate with at least 1 external data set |
| Reporting | Maximize reuse of data | Provision of metadata allows maximum future reuse of transcriptomic data | Complete subject- or sample-level data allow secondary analyses to be performed | Observe community standards for reporting; make metadata publicly available |
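The "Adjustment for confounders" row can be illustrated with a per-gene linear model that compares severe with uncomplicated malaria, first without and then with a parasite-load covariate. This is a minimal sketch using generic ordinary least squares in standard-library Python, not the method of any particular transcriptomic package; the function names are hypothetical, and the toy data are invented so that the apparent group difference is entirely explained by parasite load.

```python
"""Adjusting a severe-vs-uncomplicated comparison for a measured confounder.

Illustrative only: ordinary least squares (OLS) in standard-library Python,
applied to invented toy values.
"""


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular (collinear covariates?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(design, y):
    """OLS coefficients from the normal equations (X^T X) beta = X^T y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def severe_effect(expression, severe, log_parasite_load):
    """Return (crude, adjusted) coefficients for severe (1) vs uncomplicated (0) malaria."""
    crude = ols([[1.0, float(s)] for s in severe], expression)
    adjusted = ols(
        [[1.0, float(s), z] for s, z in zip(severe, log_parasite_load)], expression
    )
    return crude[1], adjusted[1]


# Toy data: expression depends only on parasite load, but severe cases carry more parasites.
expression = [3.5, 5.0, 6.5, 6.5, 8.0, 9.5]
severe = [0, 0, 0, 1, 1, 1]
log_parasite_load = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0]
crude, adjusted = severe_effect(expression, severe, log_parasite_load)
```

In this toy data set the crude comparison suggests a difference of 3 expression units between severe and uncomplicated cases, whereas the severe-malaria coefficient is essentially zero once log parasite load enters the model. Dedicated packages such as limma fit models of this form across thousands of genes and add variance moderation; a plain normal-equations solver like this one is suitable only for illustration.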