Yancong Zhang, Kelsey N Thompson, Curtis Huttenhower, Eric A Franzosa.
Abstract
MOTIVATION: Metatranscriptomics (MTX) has become an increasingly practical way to profile the functional activity of microbial communities in situ. However, MTX remains underutilized due to experimental and computational limitations. The latter are complicated by non-independent changes in both RNA transcript levels and their underlying genomic DNA copies (as microbes simultaneously change their overall abundance in the population and regulate individual transcripts), genetic plasticity (as whole loci are frequently gained and lost in microbial lineages) and measurement compositionality and zero-inflation. Here, we present a systematic evaluation of and recommendations for differential expression (DE) analysis in MTX.
Year: 2021 PMID: 34252963 PMCID: PMC8275336 DOI: 10.1093/bioinformatics/btab327
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Normalization and DE models for microbial community transcript abundances. (A) Each panel corresponds to a simple conceptual community (two species, A and B, each contributing two genes) assayed by MTX and MGX sequencing. Case 0 represents a reference condition, while Cases 1–4 correspond to perturbations of the reference. While the RNA abundance of gene B:4 (dashed outline) differs under each perturbation, only in Case 4 is the change attributable to DE rather than gene copy-number variation. (B) A summary of six linear models for assaying DE of an MTX feature f with respect to a sample phenotype/property p. Models 2–6 incorporate transformations and covariates aimed at minimizing spurious DE signals from gene copy number.
Table 1. Properties of 11 synthetic datasets (columns) used in community DE benchmarking
Datasets (columns): null, null-bug, null-enc, null-dep, true-exp, true-exp-med, true-exp-low, true-combo-bug-exp, true-combo-dep-exp, group-null-enc, group-true-exp.
[Table body not recoverable from the source: row labels and non-empty cell values are missing.]
Notes: Datasets of N = 100 paired MTX and MGX samples based on healthy human gut microbiomes were spiked to include associations between a synthetic case:control phenotype and some combination of sequencing depth, microbial species, gene presence/absence and gene expression. Expression-phenotype associations are DE positives, while other associations may introduce spurious (FP) signals of DE. ‘––’ implies 0% or ‘n/a.’ These datasets are described further in Supplementary Table S1.
Fig. 2. DE models for MTX that account for underlying variation in gene copy number control FP rates while maintaining statistical power. We evaluated the performance of six models for DE in MTX data (M1–6; Fig. 1B) on nine synthetic datasets with paired MTX and MGX measurements of known taxonomy (Table 1). ‘Null’ datasets (top row) contained no positive DE signatures and were evaluated in comparison with the theoretical nominal type-I error rate only (dashed lines; FPR = 0.05). ‘True’ datasets contained 10% positive DE relationships and were evaluated on the basis of their sensitivity (accounting for multiple hypothesis correction; middle row) versus nominal type-I error rate (bottom row). Error bars reflect the 95% CI for percentages.
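The two quantities plotted in Fig. 2 can be illustrated with a minimal Python sketch. Several choices here are assumptions rather than details stated in the caption: Benjamini-Hochberg is used for the multiple hypothesis correction, the 95% CI uses the normal approximation for a proportion, and the function names (`bh_adjust`, `proportion_ci`, `evaluate`) and the `fdr` default are hypothetical.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg q-values (assumed correction method)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotone q-values, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q

def proportion_ci(k, n, z=1.96):
    """Normal-approximation 95% CI for a proportion k/n (assumed CI type)."""
    phat = k / n
    half = z * np.sqrt(phat * (1.0 - phat) / n)
    return max(0.0, phat - half), min(1.0, phat + half)

def evaluate(pvals, is_positive, alpha=0.05, fdr=0.05):
    """Return (nominal FPR among negatives, sensitivity among positives)."""
    pvals = np.asarray(pvals, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    qvals = bh_adjust(pvals)
    neg = ~is_positive
    # Nominal type-I error: uncorrected p-values of true negatives.
    fpr = float(np.mean(pvals[neg] < alpha)) if neg.any() else float("nan")
    # Sensitivity: corrected q-values of true positives.
    tpr = float(np.mean(qvals[is_positive] < fdr)) if is_positive.any() else float("nan")
    return fpr, tpr
```

On a 'null' dataset every feature is a negative, so `evaluate` returns an undefined (NaN) sensitivity and only the FPR is compared with the nominal 0.05 level.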
Fig. 3. DE models for MTX benefit from pre-filtering of probable technical zeros. We assessed three pre-filtering strategies for balancing biological and technical zeros in MTX data. (A) Under ‘lenient’ pre-filtering, features were analyzed for DE if they were ever detected (non-zero) at the RNA level (or gene-copy level, where applicable). (B) Under ‘semi-strict’ pre-filtering, samples were excluded if both a feature’s RNA count and gene-copy estimate (where applicable) were zero. (C) Under ‘strict’ pre-filtering, samples were excluded if either a feature’s RNA count or gene-copy estimate was zero. Features that were excluded from analysis were scored as ‘not DE’ (i.e. negatives). Gray cells indicate ‘undefined’ TPR for datasets lacking positive DE signals.
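The three pre-filtering rules can be written as per-feature sample masks. The following Python sketch is one reading of the caption, not the authors' implementation: the function name `prefilter_mask` is hypothetical, and the lenient rule is simplified to check detection at the RNA level only (the "or gene-copy level, where applicable" clause is not modeled).

```python
import numpy as np

def prefilter_mask(rna, dna, strategy):
    """Boolean mask of samples retained for a single feature.

    rna, dna: per-sample RNA counts and gene-copy estimates for the feature.
    strategy: 'lenient', 'semi-strict' or 'strict'.
    """
    rna = np.asarray(rna, dtype=float)
    dna = np.asarray(dna, dtype=float)
    if strategy == "lenient":
        # Keep every sample if the feature was ever detected at the RNA level;
        # otherwise drop the feature entirely (all-False mask).
        return np.full(rna.shape, bool(np.any(rna > 0)))
    if strategy == "semi-strict":
        # Exclude samples where BOTH RNA and gene-copy values are zero.
        return ~((rna == 0) & (dna == 0))
    if strategy == "strict":
        # Exclude samples where EITHER RNA or gene-copy value is zero.
        return (rna > 0) & (dna > 0)
    raise ValueError(f"unknown strategy: {strategy!r}")
```

For a feature with RNA counts `[0, 5, 0, 3]` and gene-copy estimates `[0, 2, 4, 0]`, the lenient rule keeps all four samples, the semi-strict rule drops only the first, and the strict rule keeps only the second.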
Fig. 4. Adjusting RNA for DNA gene copy number provides consistently higher performance in DE analysis for communities containing unknown taxonomy. We assessed the performance of community DE models that do not require a mapping of MTX features to taxa (M1, M4 and M6) on communities of unknown taxonomy. One of these community datasets (‘group-null-enc’) was spiked with confounding gene presence/absence signals, while a second (‘group-true-exp’) contained positive DE signals. Spikes were generated at the gene family (orthogroup) level. Each method was evaluated in combination with three pre-filtering schemes for managing zero inflation (Supplementary Fig. S1). Error bars reflect the 95% CI for percentages.
Fig. 5. DE of E. coli pilin-like proteins in the IBD gut microbiome. We previously prioritized 113 pilin-family proteins assigned to E. coli for roles in IBD-associated inflammation based on their MGX properties. We surveyed these proteins for differential functional activity during inflammation using three models of community DE that performed well during synthetic evaluations (Fig. 2). (A) The feature-DNA covariate model (M6) identified 16 genes with significantly elevated expression among dysbiotic samples (FDR q < 0.25 as emphasized in bold text). (B) The taxon-RNA covariate model (M3) tended to agree with these trends in sign but not statistical significance, while (C) the RNA/DNA ratio model (M4) tended to identify these trends as significant in the opposite direction. M3- and M4-specific trends are further compared in Supplementary Figure S5.
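To make the difference between the RNA/DNA ratio model (M4) and the feature-DNA covariate model (M6) concrete, the sketch below fits both, plus an unadjusted baseline, by ordinary least squares. This is an illustration under stated assumptions: the log transform, the pseudocount, the unadjusted baseline and the helper names (`fit_ols`, `phenotype_coefficients`) are not taken from the captions, which name the models but do not give their exact formulas.

```python
import numpy as np

def fit_ols(y, columns):
    """Least-squares fit of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + [np.asarray(c, dtype=float) for c in columns])
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta

def phenotype_coefficients(rna, dna, phenotype, pseudo=1.0):
    """Phenotype effect on one feature under three model forms.

    The log transform and pseudocount are assumptions of this sketch.
    """
    log_rna = np.log(np.asarray(rna, dtype=float) + pseudo)
    log_dna = np.log(np.asarray(dna, dtype=float) + pseudo)
    return {
        # Baseline (assumed): log RNA ~ phenotype, no copy-number adjustment.
        "unadjusted": fit_ols(log_rna, [phenotype])[1],
        # Ratio form (cf. M4): log(RNA / DNA) ~ phenotype.
        "ratio": fit_ols(log_rna - log_dna, [phenotype])[1],
        # Covariate form (cf. M6): log RNA ~ phenotype + log DNA.
        "dna_covariate": fit_ols(log_rna, [phenotype, log_dna])[1],
    }
```

When RNA abundance simply tracks gene copy number (for example, RNA equal to twice the DNA estimate in every sample, with more copies among cases), the unadjusted coefficient is large while the ratio and covariate coefficients are essentially zero, which is the copy-number confounding that Fig. 1 describes.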