Matthew Chung, Vincent M Bruno, David A Rasko, Christina A Cuomo, José F Muñoz, Jonathan Livny, Amol C Shetty, Anup Mahurkar, Julie C Dunning Hotopp.
Abstract
Advances in transcriptome sequencing allow for simultaneous interrogation of differentially expressed genes from multiple species originating from a single RNA sample, termed dual or multi-species transcriptomics. Compared to single-species differential expression analysis, the design of multi-species differential expression experiments must account for the relative abundances of each organism of interest within the sample, often requiring enrichment methods and yielding differences in total read counts across samples. The analysis of multi-species transcriptomics datasets requires modifications to the alignment, quantification, and downstream analysis steps compared to the single-species analysis pipelines. We describe best practices for multi-species transcriptomics and differential gene expression.
Keywords: Best practices; Differential gene expression; RNA-Seq; Transcriptomics
Year: 2021 PMID: 33926528 PMCID: PMC8082843 DOI: 10.1186/s13059-021-02337-8
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1. A general workflow for the enrichment, library preparation, and sequencing steps of a typical multi-species RNA-Seq analysis. Created with BioRender.com
Fig. 2. A general workflow for the read processing, alignment, and quantification steps of a typical multi-species RNA-Seq analysis. Created with BioRender.com
Fig. 3. Examples of saturation curves for two samples that reach saturation and two samples that do not reach saturation
Fig. 4. A general workflow showing examples of downstream analyses for a typical multi-species RNA-Seq analysis. Created with BioRender.com
Differential expression analysis comparing gene vs. transcript models using simulated data
| Quantification method | DESeq2 DE genes detected | DESeq2 DE transcripts detected | DESeq2 false positive DE genes | DESeq2 false positive DE transcripts | edgeR DE genes detected | edgeR DE transcripts detected | edgeR false positive DE genes | edgeR false positive DE transcripts |
|---|---|---|---|---|---|---|---|---|
| FADU | 684 (54.9%) | 440 (79.1%) | 7 (0.56%) | 11 (2.0%) | 947 (76.0%) | 439 (79.0%) | 34 (2.7%) | 12 (2.2%) |
| FADU --em_iterations 10 | 686 (55.1%) | 438 (78.8%) | 10 (0.8%) | 12 (2.2%) | 955 (76.7%) | 436 (78.4%) | 37 (3.0%) | 12 (2.2%) |
| FADU --remove_multimapped | 702 (56.3%) | 435 (78.2%) | 4 (0.32%) | 5 (0.9%) | 974 (78.2%) | 447 (80.4%) | 32 (2.6%) | 9 (1.6%) |
| featureCounts | 694 (55.7%) | 434 (78.1%) | 5 (0.4%) | 5 (0.9%) | 936 (75.1%) | 441 (79.3%) | 24 (1.9%) | 10 (1.8%) |
| featureCounts -O | 775 (62.2%) | 515 (92.6%) | 18 (1.44%) | 54 (9.7%) | 1008 (80.9%) | 522 (93.9%) | 43 (3.5%) | 61 (11.0%) |
| featureCounts -O --fraction | 734 (58.9%) | 505 (90.8%) | 14 (1.12%) | 20 (3.6%) | 1000 (80.3%) | 528 (95.0%) | 47 (3.8%) | 44 (7.9%) |
| HTSeq -m union | 644 (51.7%) | 428 (77.0%) | 4 (0.32%) | 5 (0.9%) | 909 (73.0%) | 436 (78.4%) | 40 (3.2%) | 12 (2.2%) |
| HTSeq -m intersection-strict | 607 (48.7%) | 436 (78.4%) | 1 (0.08%) | 5 (0.9%) | 803 (64.5%) | 442 (79.5%) | 24 (1.9%) | 14 (2.5%) |
| HTSeq -m intersection-nonempty | 656 (52.7%) | 436 (78.4%) | 3 (0.24%) | 5 (0.9%) | 903 (72.5%) | 442 (79.5%) | 31 (2.5%) | 14 (2.5%) |
| HTSeq -m union --nonunique all | 769 (61.7%) | 509 (91.6%) | 18 (1.44%) | 48 (8.6%) | 1005 (80.7%) | 519 (93.4%) | 47 (3.8%) | 65 (11.7%) |
| kallisto | 675 (54.2%) | 526 (94.6%) | 9 (0.72%) | 11 (2.0%) | 946 (75.9%) | 532 (95.7%) | 41 (3.3%) | 22 (4.0%) |
| Salmon --validateMappings | 676 (54.3%) | 525 (94.4%) | 4 (0.32%) | 8 (1.4%) | 946 (75.9%) | 534 (96.0%) | 44 (3.5%) | 25 (4.5%) |
| Salmon --validateMappings --allowDovetail | 675 (54.2%) | 525 (94.4%) | 4 (0.32%) | 9 (1.6%) | 946 (75.9%) | 534 (96.0%) | 46 (3.7%) | 23 (4.1%) |
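
The counts in the table above come from comparing each tool's differential expression calls against simulated ground truth. As a minimal sketch of that kind of scoring (not the authors' code), the Python below counts detected and false positive features for one quantification method. The function name `score_de_calls`, the feature IDs, and the choice to express detection as a percentage of the truly differentially expressed features are all illustrative assumptions; this excerpt does not state how the table's percentages were normalized.

```python
def score_de_calls(called, truth):
    """Compare features called DE by a tool against simulated ground truth.

    called: iterable of feature IDs reported as differentially expressed
    truth:  iterable of feature IDs simulated as differentially expressed
    """
    called, truth = set(called), set(truth)
    detected = called & truth    # true DE features that were recovered
    false_pos = called - truth   # features called DE that were not simulated as DE
    # Assumption: detection is reported relative to the number of true DE features.
    detected_pct = round(100 * len(detected) / len(truth), 1) if truth else 0.0
    return {
        "detected": len(detected),
        "detected_pct": detected_pct,
        "false_positives": len(false_pos),
    }


# Hypothetical toy data: four simulated DE genes, three calls from one method.
truth = {"geneA", "geneB", "geneC", "geneD"}
called = {"geneA", "geneB", "geneE"}
result = score_de_calls(called, truth)
print(result)
```

Running the same function once per quantification method and per feature level (gene or transcript) would yield one row of a comparison like the table above.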