Michael J Strong, Guorong Xu, Lisa Morici, Sandra Splinter Bon-Durant, Melody Baddoo, Zhen Lin, Claire Fewell, Christopher M Taylor, Erik K Flemington.
Abstract
The high level of accuracy and sensitivity of next generation sequencing for quantifying genetic material across organismal boundaries gives it tremendous potential for pathogen discovery and diagnosis in human disease. Despite this promise, substantial bacterial contamination is routinely found in existing human-derived RNA-seq datasets that likely arises from environmental sources. This raises the need for stringent sequencing and analysis protocols for studies investigating sequence-based microbial signatures in clinical samples.
Year: 2014 PMID: 25412476 PMCID: PMC4239086 DOI: 10.1371/journal.ppat.1004437
Source DB: PubMed Journal: PLoS Pathog ISSN: 1553-7366 Impact factor: 6.823
Bacterial profile among various human RNA-seq datasets.
| | TCGA | BodyMap | CRC (Normal) | CRC (Tumor) |
| --- | --- | --- | --- | --- |
| | 773,345±6,104 | 883,349±3,309 | 757,775±8,420 | 757,466±8,640 |
| | 1,406.0±100 | 1,789.0±242 | 11,106.0±3,430 | 9,517.0±3,489 |
| | 1.1±0.1 | 1.3±0.2 | 4.2±1.2 | 7.8±1.8 |
| | 6.4±2.6 | 0.0±0.0 | 53.0±29.0 | 861.0±491 |
| | 396.0±35 | 859.0±201 | 1.6±0.7 | 1.1±0.63 |
| | 16.0±3.9 | 14.0±3.4 | 164.0±22 | 360.0±69 |
| | 6.1±0.5 | 3.0±0.5 | 2,232.0±393 | 1,788.0±322 |
| | 668.0±94 | 689.0±166 | 166.0±75 | 191.0±74 |
Values for TCGA represent the average of five RNA-seq datasets (File S1). Similarly, values for BodyMap represent the average of thirteen RNA-seq datasets (File S2). Colorectal cancer (CRC) RNA-seq datasets were obtained from Castellarin et al., accession number SRP007584 (File S3). All values are shown as mean ± SEM.
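As an illustration of how a mean ± SEM entry of this kind can be computed, the following is a minimal sketch. It assumes SEM is the sample standard deviation divided by the square root of the number of datasets, which is the usual definition but is not spelled out in the source. The function name `mean_sem` and the input values are hypothetical and are not taken from the paper.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, SEM) for a list of per-dataset values.

    Assumption: SEM = sample standard deviation / sqrt(n).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate SEM")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical per-dataset values, for illustration only (not from the paper).
example_values = [2.0, 4.0, 4.0, 4.0, 6.0]
example_mean, example_sem = mean_sem(example_values)
print(f"{example_mean:.1f}±{example_sem:.2f}")
```

With only five datasets per group, as for TCGA here, the SEM is sensitive to a single outlying dataset, which is one reason to report it alongside the mean.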
Figure 1. Seven RNA-seq DLBCL cell line datasets sequenced in two different studies (CCLE and CGCI) were analyzed using RNA CoMPASS.
(A) Bacterial reads per human mapped reads. For insets, human and ribosomal reads are normalized to total reads. Green columns represent the average RNA-seq reads from the CCLE dataset, while red columns represent the average RNA-seq reads from the CGCI dataset. (B) Mean bacterial RPMHs for each cell line analyzed in the CCLE (green) and CGCI (red) studies with the corresponding mean ribosomal reads (upper graph). (C) Mean RPMHs of various taxa for each cell line analyzed in the CCLE (green) and CGCI (red) studies. *, p<0.05.
Figure 2. Metatranscriptomic profiles of five RNA sequencing datasets vary across laboratories.
Five lymphoblastoid cell line (LCL) RNA-seq datasets, sequenced at six sequencing centers across Europe, were analyzed using RNA CoMPASS. Various classification groups within the bacterial domain were compared across sequencing centers for each sample: (A) bacteria, (B) Actinobacteria, (C) Firmicutes, (D) environmental samples, and (E) Proteobacteria. (F) As a control, Epstein-Barr virus (EBV) read numbers were also analyzed. All read counts are normalized per million mapped human reads. Each of the five LCL RNA samples is represented by a unique color. *, P<0.05; **, P<0.01; ***, P<0.001; ****, P<0.0001.
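The normalization described in the caption can be sketched as follows. This is a minimal illustration, assuming that the normalized value is the taxon read count scaled to one million human-mapped reads, and that this is what the RPMH abbreviation in Figure 1 refers to. It is not RNA CoMPASS's own code; the function name and all counts are hypothetical.

```python
def reads_per_million_human(taxon_reads, human_mapped_reads):
    """Scale per-taxon read counts to reads per million human-mapped reads.

    taxon_reads: dict mapping a taxon name to its raw read count.
    human_mapped_reads: number of reads mapped to the human genome.
    """
    if human_mapped_reads <= 0:
        raise ValueError("human_mapped_reads must be positive")
    return {
        taxon: count * 1_000_000 / human_mapped_reads
        for taxon, count in taxon_reads.items()
    }


# Hypothetical counts for one sample, for illustration only.
raw_counts = {"Bacteria": 1500, "EBV": 600}
normalized = reads_per_million_human(raw_counts, human_mapped_reads=30_000_000)
print(normalized)
```

Normalizing to human-mapped reads rather than total reads keeps samples with different sequencing depths comparable, so that differences between centers reflect the relative amount of non-human material rather than library size.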