Richard A Moore, René L Warren, J Douglas Freeman, Julia A Gustavsen, Caroline Chénard, Jan M Friedman, Curtis A Suttle, Yongjun Zhao, Robert A Holt.
Abstract
Massively parallel sequencing technology now provides the opportunity to sample the transcriptome of a given tissue comprehensively. Transcripts present at only a few copies per cell are readily detectable, allowing the discovery of low-abundance viral and bacterial transcripts in human tissue samples. Here we describe an approach for mining large sequence data sets for the presence of microbial sequences. Further, we demonstrate the sensitivity of this approach by sequencing human RNA-seq libraries spiked with decreasing amounts of an RNA virus. At a modest depth of sequencing, viral transcripts can be detected at frequencies of less than 1 in 1,000,000. With current sequencing platforms approaching outputs of one billion reads per run, this is a highly sensitive method for detecting putative infectious agents associated with human tissues.
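To make the sensitivity claim concrete, the following back-of-the-envelope calculation uses a simple sampling model that is an assumption of this note, not the paper's analysis: each read pair is drawn independently, so the number of viral pairs in a run is approximately Poisson with mean λ = f·N, where f is the viral frequency and N the number of read pairs. The function names are illustrative.

```python
import math


def poisson_mean(frequency: float, total_pairs: int) -> float:
    """Expected number of target read pairs (lambda = f * N) under independent sampling."""
    return frequency * total_pairs


def prob_detect_at_least_one(frequency: float, total_pairs: int) -> float:
    """Poisson approximation of P(X >= 1) = 1 - exp(-lambda)."""
    return 1.0 - math.exp(-poisson_mean(frequency, total_pairs))


if __name__ == "__main__":
    # Assumed frequency of 1 in 10 million; depths of ~22 million and 1 billion pairs.
    for depth in (22_224_735, 1_000_000_000):
        lam = poisson_mean(1e-7, depth)
        p = prob_detect_at_least_one(1e-7, depth)
        print(f"{depth:>13,} pairs: lambda={lam:.2f}, P(>=1)={p:.3f}")
```

Under this model, a library of about 22.2 million pairs spiked at 1∶10,000,000 has λ ≈ 2.2, so at least one viral pair would be expected roughly 89% of the time; at one billion pairs, λ = 100 and detection is essentially certain.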
Year: 2011 PMID: 21603639 PMCID: PMC3094400 DOI: 10.1371/journal.pone.0019838
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Flow chart of subtraction methodology.
Paired-end reads from a human sequence library are first filtered to remove low-quality reads;
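The caption is truncated in this record, so the paper's actual quality criteria are not available here. As a sketch only, the following assumes Phred+33 quality strings and uses two hypothetical thresholds (a minimum mean base quality and a maximum fraction of N calls). A pair is kept only if both mates pass, which preserves read pairing for the later subtraction and mapping steps.

```python
from statistics import mean
from typing import Iterable, Iterator, Tuple

# A FASTQ record reduced to (sequence, quality string).
Read = Tuple[str, str]


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return mean(ord(c) - offset for c in quality)


def passes(read: Read, min_mean_q: float, max_n_fraction: float) -> bool:
    """Hypothetical per-read filter: mean quality and N-content thresholds."""
    seq, qual = read
    if not seq or not qual:
        return False
    n_fraction = seq.upper().count("N") / len(seq)
    return mean_phred(qual) >= min_mean_q and n_fraction <= max_n_fraction


def filter_pairs(pairs: Iterable[Tuple[Read, Read]],
                 min_mean_q: float = 20.0,
                 max_n_fraction: float = 0.1) -> Iterator[Tuple[Read, Read]]:
    """Yield only pairs in which both mates pass (threshold defaults are hypothetical)."""
    for r1, r2 in pairs:
        if passes(r1, min_mean_q, max_n_fraction) and passes(r2, min_mean_q, max_n_fraction):
            yield r1, r2
```

Filtering at the pair level, rather than per read, is a design choice here: it avoids orphaned mates that would complicate paired alignment downstream.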
Summary of hits to IAdb.
| Library | HaRNAV raw read pairs | HaRNAV rank # | Bacteriophage raw read pairs | Bacteriophage rank # |
|---|---|---|---|---|
| 1 | 547 | 4 | 2 | 81 |
| 2 | 37 | 17 | 4 | 49 |
| 3 | 6 | 77 | 132 | 19 |
| 4 | 1 | 294 | 2,078 | 7 |
Rank # is the position of the detected IA (infectious agent) when IAdb hits are sorted by decreasing read-pair count (i.e., the genome to which the most read pairs align ranks #1).
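The ranking rule can be sketched in a few lines. This assumes, hypothetically, that each aligned read pair has already been assigned to a single genome identifier; how the paper counts pairs that hit several genomes is not stated here, and the alphabetical tie-break is an arbitrary choice of this sketch.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_hits(pair_hits: Iterable[str]) -> List[Tuple[int, str, int]]:
    """Rank genomes by decreasing number of aligned read pairs.

    `pair_hits` holds one genome identifier per aligned read pair.
    Returns (rank, genome, pair_count) tuples, where rank 1 has the most pairs.
    Ties are broken alphabetically (an arbitrary choice).
    """
    counts = Counter(pair_hits)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, genome, n) for rank, (genome, n) in enumerate(ordered, start=1)]
```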
Sequence summary and detection of viral DNA.
| Library | Human RNA (µg) | Viral RNA (pg) | Dilution factor | Read pairs | Total HaRNAV read pairs expected | HaRNAV read pairs | HaRNAV ppm* | HaRNAV expected ppm | Bacteriophage read pairs | Bacteriophage ppm* | Bacteriophage expected ppm |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 200 | 1∶10,000 | 20,352,714 | 2,035 | 618 | 30.36 | 100 | 2 | 0.10 | 0.1 |
| 2 | 2 | 20 | 1∶100,000 | 23,334,389 | 233 | 31 | 1.33 | 10 | 3 | 0.13 | 1 |
| 3 | 2 | 2 | 1∶1,000,000 | 22,504,865 | 23 | 6 | 0.27 | 1 | 137 | 6.09 | 10 |
| 4 | 2 | 0.2 | 1∶10,000,000 | 22,224,735 | 2 | 1 | 0.04 | 0.1 | 2,162 | 97.28 | 100 |
*ppm: pairs per million.
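The derived columns of the sequence summary table follow simple arithmetic, reproduced in the sketch below. These relationships are inferred from the table's own numbers rather than stated in the record: the dilution factor matches the viral-to-human RNA mass ratio (200 pg / 2 µg = 1∶10,000), the expected HaRNAV read pairs match total read pairs × dilution factor, and ppm is detected pairs per million total pairs. The function names are illustrative.

```python
def dilution_factor(viral_pg: float, human_ug: float) -> float:
    """Mass ratio of spiked viral RNA to human RNA (1 ug = 1e6 pg)."""
    return viral_pg / (human_ug * 1e6)


def expected_hit_pairs(total_pairs: int, dilution: float) -> float:
    """Expected viral read pairs if reads are drawn in proportion to RNA mass (assumption)."""
    return total_pairs * dilution


def pairs_per_million(hit_pairs: int, total_pairs: int) -> float:
    """Detected read pairs per million total read pairs (ppm)."""
    return hit_pairs / total_pairs * 1e6
```

For library 1, `expected_hit_pairs(20_352_714, dilution_factor(200, 2))` gives about 2,035 pairs, and `pairs_per_million(618, 20_352_714)` rounds to 30.36, both matching the table.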
Effect of read length and base error on HaRNAV and Herpesvirus 4 detection.
| Read length (nt) | Error on genome | Simulated pairs tested | HaRNAV: mean pairs detected ± s.d. | Herpesvirus 4: mean pairs detected ± s.d. |
|---|---|---|---|---|
| 76 | 0% | 1000 | 1000.0 ± 0.0 | 1000.0 ± 0.0 |
| 76 | 3% | 1000 | 1000.0 ± 0.0 | 999.3 ± 1.3 |
| 76 | 5% | 1000 | 979.0 ± 12.0 | 973.0 ± 4.0 |
| 36 | 3% | 1000 | 1000.0 ± 0.0 | 1000.0 ± 0.0 |
| 50 | 3% | 1000 | 1000.0 ± 0.0 | 999.3 ± 1.3 |
| 100 | 3% | 1000 | 985.7 ± 2.1 | 989.7 ± 2.5 |
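The simulation parameters in the table (read length, error rate, number of pairs) can be illustrated with a small read-pair simulator. This is a sketch under stated assumptions, not the paper's simulator: "error on genome" is interpreted as independent random substitutions at the given per-base rate, and the fixed insert size and seed are hypothetical choices.

```python
import random
from typing import List, Tuple

BASES = "ACGT"


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT characters are kept as-is)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def mutate(seq: str, error_rate: float, rng: random.Random) -> str:
    """Replace each base, with probability error_rate, by a different random base."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_pairs(genome: str, n_pairs: int, read_len: int, insert_size: int,
                   error_rate: float, seed: int = 0) -> List[Tuple[str, str]]:
    """Sample n_pairs read pairs from random fragments of `genome`.

    Mate 1 is the fragment's 5' end; mate 2 is the reverse complement of its 3' end.
    A fixed insert size is used for simplicity (hypothetical).
    """
    if not read_len <= insert_size <= len(genome):
        raise ValueError("need read_len <= insert_size <= len(genome)")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        start = rng.randrange(len(genome) - insert_size + 1)
        fragment = genome[start:start + insert_size]
        mate1 = mutate(fragment[:read_len], error_rate, rng)
        mate2 = mutate(revcomp(fragment[-read_len:]), error_rate, rng)
        pairs.append((mate1, mate2))
    return pairs
```

Running such pairs through the detection pipeline several times with different seeds would give the kind of mean ± s.d. reported above; the number of replicates used in the paper is not stated in this record.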
Figure 2. Circos [16] plot detailing HaRNAV sequence recovery.
The red and blue lines represent reads aligning to the minus and plus strands, respectively. The Heterosigma akashiwo RNA virus has an 8,587-nt linear single-stranded RNA (ssRNA) genome with a single CDS, shown in green on the Circos plot. The read depth of coverage is shown in the centre of the plot. The genome is depicted by alternating black and white arcs, each 500 nt in size.
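The coverage track in such a plot can be computed from read alignments before rendering; the Circos configuration itself is not shown. This sketch assumes alignments are given as 0-based, half-open (start, end) intervals on the genome, and bins mean depth into 500-nt windows to match the arcs described in the caption. For an 8,587-nt genome this yields 18 bins, the last covering 87 nt.

```python
from typing import Iterable, List, Tuple


def depth_of_coverage(genome_len: int, alignments: Iterable[Tuple[int, int]]) -> List[int]:
    """Per-base depth from 0-based, half-open (start, end) alignment intervals.

    Uses a difference array: +1 at each start, -1 at each end, then a running sum.
    Intervals are clipped to the genome; empty intervals are ignored.
    """
    diff = [0] * (genome_len + 1)
    for start, end in alignments:
        start, end = max(0, start), min(genome_len, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for i in range(genome_len):
        running += diff[i]
        depth.append(running)
    return depth


def bin_mean_depth(depth: List[int], bin_size: int = 500) -> List[float]:
    """Mean depth in consecutive bins; the last bin may be shorter than bin_size."""
    return [sum(depth[i:i + bin_size]) / len(depth[i:i + bin_size])
            for i in range(0, len(depth), bin_size)]
```

Separate depth tracks for the plus and minus strands, as colored in the figure, could be obtained by calling `depth_of_coverage` once per strand.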
Effect of IAdb entry removal on viral sequence detection.
| Read length (nt) | Error on genome | Simulated pairs tested | HaRNAV: mean pairs detected ± s.d.* | Herpesvirus 4: mean pairs detected ± s.d.* |
|---|---|---|---|---|
| 76 | 3% | 1 | 0.0 ± 0.0 | 1.0 ± 0.0** |
| 76 | 3% | 10 | 0.0 ± 0.0 | 8.7 ± 1.5 |
| 76 | 3% | 100 | 0.0 ± 0.0 | 90.3 ± 5.0 |
| 76 | 3% | 1000 | 0.0 ± 0.0 | 978.3 ± 3.5 |
*After depletion of the HaRNAV and Herpesvirus 4 entries from IAdb, respectively; counts are pairs that hit other entries.
**Pairs map unambiguously to AY961628.3 (Human herpesvirus 4 strain GD1) and/or NC_007605.1 (Human herpesvirus 4 type 1). Missing pairs were neither subtracted at the human sequence screening phase nor mapped to known viral/bacterial entries, and may uniquely characterize the human herpesvirus 4 genome, accession NC_009334.1.
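The entry-removal experiment reduces to a set operation over alignment results. This sketch assumes, as a hypothetical input format, that each read pair has already been aligned and reduced to the set of IAdb accessions it hits. Removing the target genome's entries then splits the pairs into those still detected through other entries (what the table counts) and those that hit only the removed entries (the "missing" pairs the footnote suggests may be unique to that genome).

```python
from typing import Dict, Iterable, Set


def pairs_still_detected(pair_hits: Dict[str, Set[str]], removed: Iterable[str]) -> Set[str]:
    """IDs of pairs that still hit at least one entry once `removed` entries are dropped."""
    removed_set = set(removed)
    return {pair for pair, hits in pair_hits.items() if hits - removed_set}


def pairs_unique_to_removed(pair_hits: Dict[str, Set[str]], removed: Iterable[str]) -> Set[str]:
    """IDs of pairs whose hits were all to removed entries, i.e. lost after depletion."""
    removed_set = set(removed)
    return {pair for pair, hits in pair_hits.items() if hits and hits <= removed_set}
```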