Michael R Wilson, Greg Fedewa, Mark D Stenglein, Judith Olejnik, Linda J Rennick, Sham Nambulli, Friederike Feldmann, W Paul Duprex, John H Connor, Elke Mühlberger, Joseph L DeRisi.
Abstract
Laboratories studying high-priority pathogens need comprehensive methods to confirm microbial species and strains while also detecting contamination. Metagenomic deep sequencing (MDS) inventories nucleic acids present in laboratory stocks, providing an unbiased assessment of pathogen identity, the extent of genomic variation, and the presence of contaminants. Double-stranded cDNA MDS libraries were constructed from RNA extracted from in vitro-passaged stocks of six viruses (La Crosse virus, Ebola virus, canine distemper virus, measles virus, human respiratory syncytial virus, and vesicular stomatitis virus). Each library was dual indexed and pooled for sequencing. A custom bioinformatics pipeline determined the organisms present in each sample in a blinded fashion. Single nucleotide variant (SNV) analysis identified viral isolates. We confirmed that (i) each sample contained the expected microbe, (ii) dual indexing of the samples minimized false assignments of individual sequences, (iii) multiple viral and bacterial contaminants were present, and (iv) SNV analysis of the viral genomes allowed precise identification of the viral isolates. MDS can be multiplexed to allow simultaneous and unbiased interrogation of mixed microbial cultures and (i) confirm pathogen identity, (ii) characterize the extent of genomic variation, (iii) confirm the cell line used for virus propagation, and (iv) assess for contaminating microbes. These assessments ensure the true composition of these high-priority reagents and generate a comprehensive database of microbial genomes studied in each facility. MDS can serve as an integral part of a pathogen-tracking program, which in turn will enhance sample security and increase experimental rigor and precision.
IMPORTANCE
Both the integrity and reproducibility of experiments using select agents depend in large part on unbiased validation to ensure the correct identity and purity of the species in question. Metagenomic deep sequencing (MDS) provides the required level of validation by allowing for an unbiased and comprehensive assessment of all the microbes in a laboratory stock.
Keywords: metagenomics; pathogen tracking; phylogenetic analysis
Year: 2016 PMID: 27822544 PMCID: PMC5069959 DOI: 10.1128/mSystems.00058-16
Source DB: PubMed Journal: mSystems ISSN: 2379-5077 Impact factor: 6.496
FIG 1 Dual indexing decreases the median rate of read misassignment by nearly 40-fold. Libraries from 50 samples were pooled and sequenced together. Two of the samples (indicated by arrows) were positive for snake arenavirus, and the other 48 were negative. Data sets were demultiplexed using a single index sequence (black squares) or dual index sequences (red circles), and reads from each data set were mapped with high stringency to the virus sequence. The number of virus mapping reads per million quality-filtered reads is indicated. Some dual-index-demultiplexed data sets had no misassigned reads. In these cases, red circles are not shown.
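The difference between single-index and dual-index demultiplexing described in the caption can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names (`demultiplex`, `reads_per_million`), the four-base index sequences, and the toy read list are all hypothetical, and real demultiplexers also tolerate index mismatches. The sketch assumes a misassigned read carries one sample's first index paired with another sample's second index.

```python
from collections import Counter


def demultiplex(reads, sample_sheet, dual=True):
    """Assign reads to samples by their index sequences.

    reads: iterable of (i7, i5) index pairs observed on each read.
    sample_sheet: dict mapping sample name -> (i7, i5).
    With dual=False only the i7 index is checked, so a read with a
    mismatched i5 index is still accepted (hypothetical simplification).
    """
    lookup = {}
    for name, (i7, i5) in sample_sheet.items():
        lookup[(i7, i5) if dual else i7] = name
    counts = Counter()
    for i7, i5 in reads:
        key = (i7, i5) if dual else i7
        counts[lookup.get(key, "unassigned")] += 1
    return counts


def reads_per_million(mapped, total):
    """Normalize a mapped-read count to reads per million filtered reads."""
    return 1e6 * mapped / total


# Hypothetical toy data: two samples plus one chimeric read (A's i7, B's i5).
sheet = {"A": ("AAAA", "CCCC"), "B": ("GGGG", "TTTT")}
reads = [("AAAA", "CCCC")] * 3 + [("GGGG", "TTTT")] * 2 + [("AAAA", "TTTT")]

single = demultiplex(reads, sheet, dual=False)
dual = demultiplex(reads, sheet, dual=True)
print("single index:", dict(single))
print("dual index:  ", dict(dual))
```

In the toy data, single-index demultiplexing assigns the chimeric read to sample A, whereas dual-index demultiplexing leaves it unassigned. The caption describes this same effect at the scale of 50 pooled libraries.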
Metagenomic deep sequencing results
| Sample | Total no. of sequencing reads | No. of unique nonhuman reads | Target virus | Other viral sequence(s) | Bacterial sequence |
|---|---|---|---|---|---|
| EBOV | 47,328,387 | 17,777,406 | EBOV (3,342,624) | None | |
| rMV | 7,129,497 | 802,549 | MV (159,184) | HHV4 (905) | Negative |
| rHRSV | 6,274,384 | 1,165,688 | HRSV (673,096) | HPV18 (121) | Negative |
| LACV | 6,217,368 | 1,389,113 | LACV (1,122,903) | LCMV (5,388), Syrian hamster IAP H10 (45), hamster gammaretrovirus (11) | |
| rCDV | 6,165,559 | 900,609 | CDV (582,074) | Fowlpox (1,138), LACV (7) | Negative |
| VSV | 10,164,946 | 2,130,521 | VSV (1,481,456) | LCMV (3,315) | Negative |
The number of sequences aligning to each microbe is given in parentheses. Abbreviations: EBOV, Ebola virus; rMV, recombinant measles virus; HHV4, human herpesvirus 4; rHRSV, recombinant human respiratory syncytial virus; LACV, La Crosse virus; HPV18, human papillomavirus 18; IAP, intracisternal A particle; LCMV, lymphocytic choriomeningitis virus; rCDV, recombinant canine distemper virus; VSV, vesicular stomatitis virus.
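One way to compare the samples in the table above is the share of unique nonhuman reads that align to the target virus. The short Python sketch below computes that ratio from the tabulated counts. The counts are copied from the table, but the ratio itself and the names `results` and `target_fraction` are illustrative additions rather than a metric reported by the authors.

```python
# (unique nonhuman reads, reads aligning to the target virus), from the table above
results = {
    "EBOV": (17_777_406, 3_342_624),
    "rMV": (802_549, 159_184),
    "rHRSV": (1_165_688, 673_096),
    "LACV": (1_389_113, 1_122_903),
    "rCDV": (900_609, 582_074),
    "VSV": (2_130_521, 1_481_456),
}


def target_fraction(nonhuman_reads, target_reads):
    """Fraction of unique nonhuman reads that align to the target virus."""
    return target_reads / nonhuman_reads


fractions = {sample: target_fraction(*counts) for sample, counts in results.items()}

# Print samples from the highest to the lowest target-virus fraction.
for sample, frac in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{sample}: {frac:.1%}")
```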
SNVs in each virus genome segment that are present in at least 0.5% of the population
| Viral genome | Mean coverage | No. of SNVs: total | No. of SNVs: ≥1% | No. of SNVs: ≥10% | No. of SNVs: ≥50% | No. of SNVs: ≥90% |
|---|---|---|---|---|---|---|
| EBOV | 8,334 | 143 (70) | 49 (19) | 3 (1) | 2 (1) | 1 (1) |
| MV | 1,622 | 122 (80) | 72 (42) | 13 (2) | 11 (1) | 11 (1) |
| HRSV | 5,429 | 115 (71) | 54 (27) | 2 (1) | 2 (1) | 2 (1) |
| LACV L segment | 5,539 | 61 (45) | 27 (18) | 4 (1) | 3 (0) | 2 (0) |
| LACV M segment | 10,077 | 31 (21) | 11 (8) | 6 (4) | 0 (0) | 0 (0) |
| LACV S segment | 14,906 | 6 (4) | 4 (2) | 1 (0) | 1 (0) | 1 (0) |
| CDV | 4,741 | 427 (170) | 297 (91) | 30 (6) | 3 (0) | 1 (0) |
| VSV | 13,401 | 109 (72) | 53 (31) | 24 (10) | 24 (10) | 24 (10) |
Mean coverage is the mean number of non-PCR-duplicate reads that mapped to each base of the virus genome segment. Each SNV column lists the total number of SNVs present at or above the population frequency given in the header, followed by the number of nonsynonymous SNVs in parentheses. Abbreviations: EBOV, Ebola virus; MV, measles virus; HRSV, human respiratory syncytial virus; LACV, La Crosse virus; CDV, canine distemper virus; VSV, vesicular stomatitis virus.
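The two quantities defined in this footnote can be sketched in a few lines of Python. This is a hedged illustration only: the record layout (`freq`, `nonsynonymous`), the function names, and the toy SNV list are assumptions, not the authors' data or code. The thresholds mirror the table columns, with 0.005 standing in for the 0.5% "total" cutoff.

```python
def mean_coverage(depths):
    """Mean per-base depth, where depths holds the number of
    non-PCR-duplicate reads covering each base of a genome segment."""
    return sum(depths) / len(depths)


def count_snvs(snvs, thresholds=(0.005, 0.01, 0.10, 0.50, 0.90)):
    """Return {threshold: (total, nonsynonymous)} counting SNVs whose
    population frequency is at or above each threshold."""
    table = {}
    for threshold in thresholds:
        passing = [s for s in snvs if s["freq"] >= threshold]
        nonsyn = sum(1 for s in passing if s["nonsynonymous"])
        table[threshold] = (len(passing), nonsyn)
    return table


# Hypothetical SNV calls for illustration only.
toy_snvs = [
    {"pos": 101, "freq": 0.007, "nonsynonymous": False},
    {"pos": 512, "freq": 0.02, "nonsynonymous": True},
    {"pos": 900, "freq": 0.15, "nonsynonymous": True},
    {"pos": 1500, "freq": 0.95, "nonsynonymous": False},
]

snv_table = count_snvs(toy_snvs)
for threshold, (total, nonsyn) in snv_table.items():
    print(f">= {threshold:.1%}: {total} ({nonsyn})")
```

Because the thresholds are cumulative, an SNV at 95% frequency is counted in every column. This is why the counts in the table never increase from left to right.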
FIG 2 Distribution of single nucleotide variants (SNVs) for each virus and for two Ebola virus strains. For each plot, the y axis is the log10(SNV frequency) in order to display the range of low-frequency SNVs more accurately. (A) SNV analysis for each virus sample using its reference genome. Abbreviations: CDV, canine distemper virus; EBOV, Ebola virus; LACV, La Crosse virus; MV, measles virus; HRSV, human respiratory syncytial virus; VSV, vesicular stomatitis virus. (B) SNV analysis of 2,501,8050 filtered read-pairs revealed 223 SNVs in the reads that mapped to Ebola virus Mayinga with a frequency of ≥0.90, while there was only one SNV with a frequency of ≥0.90 in the reads that mapped to Ebola virus Kikwit.