Samuele Bovo, Gianluca Mazzoni, Anisa Ribani, Valerio Joe Utzeri, Francesca Bertolini, Giuseppina Schiavo, Luca Fontanesi.
Abstract
Shotgun next-generation sequencing (NGS) of whole DNA extracted from mammalian specimens often produces reads that do not map to the host reference genome (unmapped reads), and these are usually discarded as by-products of the experiment. In this study, we mined Ion Torrent reads obtained by sequencing DNA isolated from archived blood samples collected from 100 performance-tested Italian Large White pigs. Two reduced representation libraries were prepared from two DNA pools, each constructed from 50 DNA samples combined in equimolar amounts. Bioinformatic analyses were carried out on the reads from the two NGS datasets that did not map to the reference pig genome. The in silico analyses used read mapping and sequence assembly approaches for a viral metagenomic analysis based on the NCBI Viral Genome Resource. Our approach identified sequences matching several viruses of the Parvoviridae family: porcine parvovirus 2 (PPV2), PPV4, PPV5, PPV6 and the porcine bocavirus 1-H18 isolate (PBoV1-H18). The presence of these viruses was confirmed by PCR and Sanger sequencing of individual DNA samples. PPV2, PPV4, PPV5, PPV6 and PBoV1-H18 were identified in samples collected in 1998-2007, 1998-2000, 1997-2000, 1998-2004 and 2003, respectively. For most of these viruses (PPV4, PPV5, PPV6 and PBoV1-H18), previous studies reported their first occurrence much later (from 5 to more than 10 years) than our identification period and in different geographic areas. Our study provided a retrospective evaluation of apparently asymptomatic parvovirus-infected pigs, yielding information that could help define the occurrence and prevalence of different parvoviruses in Southern Europe. This study demonstrated the potential of mining NGS datasets that were not originally generated by metagenomics experiments for viral metagenomic analyses in a livestock species.
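Pooling 50 DNA samples in equimolar amounts means that every pig contributes the same number of genome copies to the pool. Because all samples are genomic DNA from the same species, equal mass is commonly used as a proxy for equal molar amount. The sketch below computes the volume to draw from each sample under that assumption; the function name `pool_volumes`, the sample IDs, the concentrations and the target mass are all hypothetical and do not come from the study.

```python
def pool_volumes(concentrations_ng_per_ul, ng_per_sample):
    """Return the volume (uL) of each sample needed to contribute ng_per_sample ng.

    Assumption: all samples are genomic DNA from the same species, so equal
    mass is treated as equal molar amount (equal number of genome copies).
    """
    volumes = {}
    for sample_id, conc in concentrations_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"non-positive concentration for {sample_id}")
        # volume = desired mass / concentration
        volumes[sample_id] = ng_per_sample / conc
    return volumes


# Hypothetical concentrations (ng/uL) for three of the 50 samples in one pool.
example = {"pig_01": 50.0, "pig_02": 25.0, "pig_03": 100.0}
vols = pool_volumes(example, ng_per_sample=200.0)
for sample_id, vol in vols.items():
    print(f"{sample_id}: {vol:.1f} uL")
```

More dilute samples simply require a proportionally larger volume, so each pig is represented equally regardless of its extraction yield.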
Year: 2017 PMID: 28662150 PMCID: PMC5491021 DOI: 10.1371/journal.pone.0179462
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Flowchart of the bioinformatic analyses used to identify viral sequences from unmapped reads with the read mapping and sequence assembly approaches.
a) Steps of the virus discovery workflow: after preprocessing, reads were analyzed with both a sequence assembly approach (yellow) and a read mapping approach (green). b) The same flowchart as in box “a”, annotated with the number of reads obtained at each step of the analysis.
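The read mapping branch of the flowchart can be sketched as a series of filters that also records how many reads survive each step, mirroring the counts in box “b”. This is a minimal illustration only: alignments are represented by precomputed fields rather than by running an aligner, the assembly branch is not shown, and the preprocessing criteria (minimum length, no ambiguous bases), the `Read` record and the duplicate rule (same viral reference, start position and strand, a common convention for single-end reads) are assumptions, not the study's exact settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Read:
    read_id: str
    sequence: str
    host_hit: bool            # True if the read aligned to the pig reference genome
    virus_ref: Optional[str]  # viral genome the read aligned to, if any
    virus_pos: Optional[int]  # 0-based alignment start on that viral genome
    reverse: bool = False     # strand of the viral alignment


def passes_preprocessing(read: Read, min_len: int = 50) -> bool:
    # Hypothetical filter: minimum length and no ambiguous bases.
    return len(read.sequence) >= min_len and "N" not in read.sequence


def run_pipeline(reads):
    counts = {"sequenced": len(reads)}
    kept = [r for r in reads if passes_preprocessing(r)]
    counts["after_preprocessing"] = len(kept)
    unmapped = [r for r in kept if not r.host_hit]        # "Pig-unmapped"
    counts["host_unmapped"] = len(unmapped)
    viral = [r for r in unmapped if r.virus_ref is not None]  # "Virus"
    counts["virus"] = len(viral)
    # Collapse likely PCR duplicates: reads sharing reference, start and strand.
    seen = set()
    dedup = []
    for r in viral:
        key = (r.virus_ref, r.virus_pos, r.reverse)
        if key not in seen:
            seen.add(key)
            dedup.append(r)
    counts["virus_no_duplicates"] = len(dedup)
    return dedup, counts


seq = "ACGT" * 15  # 60-nt placeholder sequence
reads = [
    Read("r1", seq, host_hit=True, virus_ref=None, virus_pos=None),
    Read("r2", seq, host_hit=False, virus_ref=None, virus_pos=None),
    Read("r3", seq, host_hit=False, virus_ref="PPV6", virus_pos=120),
    Read("r4", seq, host_hit=False, virus_ref="PPV6", virus_pos=120),  # duplicate of r3
    Read("r5", seq, host_hit=False, virus_ref="PPV2", virus_pos=40),
    Read("r6", "ACGT" * 5, host_hit=False, virus_ref="PPV2", virus_pos=40),  # too short
]
kept_reads, counts = run_pipeline(reads)
print(counts)
```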
Table 1. PCR primers and PCR conditions used to amplify fragments of the detected viruses.
| Primer pair name/Virus | Primers (5’-3’) | PCR conditions (°C / [MgCl2]) | Expected amplified region (bp) | Use |
|---|---|---|---|---|
| PPV2 | Forward: | 54/1.5 | 397 | PCR/Sequencing |
| PPV4 | Forward: | 58/1.5 | 250 | PCR/Sequencing |
| PPV5 | Forward: | 54/1.5 | 351 | PCR/Sequencing |
| PPV6 | Forward: | 58/1.5 | 383 | PCR/Sequencing |
| PBoV1-H18_317-616nt | Forward: | 58/1.5 | 270 | PCR/Sequencing |
| PBoV1-H18_1-316nt_317-616nt | Forward: | 58/1.5 | 226 | PCR/Sequencing |
| PCV2_flank | Forward: | 59/1.5 | 429 | PCR |
| PCV2_ovlp | Forward: | 59/1.5 | 263 | PCR |
a Primers were designed on regions conserved (where present) between the reference sequence and the different strains identified by the refinement procedure of the read mapping method. The following strains were used: PPV2 (GenBank accession numbers KP245947, GU938300, GU938301, KP765690, KC701309 and JX101461); PPV4 (GenBank accession numbers GQ387499 and GQ387500); PPV5 (GenBank accession numbers JX896319, JX896320 and JX896321); PPV6 (GenBank accession numbers KF999682, KF999683, KF999684 and KF999685). Primers PBoV1-H18_317-616nt and PBoV1-H18_1-316nt_317-616nt were designed based on the sequence assembled by VirFind and on the reference sequence HQ291308. Primers PCV2_flank and PCV2_ovlp were designed based on the sequence assembled by VirFind and on the reference sequences KM259933 (truncated genome) and AY424401 (full genome, used as reference).
b Annealing temperature (°C) / [MgCl2].
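The “Expected amplified region (bp)” column is the length of the template stretch spanning from the 5′ end of the forward primer to the 5′ end of the reverse primer, with both primers included. A minimal exact-match in silico PCR sketch is shown below. The primer sequences are not preserved in this copy of the table, so the template and primers are hypothetical toy sequences, and mismatches and degenerate (IUPAC) bases other than N are not handled.

```python
def reverse_complement(seq: str) -> str:
    # Only A/C/G/T/N are supported; other IUPAC codes raise KeyError.
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def expected_amplicon_length(template: str, forward: str, reverse: str):
    """Return the amplicon size (bp) or None if the primer pair does not bind.

    Exact-match search only: the forward primer must occur on the template and
    the reverse complement of the reverse primer must occur downstream of it.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    rc_reverse = reverse_complement(reverse)
    end = template.find(rc_reverse, start + len(forward))
    if end == -1:
        return None
    return end + len(rc_reverse) - start


# Hypothetical toy template: flank + forward site + insert + reverse site + flank.
fwd = "ACGTACGTAC"
rev = "CCCAAATTTG"
template = "TTTT" + fwd + "GGGGCCCCGGGG" + reverse_complement(rev) + "AAAA"
print(expected_amplicon_length(template, fwd, rev))
```

With real primer and reference sequences, the same calculation would reproduce the expected sizes listed in the table, provided the primers match the template exactly.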
Summary of the Ion Torrent reads used to detect viral genomes.
The numbers of reads are reported for the LibP and LibN DNA pools.
| Information | LibP | LibN |
|---|---|---|
| Sequenced reads | 3,581,496 | 3,887,066 |
| Reads after preprocessing | 3,390,796 | 3,731,776 |
| Pig–Unmapped | 936,056 | 1,097,061 |
| Virus | 9,926 | 11,752 |
| Virus–no duplicates | 1,879 | 1,741 |
a “Pig–Unmapped” refers to reads that did not map to the S. scrofa nuclear genome; “Virus” refers to reads that did not map to the S. scrofa reference genome but mapped to viral genomes; “Virus–no duplicates” is the same as “Virus”, but after removal of PCR duplicates.
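The table reports absolute counts; expressing each step as a percentage of the previous one makes the two libraries easier to compare. The short sketch below uses only the counts copied from the table; the names `READ_COUNTS`, `STEPS` and `step_retention` are illustrative.

```python
# Read counts copied from the read summary table (LibP and LibN DNA pools).
READ_COUNTS = {
    "LibP": [3_581_496, 3_390_796, 936_056, 9_926, 1_879],
    "LibN": [3_887_066, 3_731_776, 1_097_061, 11_752, 1_741],
}
STEPS = ["Sequenced", "After preprocessing", "Pig-unmapped", "Virus", "Virus, no duplicates"]


def step_retention(values):
    """Percentage of reads kept at each step, relative to the previous step."""
    return [100 * cur / prev for prev, cur in zip(values, values[1:])]


for lib, values in READ_COUNTS.items():
    ratios = step_retention(values)
    summary = ", ".join(f"{step}: {pct:.1f}%" for step, pct in zip(STEPS[1:], ratios))
    print(f"{lib}: {summary}")
```

For LibP, about 27.6% of the preprocessed reads did not map to the pig genome (29.4% for LibN), and removing PCR duplicates kept about 18.9% of the virus-mapping reads (14.8% for LibN).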
Fig 2. PCR validation of the in silico detected porcine viruses.
Boxes are named after the primer pairs used for the validation, reported in Table 1: a) PPV4; b) PPV2; c) PPV5; d) PBoV1-H18_1-316nt_317-616nt; e) PBoV1-H18_317-616nt; and f) PPV6. Each box contains the following lanes: “M”, molecular size marker (ladder); “LibP”, amplification products (in duplicate) from the LibP DNA pool; “LibN”, amplification products (in duplicate) from the LibN DNA pool.
Summary of the viruses identified in the two next generation sequencing datasets (LibP and LibN).
Viruses were identified by in silico analyses (read mapping and sequence assembly approaches) and by PCR analyses on the DNA pools from which the libraries were generated and on individual DNA samples.
| Virus/Primer pair | DNA pools (in silico), LibP | DNA pools (in silico), LibN | DNA pools (PCR), LibP | DNA pools (PCR), LibN | Individual DNA samples (PCR), LibP | Individual DNA samples (PCR), LibN |
|---|---|---|---|---|---|---|
| PPV2 | +/+ | +/- | + | - | 5 | 0 |
| PPV4 | +/- | +/+ | + | + | 7 | 5 |
| PPV5 | +/+ | +/+ | + | + | 4 | 3 |
| PPV6 | +/+ | +/+ | + | + | 6 | 7 |
| PBoV1-H18_1-316nt_317-616nt | -/+ | -/- | + | - | 1 | NA |
| PBoV1-H18_317-616nt | -/+ | -/- | + | - | 1 | NA |
a Identification of viral sequences by in silico analyses, reported as rm/sa: rm = read mapping approach; sa = sequence assembly approach. “+” indicates presence of sequences; “-” indicates absence of sequences.
b Identification of the presence of viral sequences by PCR analysis on DNA pools: “+” indicates presence of amplification (positive); “-” indicates absence of amplification (negative).
c Identification of the presence of viral sequences by PCR analysis on individual DNA samples, reported as the number of positive samples out of the 50 pigs in each group (LibP and LibN).
* Not amplified.
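The individual PCR counts can be turned into crude positivity rates, per group and over both groups combined. This is a minimal sketch using only the counts in the table above: `None` stands for “NA”, the combined rate pools only the groups with a reported count, and no confidence intervals or test-sensitivity corrections are applied. The names `positives`, `prevalence` and `fmt` are illustrative.

```python
PIGS_PER_GROUP = 50  # each DNA pool was built from 50 pigs

# Positive individual samples (LibP, LibN) copied from the table above;
# None stands for "NA" (no individual result reported for that group).
positives = {
    "PPV2": (5, 0),
    "PPV4": (7, 5),
    "PPV5": (4, 3),
    "PPV6": (6, 7),
    "PBoV1-H18_1-316nt_317-616nt": (1, None),
    "PBoV1-H18_317-616nt": (1, None),
}


def prevalence(counts, group_size=PIGS_PER_GROUP):
    """Return (LibP %, LibN %, combined %) of PCR-positive pigs.

    The combined value pools only the groups with a reported count.
    """
    per_group = tuple(None if c is None else 100 * c / group_size for c in counts)
    tested = [c for c in counts if c is not None]
    combined = 100 * sum(tested) / (group_size * len(tested)) if tested else None
    return per_group + (combined,)


def fmt(value):
    return "NA" if value is None else f"{value:.0f}%"


for virus, counts in positives.items():
    lib_p, lib_n, combined = prevalence(counts)
    print(f"{virus}: LibP {fmt(lib_p)}, LibN {fmt(lib_n)}, combined {fmt(combined)}")
```

For example, PPV6 was detected in 6 + 7 = 13 of the 100 pigs (13%), while PPV2 was found only in the LibP group (5 of 50 pigs, 10%).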