Literature DB >> 28424887

A metagenomics study for the identification of respiratory viruses in mixed clinical specimens: an application of the iterative mapping approach.

Yu-Nong Gong1,2, Shu-Li Yang1,3, Guang-Wu Chen1,2,4, Yu-Wen Chen5, Yhu-Chering Huang5,6, Hsiao-Chen Ning1,3, Kuo-Chien Tsao7,8,9.   

Abstract

Metagenomic approaches to detect viral genomes and variants in clinical samples have various challenges, including low viral titers and bacterial and human genome contamination. To address these limitations, we examined a next-generation sequencing (NGS) and iterative mapping approach for virus detection in clinical samples. We analyzed 40 clinical specimens from hospitalized children diagnosed with acute bronchiolitis, croup, or respiratory tract infections in which virus identification by viral culture or polymerase chain reaction (PCR) was unsuccessful. For our NGS data analysis pipeline, clinical samples were pooled into two NGS groups to reduce sequencing costs, and the depth and coverage of assembled contigs were effectively increased using an iterative mapping approach. PCR was individually performed for each specimen according to the NGS-predicted viral type. We successfully detected previously unidentified respiratory viruses in 26 of 40 specimens using our proposed NGS pipeline. Two dominant populations within the detected viruses were human rhinoviruses (HRVs; n = 14) and human coronavirus NL63 (n = 8), followed by human parainfluenza virus (HPIV), human parechovirus, influenza A virus, respiratory syncytial virus (RSV), and human metapneumovirus. This is the first study reporting the complete genome sequences of HRV-A101, HRV-C3, HPIV-4a, and RSV, as well as an analysis of their genetic variants, in Taiwan. These results demonstrate that this NGS pipeline allows to detect viruses which were not identified by routine diagnostic assays, directly from clinical samples.

Entities:  

Mesh:

Substances:

Year:  2017        PMID: 28424887      PMCID: PMC7087367          DOI: 10.1007/s00705-017-3367-4

Source DB:  PubMed          Journal:  Arch Virol        ISSN: 0304-8608            Impact factor:   2.574


Introduction

Virus identification using traditional polymerase chain reaction (PCR)-based methods is challenging [1, 2], particularly when viral loads are low. Genetic diversity of viruses could also lead to mismatches between probes and primer sequences, resulting in incorrect PCR results [3]. To address these challenges, unbiased next-generation sequencing (NGS) techniques have been developed to improve viral discovery. These methods have been shown to be effective for the identification and genomic characterization of influenza A viruses (IAVs) [4], hepatitis C virus [5], and other respiratory viruses [6]. Additionally, not all viruses can be cultured using common cell lines, e.g. human rhinovirus (HRV) type C [7], human bocavirus (HBoV) [8], and the human coronaviruses (HCoV) NL63 and HKU1 [3, 9]. Therefore, NGS techniques allow to successfully detect viruses which can be difficult to culture and may lead to erroneous PCR results. In metagenomics, modern genomic techniques are applied to characterize communities of microbial organisms directly from their natural environments, without the isolation and cultivation of individual species [10]. NGS technology has been applied to metagenomes to detect the presence of viral pathogens from single non-cultured specimens [11], including influenza virus identification and whole-genome sequencing from swab specimens [12-14] and respiratory virus identification from nasopharyngeal aspirate specimens [15]. However, the direct recovery of viral genomes from clinical specimens using NGS methods has challenges, including noise from host or microbiota cells and the limited viral RNA quantities [16]. To separate viruses from host or background flora, a novel enrichment technique called NetoVir has been developed for high-throughput sample preparation [17]. Notably, viral etiologies were unknown in ~25% of 120 clinical specimens collected from children with bronchiolitis [18], ~45% of 336 hospitalized children (<5 years old) with lower respiratory tract infections (RTIs) [19], and ~60% of 2259 clinical specimens from community acquired pneumonia patients with undetected viral pathogens [20]. Although NGS pipelines [21, 22] have been proposed to identify previously unidentified viruses from clinical specimens, sequencing costs remain an issue. In our previous study, to reduce sequencing costs, we applied NGS to pooled samples [6]. An NGS data analysis pipeline was proposed for the detection of unidentified viruses, including human parechovirus (HPeV), using an iterative mapping approach to obtain genome sequences. In this study, clinical specimens suspected of harboring viruses were collected from hospitalized patients in Taiwan. Our NGS data analysis method was applied to identify unknown viral pathogens directly from these clinical specimens and to obtain their genome sequences by iterative mapping. Furthermore, we explored genetic variation within the samples to clarify the molecular evolution of the observed Taiwanese strains.

Materials and methods

Ethics statement

This study was approved by the Institutional Review Board of Chang Gung Medical Foundation, Linkou Medical Center, Taoyuan, Taiwan (approval no. 100-4378B).

Specimen collection, pretreatment, and RNA extraction

Children (592) hospitalized between 2008 and 2010 with respiratory symptoms were recruited from Chang Gung Memorial Hospital, Taiwan. Among the patients, 105, 124, and 363 were diagnosed with croup, acute bronchiolitis, and RTIs, respectively. Specimens from these patients were examined by routine culture for further viral/bacterial identification and by real-time PCR for viral identification. Negative results were obtained for 20 croup, 29 acute bronchiolitis, and 138 RTIs cases according to routine culture and real-time PCR. Thus, these patients were suspected to have infections caused by unknown/untested viruses. Forty specimens were randomly collected from these samples for further NGS analyses, including ten, nine, and 21 samples from patients diagnosed with croup, acute bronchiolitis, and RTIs, respectively. Among these, 34, five, and one clinical specimens were collected via throat swabs, nasopharyngeal swabs, and sputum. Specimens (0.5 mL each) were divided into two groups; for each group, 20 samples were pooled in a single tube, and the mixed specimens were filtered through 0.22-μm filters to improve analytical sensitivity and reduce background contamination prior to subsequent analyses. For viral particle enrichment, filtered specimens underwent ultracentrifugation using a SW41/Ti rotor at 28,800 × g, 4 °C overnight. Viral RNA extraction from 140 μL of the specimens was performed using the QIAamp Viral RNA Mini Kit (Qiagen, Hilden, Germany). To extract nucleic acids, carrier RNA was replaced using linear acrylamide (Life Technologies, Carlsbad, CA, USA) as a precipitation reagent, included in the QIAamp Viral RNA Mini Kit, according to previously published methods [11]. Extracted RNA was eluted using 30 μL of elution buffer and stored at −70 °C.

NGS data analysis pipeline

Forty specimens were pooled into two NGS groups and cDNA as well as libraries of mixed samples were synthesized using Ovation RNA-Seq System V2 and Ovation Ultralow Library Systems (NuGEN, San Carlos, CA, USA), according to previously published methods [11]. The Illumina MiSeq system was used to obtain paired-end reads (2 × 250 bp). Approximately 4 GB of raw data were generated for each of NGS 1 and 2 (an average throughput of 0.2 GB for each sample), providing 5,996,746 and 8,140,721 paired-end reads, respectively. All of raw reads were deposited to the Sequence Read Archive (SRA) of NCBI with accession number SRP100814. The NGS data analysis pipeline [6] was used to identify unknown pathogens, assemble whole genomes, and investigate viral diversity from mixed clinical samples. Briefly, the pipeline consisted of the following steps: 1) the Illumina system was used to collect data from clinical samples; 2) data were preprocessed to generate NGS reads and determine homologs for the assembled contigs using BLASTN (default setting with an E-value threshold of 10 and word size of 20) and then BLASTX (default settings with an E-value threshold of 10); 3) PCR validation was performed; 4) iterative mapping was performed to obtain viral genomes; 5) reference mapping was performed to identify genetic variants. Six genomes obtained in this study were deposited in GenBank, including HPeV-1 (accession number KY460513), human parainfluenza virus 4a (HPIV-4a, KY460514), HRV-A101 (KY460515), and HRV-C3 (KY460516) detected in NGS 1, and HPIV-4a (KY460517) and RSV (KY460519) detected in NGS 2. To investigate the read depth and detect viral genomic variants, reference mapping was performed with genomic templates using Bowtie2 version 2.2.5 [23] with default settings. Position-specific read counts and nucleotide compositions obtained after read mapping were examined. A genetic variant was identified when mutant nucleotides were greater than 25% of the total reads and read depth was at least 20 [24, 25]. In addition, viral genomes of different genotypes from the same species usually share high sequence homology. Consequently, one NGS read might redundantly map to different templates if each of them was mapped independently. To avoid multiple mappings of single reads in pooled samples, individual templates from different viral genotypes (e.g. three HRV genomes G1, G2, and G3) were concatenated into a longer template (G1 + G2 + G3) which served as an initial template in our iterative mapping approach.

PCR validation of clinical specimens

Each of the 40 clinical specimens was tested by PCR to confirm the results of the NGS data analysis. Virus-specific primers and probes for enterovirus (EV), HCoV-229E, -NL63 and -OC43, human metapneumovirus type 2 (hMPV-2), HPIV-3 and -4, HPeV, HRV, IAV, RSV, and rotavirus (Rota) were used for viral RNA amplification and detection, as listed in Supplementary File 1. HRV and enterovirus (EV) were genotyped using BLASTN searches against the NCBI nucleotide database.

Results

Detection of potential viral types in metagenomes

Forty clinical specimens were collected and pooled into two NGS groups; 21,333 and 39,458 contigs (defined as a set of overlapping reads, representing a consensus sequence of a partial or whole genome) were assembled from NGS experiments 1 and 2, respectively. These contigs were primarily catalogued as viruses, bacteria, and eukaryotes based on BLASTN results. Viral contigs identified by BLASTN are summarized in Supplementary File 2, and Figure 1 shows the reads classification according to the BLASTN results for the assembled contigs. The dominant populations in NGS 1 and 2 were viral (86%) and bacterial (69%) reads, respectively. Table 1 shows 16 potential viral types identified by BLASTN, including: HCoV-NL63, HPeV-1, HPIV-4a, HRV-A101, -C3, -C4, and -C40, and IAV from NGS 1, as well as HCoV-NL63, HPeV-1, HRV-B92, -C26, and -C40, HPIV-4a, IAV, and RSV from NGS 2. We then calculated the average depths and genome coverages (%) for the mapped contigs. As shown in Table 1, in NGS 1, the genomic coverage ranged from 48.4% (from HPIV-4a) to 100% [HCoV-NL63, HRV-C40, and the polymerase acidic (PA), neuraminidase (NA), and matrix protein (MP) genes of IAV] and the contig depths ranged from 6.4 (the NA gene of IAV) to 690.0 (HRV-C40). In NGS 2, the genomic coverage ranged from 4.0% (HPeV-1) to 64.3% (HRV-C40) and the contig depth ranged from 2.2 (HPeV-1) to 167.0 (HPIV-4a). Contig coverages presented in Table 1 were estimated as the covered region of contig(s) divided by the length of the coding sequence. Contig depths were calculated as the average number of times each base in the contig was sequenced from different reads. Only 255 and 552 contigs in NGS 1 and 2, respectively, did not show any matches with database sequences; these were further examined using BLASTX searches. Among them, 153 NGS 1 and 380 NGS 2 contigs matched to sequences in the database, and only two NGS 1 and four NGS 2 contigs belonged to the virus group (unclassified RNA viruses or bacteriophages, but not respiratory viruses), as shown in Supplementary File 2.
Fig. 1

Read classification by BLASTN searches

Table 1

Genomic depths and coverages of viral contigs obtained by NGS using an iterative mapping approach. Sixteen viruses were identified in two NGS pools using the proposed NGS pipeline. Genomic depths of assembled contigs for each virus type were averaged. Genomic coverages were estimated as the covered regions of contig(s) divided by the length of the coding sequence (CDS). Covered regions of assembled contigs were defined as the sum of aligned regions of contigs in BLASTN searches. The CDS lengths for HCoV-NL63, HPIV-4a, and RSV were approximately 27,000, 14,450, and 13,600 bp, respectively, and for HRV and HPeV they were approximately 6450 bp. The CDS lengths for the PB2, PB1, PA, HA, NP, NA, MP, and NS segments were approximately 2280, 2274, 2151, 1701, 1497, 692, 409, and 614 bp, respectively. In iterative mapping, genomics depths were examined by the average of position-specific read counts, and genomic coverages were defined as the sum of aligned regions of read mapping

NGS pool De novo assemblyIterative mapping
NGS-predicted candidateContig countGenomic coverage (%)Average depthIterationsGenomic coverage (%)Average depth
1HCoV-NL63123100570.011007001.1
HPeV-1397.0376.93100644.0
HPIV-4a1748.417.1699.865.7
HRV-A101595.046.2310081.1
HRV-C3396.0174.0399.9275.2
HRV-C4956.310.3395.216.9
HRV-C401100690.03100743.7
IAV*1966.3, 69.1, 100, 99.5, 87.3, 100, 100, and 85.85.8, 21.1, 13.6, 11.6, 16.4, 6.4, 6.5, and 13.7495.1, 77.5, 100, 98.9, 88.2, 84.7, 100, and 86.411.4, 19.3, 18.3, 18.5, 27.5, 7.1, 8.3, and 10.0
2HCoV-NL635922.0149.01100773.8
HPeV-114.02.2219.41.6
HPIV-4a2244.8167.02100341.6
HRV-B92464.323.5583.142.0
HRV-C6543.029.5542.134.6
HRV-C40644.328.5585.038.1
IAV117.4 (PA)6.1 (PA)13.5 (PB1), 37.2 (PA)1.0 (PB1), 4.0 (PA)
RSV3333.645.66100180.2

* For IAV, eight depth or coverage estimates for the PB2, PB1, PA, HA, NP, NA, MP, and NS genes are presented in order

Read classification by BLASTN searches Genomic depths and coverages of viral contigs obtained by NGS using an iterative mapping approach. Sixteen viruses were identified in two NGS pools using the proposed NGS pipeline. Genomic depths of assembled contigs for each virus type were averaged. Genomic coverages were estimated as the covered regions of contig(s) divided by the length of the coding sequence (CDS). Covered regions of assembled contigs were defined as the sum of aligned regions of contigs in BLASTN searches. The CDS lengths for HCoV-NL63, HPIV-4a, and RSV were approximately 27,000, 14,450, and 13,600 bp, respectively, and for HRV and HPeV they were approximately 6450 bp. The CDS lengths for the PB2, PB1, PA, HA, NP, NA, MP, and NS segments were approximately 2280, 2274, 2151, 1701, 1497, 692, 409, and 614 bp, respectively. In iterative mapping, genomics depths were examined by the average of position-specific read counts, and genomic coverages were defined as the sum of aligned regions of read mapping * For IAV, eight depth or coverage estimates for the PB2, PB1, PA, HA, NP, NA, MP, and NS genes are presented in order

Increasing genomic depths and coverages via iterative mapping

Although a few long contigs might completely or nearly span the full-length genome, many short contigs were mapped only partially to detected viruses (one HRV-C40 and nine HRV-C4 contigs encompassed 100% and 56.3% of the genomes, respectively). An iterative mapping approach was used to include as many NGS reads as possible for reference mapping. Briefly, a consensus sequence was generated from one or more short contigs and reference genomes. The reference genome was used to assemble the missing regions in order to generate an initial template because the short contigs might only cover the partial viral genome. Subsequently, the iterative process was initiated based on this combined genome. Table 1 compares the average depth and coverage using the iterative mapping approach (for various iterations) and de novo assembly. Following three iterations, the genomic coverage of the aforementioned HRV-C4 genome increased from 56.3% to 95.2%, with genomic depths from 10.3 to 16.9. The coverages of detected viruses in NGS 1 included nearly complete genomes, except for HRV-C4, with 95.1% coverage, and each of the eight IAV genes, with >77.0% coverage. Furthermore, the contig depths of viral genomes generated by iterative mapping were generally greater than the original contig depths. These results demonstrated that, by using the iterative mapping process, the depths and genomic content of the terminal templates in NGS 1 and 2 were increased compared with those of their respective initial contigs. Read-coverage distributions along the reported viral genomes in NGS 1 (Fig. 2A–2O) and 2 (Fig. 2P–2W) are shown in Figure 2. NGS 1 showed depth distributions with >77.0% coverage and average depths from 7.1 to 7001.1. NGS 2 showed depth distributions with >83.0% coverage and average depths from 38.1 to 773.8, except for HPeV-1, HRV-C6, and IAV.
Fig. 2

Read distributions of viral genomes. Read distributions of the viral genomes revealed by read mapping: (A) HCoV-NL63, (B) HPeV-1, (C) HPIV-4a, (D-K) from PB2 to NS gene of IAV, (L) HRV-A101, (M) HRV-C3, (N) HRV-C4, and (O) HRV-C40 in NGS 1; (P) HCoV-NL63, (Q) HPeV-1, (R) HPIV-4a, (S) RSV, (T) HRV-B92, (U) HRV-C6, (V) HRV-C40, and (W) PA gene of IAV in NGS 2. Read distributions in NGS 1 showed >77.0% coverage and average depths from 7.1 to 7001.1, and in NGS 2 showed >83.0% coverage and average depths from 38.1 to 773.8, except for HPeV-1, HRV-C6, and IAV. The read distribution of IAV PB1 gene in NGS 2 was not shown, due to only one read mapping to this gene

Read distributions of viral genomes. Read distributions of the viral genomes revealed by read mapping: (A) HCoV-NL63, (B) HPeV-1, (C) HPIV-4a, (D-K) from PB2 to NS gene of IAV, (L) HRV-A101, (M) HRV-C3, (N) HRV-C4, and (O) HRV-C40 in NGS 1; (P) HCoV-NL63, (Q) HPeV-1, (R) HPIV-4a, (S) RSV, (T) HRV-B92, (U) HRV-C6, (V) HRV-C40, and (W) PA gene of IAV in NGS 2. Read distributions in NGS 1 showed >77.0% coverage and average depths from 7.1 to 7001.1, and in NGS 2 showed >83.0% coverage and average depths from 38.1 to 773.8, except for HPeV-1, HRV-C6, and IAV. The read distribution of IAV PB1 gene in NGS 2 was not shown, due to only one read mapping to this gene

PCR results for each clinical specimen

Following the NGS analysis of mixed samples, virus-specific molecular diagnoses for each of the 40 specimens were determined using the predicted viruses shown in Table 1. PCR was performed to validate the presence of viruses, including EV, HCoV-229E, -NL63, and -OC43, hMPV-2, HPIV-3 and -4, HPeV, HRV, IAV, RSV, and Rota, using specific PCR primers (Table 2 and Supplementary File 1). PCR results agreed with NGS predictions for NGS 1; however, for sample IDs 40, 22, and 35, PCR analysis detected hMPV, HRV-C3, and HRV-C (without a specific type), respectively, but these could not be identified by NGS. By contrast, HPeV-1 was detected by NGS, with a genomic depth and coverage of 1.6 and 19.4, respectively, but was not identified by PCR.
Table 2

PCR confirmation of the 40 clinical samples. Virus-specific PCR assays for each of the 40 isolates were performed to confirm NGS-predicted viruses and to validate their presence. PCR results agreed with NGS predictions in NGS 1; however, hMPV, HRV-C3, and an unclassified type of HRV-C detected by PCR and HPeV-1 detected by NGS were inconsistent in NGS 2 and are marked in bold

NGS pool: predictionSample IDPCR-validated candidates
1: HCoV-NL63, HPeV-1, HPIV-4a, HRV-A101, -C3, -C4, and -C40, and IAV1IAV
2HRV-C3
3HCoV-NL63
4HRV-C4
5HCoV-NL63
6HRV-A101
7HPIV-4a
8
9HRV-C40
10HCoV-NL63, HPeV-1
11HRV-C40
12
13HCoV-NL63
14
15HCoV-NL63
16
17
18HRV-C40
19
20HCoV-NL63
2: HCoV-NL63, HPeV-1, HPIV-4a, HRV-B92, -C26, and -C40, IAV, and RSV21HRV-C40
22 HRV-C3
23HRV-C6
24
25HCoV-NL63
26RSV
27HRV-B92
28
29IAV, HPIV-4a
30
31
32
33HRV-C40
34HRV-C6
35 HRV-C (without a specific type)
36HRV-B92
37
38
39
40HCoV-NL63, hMPV
PCR confirmation of the 40 clinical samples. Virus-specific PCR assays for each of the 40 isolates were performed to confirm NGS-predicted viruses and to validate their presence. PCR results agreed with NGS predictions in NGS 1; however, hMPV, HRV-C3, and an unclassified type of HRV-C detected by PCR and HPeV-1 detected by NGS were inconsistent in NGS 2 and are marked in bold In terms of clinical symptoms, the 40 specimens used in this study were grouped into ten croup, nine acute bronchiolitis, and 21 RTIs cases. The viruses detected in the hospitalized children with acute bronchiolitis included four HRVs of type A101, C3, C4, and C40, followed by two each of HCoV-NL63 and IAV and one each of HPIV-4a and hMPV. The viruses detected in the hospitalized children with acute bronchiolitis included six HCoV-NL63, followed by one each of HRV-B92, HRV-C40, and HPeV-1. The viruses detected in the hospitalized children with RTIs included seven HRV-Cs, followed by one each of HRV-B92, HPIV-4a, and RSV. Notably, the negative detection rate of 52.4% (11 of 21) for hospitalized children with RTIs was much greater than the 11% and 20% rates observed for acute bronchiolitis and croup cases, respectively.

Read depths and genetic variants of viral genomes

A limitation of this study was the inability to distinguish NGS reads from different samples with the same viral type; however, HRV-C3, HRV-A101, HPIV-4a, and HPeV-1 in NGS 1 and RSV and HPIV-4a in NGS 2 showed single viral types with complete coverages in their mixed samples. Read mapping was performed to assess the overall depth distribution along the genome and to identify genetic variants exhibiting heterogeneous nucleotide compositions. A genetic variant was defined as including nucleotide positions where the major nucleotide appeared in <75% of the mapped reads with read depths of at least 20. For the aforementioned viruses, ten, one, three, and five nonsynonymous mutations for HRV-C3, HRV-A101, HPIV-4a, and HPeV-1 of NGS 1, respectively, and 17 and three for RSV and HPIV-4a of NGS 2, respectively (Table 3), were detected. These six genomes were generated by iterative mapping. In addition to genetic variants, we observed two insertions located at positions 904–906 (codon ATA) and 988–990 (codon CCA) in the VP2 gene in our Taiwanese HRV-A101 strain. Compared with three genomes (accession numbers GQ415051, JQ245965, and GQ415052) downloaded from GenBank in October 2016, the first insertion was absent in GQ415051 and JQ245965, and the second insertion was absent in GQ415052.
Table 3

Genetic variants in heterogeneous populations identified by read mapping

NGS poolSample IDViral typeGeneNonsynonymous mutationVariant nucleotide position (total depth: nucleotide composition/frequency)
12HRV-C3VP2L307I919 (135: T/100, A/27, C/7, G/1)
VP1D622Y1864 (79: G/58, T/15, C/5, A/1)
N623K1869 (69: C/50, A/19)
2CQ1486K4456 (234: C/164, A/46, T/19, G/5)
P1487Q/T/K4459 (194: C/120, A/59, T/15); 4460 (186: C/131, A/49, T/5 G/1)
V1489I4465 (133: G/93, A/26, C/7, T/7)
3DI1976N/K/L/H/Q5926 (372: A/240, C/90, T/29, G/13); 5927 (328: T/201, A/68, C/52, G/7); 5928 (271: T/131, A/63, C/48, G/29)
F1977I5929 (218: T/129 A/58 C/26 G/5)
I1980N/K5939 (82: T/58, A/19, G/5); 5940 (80: T/56, A/14, G/7, C/3)
I1981L5941 (76: A/55, T/10, G/9, C/2)
6HRV-A101VP4T15K44 (90: C/66, A/24)
7HPIV-4aMC201R604 (88: T/44, A/40, G/4)
L202M609 (117: A/81, G/21, T/15)
LR3595G754 (36: A/24, G/12)
10HPeV-1VP1N589K1767 (1054: T/766, A/161, G/103, C/24)
2AK918N2754 (21: A/15, C/3, G/2, T/1)
G919V2756 (21: G/15, T/4, A/2)
3DY1770N5308 (302: T/221, A/53, C/28)
F2126L6376 (430: T/320, C/75, A/34, G/1)
226RSVNS1S99P295 (740: T/530, C/124, A/76, G/10)
NS2D2N4 (75: G/53, A/21, C/1)
NM257L769 (94: A/70, T/14, G/6, C/4)
L258F774 (106: A/75, T/22, G/6, C/3)
E388D1164 (21: G/10, T/10, C/1)
FR49G145 (50: A/36, G/7, T/6, C/1)
T50S148 (66: A/46, T/15, G/5)
M2I173T518 (71: T/50, C/16, A/5); 519 (68: C/49, T/15, A/4)
N174H520 (62: A/45, C/15, T/2)
LL196S587 (30: T/16, C/14)
Q198R593 (39: A/26, G/9, T/4)
Y724S2171 (26: A /17, C/7, T/2)
S767G2299 (52: A/34, G/13, T/3, C/2)
S910T2729 (54: G/38, C/15, T/1)
N969D2905 (35: A/20, G/15)
N991D2971 (57: A/42, G/8, C/7)
R1256G3766 (52: A/29, G/22, T/1)
29HPIV-4aNPR216S648 (25: A/18, C/5, G/2)
A268E803 (128: C/92, A/32, T/3, G/1)
P345L1034 (1801: C/1265, T/451, A/59, G/26)
Genetic variants in heterogeneous populations identified by read mapping

Discussion

More than 25% of clinical samples harbour undetected viral pathogens [18-20]; therefore, new diagnostic tests, such as NGS-based analyses, are required for viral identification. Here, we collected 40 clinical samples for which viruses were not detected by routine viral culture or PCR methods. After applying an NGS data analysis pipeline, we successfully identified viral pathogens from 26 clinical samples (including three co-infections in sample IDs ten, 29, and 40) in two NGS groups. Detected viruses included HCoV-NL63, HPeV-1, HPIV-4a, HRV-A101, -C3, -C4, and -C40, and IAV from NGS 1 and HCoV-NL63, hMPV, HPeV-1, HPIV-4a, HRV-B92, -C26, and -C40, IAV, and RSV from NGS 2. The two dominant viral populations were HRV (n = 13) and HCoV-NL63 (n = 8). Reads of assembled contigs were categorized as viruses, bacteria, and eukaryotes (Fig. 1). The two pooled samples exhibited different taxonomic compositions; for example, 86% of total reads belonged to the virus group in NGS 1, but only 6% in NGS 2. This difference might be explained by differences in viral copies in mixed samples. Nevertheless, a detection rate of 60% in NGS 2 was obtained using our NGS data analysis pipeline, approaching 70% in NGS 1. In addition to low viral copies, challenges to the recovery of viral genomes from clinical samples using NGS platforms include bacterial and human genome contamination. In this study, we applied an NGS data analysis pipeline using iterative mapping to improve the recovery of virus-derived reads in mixed clinical samples, in order to identify their genomic sequences. Genomic depths and coverages in two NGS samples were effectively increased using this process; however, these increases were not evident for viral isolates [6]. A decrease in contamination with human and bacterial genomes increases the depth and coverage of viral contigs from viral isolates. In this study, viral genomes were identified following ≤6 iterations (as reference mappings) using this approach. NGS has various advantages over PCR-based methods. Using this technology, whole genomes can be obtained for viral detection, genotyping, and diversity analyses without designing primers or prior knowledge of viral presence [1, 6]. Correlations between NGS reads and PCR cycle threshold values have been observed, suggesting that NGS reads are suitable indicators of viral copy number [2, 26]. In this study, NGS applied to pooled samples offered advantages, including reduced sequencing costs and the ability to screen multiple epidemiological samples for potential outbreaks. However, the use of mixed samples had some limitations, including the inability to assign viral candidates to each clinical sample or to calculate the association between NGS reads and Ct values when multiple viruses of the same type were pooled in a single NGS group. In this study, 40 specimens were collected from 10, nine, and 21 hospitalized children diagnosed with croup, acute bronchiolitis, and RTIs, respectively. The dominant viral population in the acute bronchiolitis cases was HRV (including types A101, C3, C4, and C40), which is inconsistent with previous studies indicating that RSV is the dominant population in patients with bronchiolitis [18, 27]. HRV type C was first reported in 2007 and could not be grown by standard cell culture methods [28]. A systematic approach has been proposed to culture the virus in differentiated epithelial cells of the human airway at the air–liquid interface [29]; however, this method is not suitable for routine viral diagnosis in clinical laboratories. Accordingly, the prevalence of HRV-C in acute bronchiolitis cases has likely been underestimated. HCoV-NL63, a global respiratory tract pathogen, is a primary cause of respiratory disease in children admitted to hospital in Taiwan, specifically in patients diagnosed with croup [30]. This agrees with our finding showing that six out of ten croup cases were diagnosed as NL63 infections. One study demonstrated that newly discovered viruses might fail to be amplified by RT-PCR assays designed for known viruses, due to primer sequence mismatches [3]. Therefore, NGS analysis, which does not require specific primers and probes, is superior to these diagnostic assays for the detection of viruses harboring novel mutations in PCR primer binding sites. Viruses detected in patients diagnosed with RTIs included seven HRV type C, and one each of HRV type B, RSV, and HPIV-4a. The negative detection rate for patients with RTIs was 52.4% (11 of 21), which was greater than that for acute bronchiolitis cases (11.1%; one of nine) and croup cases (20%; two of ten). Further investigations are needed to identify currently unknown human viruses in patients diagnosed with RTIs. HRVs, including 11 type C, two B, and one A, were the dominant viruses detected in this study. HRV-C causes more severe illnesses than HRV-A and -B [31, 32]; however, limited HRV-C genomes have been published due to the lack of availability of culture systems in clinical laboratory settings [33]. More detailed viral typing information, for viruses such as HRV, HEV, hMPV, and RSV, can be obtained by NGS than by PCR-based methods [26]. Our results demonstrated that NGS effectively characterizes respiratory viral infections, specifically those involving HRV; however, two HRVs (one C3 and the other un-typed) and one hMPV were not identified by NGS in NGS 2 of this study. It was difficult to discriminate between all HRV genotypes in this mixed NGS sample because NGS 2 contained eight HRVs that shared similar genetic segments. Additionally, hMPV might exhibit a low viral load in sample ID 40 co-infected with HCoV-NL63. In other words, most of the reads in NGS 2 were derived from viruses with high titers. However, one HPeV-1 in NGS 2 was not identified by PCR. This HPeV exhibited a genomic coverage of 19.4% in our NGS analysis. Further investigations are needed to explain this low coverage (e.g. low viral titers or bleed-through contamination from the HPeV-1 in NGS 1 [34]). In this metagenomics study, an NGS data analysis pipeline was used to identify viral pathogens from 40 hospitalized patients. No viruses were detected in these clinical samples using routine culture and PCR methods; however, our NGS pipeline with PCR confirmation resulted in a detection rate of 65% (26 of 40 samples). With respect to nucleotide diversity in the detected viruses, 39 polymorphic genetic variants were detected from six whole genomes using mapped NGS reads. Further analyses are needed to determine the biological consequences of this genetic variation. Our results indicated that the NGS method was able to detect viruses that cannot be directly identified by routine culture or PCR methods targeting divergent or other viruses. Furthermore, this NGS method could be used to reveal genetic sequences and detect diversity from mixed samples using an iterative mapping approach. Our results provide the first complete genomes for HRV-A101, HRV-C3, HPIV-4a, and RSV in Taiwan. In future work, we will apply this method to the rapid identification of emerging viruses directly from clinical samples during potential outbreaks as well as to the detection of genetic variants, to further explore the associations between viral genotypes and disease severity. Below is the link to the electronic supplementary material. Supplementary material 1 (DOC 77 kb) Supplementary material 2 (XLSX 78 kb)
  9 in total

1.  Nasopharyngeal metagenomic deep sequencing data, Lancaster, UK, 2014-2015.

Authors:  Kate V Atkinson; Lisa A Bishop; Glenn Rhodes; Nicolas Salez; Neil R McEwan; Matthew J Hegarty; Julie Robey; Nicola Harding; Simon Wetherell; Robert M Lauder; Roger W Pickup; Mark Wilkinson; Derek Gatherer
Journal:  Sci Data       Date:  2017-10-24       Impact factor: 6.444

Review 2.  Highlighting Clinical Metagenomics for Enhanced Diagnostic Decision-making: A Step Towards Wider Implementation.

Authors:  Jessica D Forbes; Natalie C Knox; Christy-Lynn Peterson; Aleisha R Reimer
Journal:  Comput Struct Biotechnol J       Date:  2018-02-27       Impact factor: 7.271

3.  Viral infection detection using metagenomics technology in six poultry farms of eastern China.

Authors:  Yuan Qiu; Suchun Wang; Baoxu Huang; Huanxiang Zhong; Zihao Pan; Qingye Zhuang; Cheng Peng; Guangyu Hou; Kaicheng Wang
Journal:  PLoS One       Date:  2019-02-20       Impact factor: 3.240

4.  Seroprevalence of anti-SARS-CoV-2 antibodies in Japanese COVID-19 patients.

Authors:  Makoto Hiki; Yoko Tabe; Tomohiko Ai; Yuya Matsue; Norihiro Harada; Kiichi Sugimoto; Yasushi Matsushita; Masakazu Matsushita; Mitsuru Wakita; Shigeki Misawa; Mayumi Idei; Takashi Miida; Naoto Tamura; Kazuhisa Takahashi; Toshio Naito
Journal:  PLoS One       Date:  2021-04-06       Impact factor: 3.240

5.  A Comparison of Bioinformatics Pipelines for Enrichment Illumina Next Generation Sequencing Systems in Detecting SARS-CoV-2 Virus Strains.

Authors:  Stefanus Bernard; Hendra Wibawa; Mohamad Saifudin Hakim; Arli Aditya Parikesit; Chandra Kusuma Dewa; Yasubumi Sakakibara
Journal:  Genes (Basel)       Date:  2022-07-26       Impact factor: 4.141

6.  Viral Metagenomics for the Identification of Emerging Infections in Clinical Samples with Inconclusive Dengue, Zika, and Chikungunya Viral Amplification.

Authors:  Juliana Vanessa Cavalcante Souza; Hazerral de Oliveira Santos; Anderson Brandão Leite; Marta Giovanetti; Rafael Dos Santos Bezerra; Eneas de Carvalho; Jardelina de Souza Todão Bernardino; Vincent Louis Viala; Rodrigo Haddad; Massimo Ciccozzi; Luiz Carlos Junior Alcantara; Sandra Coccuzzo Sampaio; Dimas Tadeu Covas; Simone Kashima; Maria Carolina Elias; Svetoslav Nanev Slavov
Journal:  Viruses       Date:  2022-08-31       Impact factor: 5.818

Review 7.  Detection of respiratory viruses directly from clinical samples using next-generation sequencing: A literature review of recent advances and potential for routine clinical use.

Authors:  Xinye Wang; Sacha Stelzer-Braid; Matthew Scotch; William D Rawlinson
Journal:  Rev Med Virol       Date:  2022-07-01       Impact factor: 11.043

8.  Metagenomic analysis of viral nucleic acid extraction methods in respiratory clinical samples.

Authors:  Dan Zhang; Xiuyu Lou; Hao Yan; Junhang Pan; Haiyan Mao; Hongfeng Tang; Yan Shu; Yun Zhao; Lei Liu; Junping Li; Jiang Chen; Yanjun Zhang; Xuejun Ma
Journal:  BMC Genomics       Date:  2018-10-25       Impact factor: 3.969

Review 9.  The genetic sequence, origin, and diagnosis of SARS-CoV-2.

Authors:  Huihui Wang; Xuemei Li; Tao Li; Shubing Zhang; Lianzi Wang; Xian Wu; Jiaqing Liu
Journal:  Eur J Clin Microbiol Infect Dis       Date:  2020-04-24       Impact factor: 3.267

  9 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.