Hong Kai Lee, Chun Kiat Lee, Julian Wei-Tze Tang, Tze Ping Loh, Evelyn Siew-Chuan Koay.
Abstract
Accurate full-length genomic sequences are important for viral phylogenetic studies. We developed a targeted high-throughput whole genome sequencing (HT-WGS) method for influenza A viruses that uses an enzymatic cleavage-based approach, the Nextera XT DNA library preparation kit, for library preparation. The entire library preparation workflow was adapted for the Sentosa SX101, a liquid handling platform, to automate this labor-intensive step. Because the enzymatic cleavage-based approach generates low-coverage reads at both ends of the cleaved products, we corrected this loss of sequencing coverage at the termini by introducing modified primers during the targeted amplification step, generating full-length influenza A sequences with even coverage across the whole genome. Another challenge of targeted high-throughput sequencing (HTS) is the risk of specimen-to-specimen cross-contamination during the library preparation step, which results in the calling of false-positive minority variants. We included an in-run negative system control to capture contamination reads that may be generated during the liquid handling procedures. The upper limits of the 99.99% prediction intervals of the contamination rate were adopted as cut-off values for contamination reads. Here, 148 influenza A/H3N2 samples were sequenced using the HTS protocol and compared against a Sanger-based sequencing method. Our data showed that the rate of specimen-to-specimen cross-contamination was highly significant in HTS.
Year: 2016 PMID: 27624998 PMCID: PMC5022032 DOI: 10.1038/srep33318
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Optimization of the DNA library generation by introducing decoy PCR primers to produce full-length influenza A/H3N2 sequences with even coverage.
(a) The existing protocol for Nextera XT library preparation on targeted amplicons. Low sequencing read depths were typically found at both 5′ and 3′ ends of the amplicons. (b) The direct integration of the transposon sequences to the Tuni-12 and Tuni-13 primer sequences and inclusion of decoy primers during the initial genome-wide PCR before the Nextera XT library preparation. This increases the read depths at both ends of the amplicons.
Figure 2. Contamination noise analysis using in-run Negative System Controls (NSCs).
Only the non-structural (NS) gene (segment 8) of the influenza A/H3N2 virus is shown for illustration. The depth of coverage of the contaminating NS gene reads found in the NSCs included in 10 separate runs was recorded at every 150th nucleotide position, starting from the first nucleotide of the NS gene. As shown here, the number of contamination reads at the 451st nucleotide position was plotted against the total number of pass-filter reads in the NSC. The linear regression model at this nucleotide position of the NS gene had the lowest R² value (0.24) among the models for all 8 genes (range, 0.24 to 0.70). The upper limits of the 99.99% prediction intervals derived from the regression model at the 451st nucleotide position were adopted as the cut-off values of contamination reads for the nucleotide sequence located in the grey-highlighted region. All base calls with read depths below the cut-off values were considered contamination reads and excluded from further analysis.
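The cut-off described in this caption can be illustrated with a minimal sketch of an upper prediction limit from a simple linear regression of contamination reads on total pass-filter reads. Everything here is an assumption for illustration rather than the authors' implementation: the function names, the use of a two-sided 99.99% interval, the numerical Student's t quantile (standard library only), and the example numbers, which are invented and not taken from the study.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF for x >= 0, integrating the density on [0, x] with Simpson's rule."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile for p > 0.5, found by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:      # widen the bracket until it contains p
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def upper_prediction_limit(x, y, x_new, level=0.9999):
    """Return (fitted value, upper prediction limit) at x_new.

    Assumes an ordinary least-squares line y = a + b*x and a two-sided
    prediction interval at the given level.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))                 # residual standard error
    t = t_quantile(1 - (1 - level) / 2, n - 2)
    se = s * math.sqrt(1 + 1 / n + (x_new - mx) ** 2 / sxx)
    fitted = intercept + slope * x_new
    return fitted, fitted + t * se

# Invented numbers for 10 hypothetical runs (NOT data from the study):
# total pass-filter reads in the NSC, and contamination reads at one position.
nsc_total_reads = [1200, 1800, 2500, 3100, 3900, 4400, 5200, 6000, 6800, 7500]
contam_reads = [3, 2, 6, 4, 9, 5, 11, 8, 14, 10]

fitted, cutoff = upper_prediction_limit(nsc_total_reads, contam_reads, 4000)
depths = [2, 500, 12000]                         # hypothetical base-call depths
kept = [d for d in depths if d >= cutoff]        # depths below the cut-off are excluded
print(f"fitted={fitted:.1f} cutoff={cutoff:.1f} kept={kept}")
```

With only 10 control runs the regression has 8 degrees of freedom, so the t quantile at this level is much larger than the corresponding normal quantile; a normal approximation would give a noticeably lower cut-off.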
Summary of the total variants for each gene segment before and after background filtering and pairwise comparison between high-throughput sequencing- (HTS-) and Sanger-derived full-length sequences.
| Segment/Gene/Length | Before background filtering: total variants (minor + absolute)/No. of sequences | After background filtering: total variants (minor + absolute)/No. of sequences | HTS vs. Sanger: total sequences compared | HTS vs. Sanger: total minor variants found by HTS after filtering/No. of HTS sequences | HTS vs. Sanger: total minor variants found by Sanger/No. of Sanger sequences | HTS vs. Sanger: No. of fully concordant sequences | HTS vs. Sanger: total additional minor variants found by HTS/No. of sequences | HTS vs. Sanger: total homogeneous variants compared to Sanger method after filtering/No. of sequences | HTS vs. Sanger: total homogeneous variants correctly identified by HTS after re-visiting Sanger chromatogram/No. of sequences |
|---|---|---|---|---|---|---|---|---|---|
| 1/PB2/2341nt | 2968/148 | 2511/130 | 129 | 403/129 | 133/129 | 53 (41%) | 270/75 | 0/0 | 0 |
| 2/PB1/2341nt | 3035/148 | 2532/130 | 130 | 377/128 | 132/130 | 60 (46%) | 247/65 | 9/8 | 2/2 |
| 3/PA/2233nt | 2956/148 | 2641/136 | 135 | 513/136 | 138/135 | 49 (36%) | 374/83 | 6/5 | 4/3 |
| 4/HA/1762nt | 2016/148 | 1860/138 | 138 | 175/136 | 140/137 | 105 (76%) | 37/30 | 3/3 | 1/1 |
| 5/NP/1566nt | 1949/148 | 1839/142 | 142 | 172/138 | 147/142 | 112 (79%) | 32/26 | 7/5 | 4/2 |
| 6/NA/1467nt | 1748/148 | 1638/140 | 140 | 173/138 | 144/140 | 115 (82%) | 32/21 | 4/4 | 2/2 |
| 7/MP/1027nt | 1206/148 | 1155/142 | 142 | 163/142 | 146/142 | 126 (89%) | 18/15 | 1/1 | 1/1 |
| 8/NS/890nt | 1114/148 | 1069/143 | 143 | 175/140 | 150/143 | 117 (82%) | 28/22 | 4/4 | 1/1 |
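The percentages in the "fully concordant" column appear to be the number of fully concordant sequences divided by the total sequences compared, rounded to the nearest whole percent. A short check under that assumption, using the counts from the table (the function and variable names are illustrative):

```python
# (fully concordant sequences, total sequences compared) per segment, from the table
concordance_counts = {
    "PB2": (53, 129),
    "PB1": (60, 130),
    "PA": (49, 135),
    "HA": (105, 138),
    "NP": (112, 142),
    "NA": (115, 140),
    "MP": (126, 142),
    "NS": (117, 143),
}

def concordance_percent(concordant, compared):
    """Percentage of compared sequences that were fully concordant, rounded."""
    return round(100 * concordant / compared)

percentages = {
    gene: concordance_percent(concordant, compared)
    for gene, (concordant, compared) in concordance_counts.items()
}

for gene, pct in percentages.items():
    print(f"{gene}: {pct}%")
```

The recomputed values agree with the percentages printed in the table, and they show the lower concordance of the three longer polymerase segments (PB2, PB1, PA) compared with the shorter segments.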
Figure 3. High-throughput sequencing results.
(a) Ideal sequencing coverage from the influenza A/Singapore/H2009.334C/2009(H3N2) virus genome. The red, orange, yellow, green, blue, magenta, purple, and black lines plot sequencing coverage (X-axis) against pass-filter read depth (Y-axis) for influenza A/H3N2 segment 1 (PB2, polymerase basic 2; 2341 nt), segment 2 (PB1, polymerase basic 1; 2341 nt), segment 3 (PA, polymerase acidic; 2233 nt), segment 4 (HA, hemagglutinin; 1762 nt), segment 5 (NP, nucleoprotein; 1566 nt), segment 6 (NA, neuraminidase; 1467 nt), segment 7 (MP, matrix protein; 1027 nt), and segment 8 (NS, nonstructural; 890 nt), respectively. (b) Suboptimal sequencing coverage for the influenza A/Singapore/C2009.803bV/2009(H3N2) virus genome. Considerably higher read depths were observed at the 5′ and 3′ ends of segments 1 (red), 2 (orange), and 3 (yellow).
Figure 4. (a) Nucleotide positions of additional minor variants detected by high-throughput sequencing (HTS) but not by Sanger sequencing in a total of 148 influenza A/H3N2 genome sequences. (b) Population frequencies of additional minor variants detected by HTS only.