Tengguo Li, Elizabeth R Unger, Dhwani Batra, Mili Sheth, Martin Steinau, Jean Jasinski, Jennifer Jones, Mangalathu S Rajeevan.
Abstract
We designed a universal human papillomavirus (HPV) typing assay based on target enrichment and whole-genome sequencing (eWGS). The RNA bait included 23,941 probes targeting 191 HPV types and 12 probes targeting beta-globin as a control. We used the Agilent SureSelect XT2 protocol for library preparation, Illumina HiSeq 2500 for sequencing, and CLC Genomics Workbench for sequence analysis. Mapping stringency for type assignment was determined based on 8 (6 HPV-positive and 2 HPV-negative) control samples. Using the optimal mapping conditions, types were assigned to 24 blinded samples. eWGS results were 100% concordant with Linear Array (LA) genotyping results for 9 plasmid samples and fully or partially concordant for 9 of the 15 cervical-vaginal samples, with 95.83% overall type-specific concordance for LA genotyping. eWGS identified 7 HPV types not included in the LA genotyping. Since this method does not involve degenerate primers targeting HPV genomic regions, PCR bias in genotype detection is minimized. With further refinements aimed at reducing cost and increasing throughput, this first application of eWGS for universal HPV typing could be a useful method to elucidate HPV epidemiology.
Keywords: HPV typing; broad-spectrum assay; target enrichment; whole-genome sequencing
Year: 2016 PMID: 27974548 PMCID: PMC5328449 DOI: 10.1128/JCM.02132-16
Source DB: PubMed Journal: J Clin Microbiol ISSN: 0095-1137 Impact factor: 5.948
FIG 1 Number of reads (A) and mean base quality (Q) score of reads (B) passing the default filtering of Illumina bcl2fastq v1.8.4 software. Reads were restricted to 0 mismatches in 8-bp index reads.
FIG 2 Performance of RNA baits for internal control human beta-globin gene based on number of reads (A), average coverage (B), and fraction of reference sequence covered by the reads (C).
TABLE 1 Determination of cutoff to differentiate signal from noise in HPV whole-genome sequence data based on control samples
| Control sample | L0.5-S0.8: no. of reads mapped | L0.5-S0.8: avg coverage | L0.5-S0.8: fraction of genome covered | L0.5-S1: no. of reads mapped | L0.5-S1: avg coverage | L0.5-S1: fraction of genome covered | L1-S1: no. of reads mapped | L1-S1: avg coverage | L1-S1: fraction of genome covered | HPV type without cutoff | HPV type with cutoff |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Caski (100 ng) | 50,861,070 | 637,921.0 | 1.00 | 49,687,563 | 624,209.5 | 1.00 | 31,807,177 | 402,316.9 | 0.99 | 16 | 16 |
| | 187 | 2.1 | 0.63 | 139 | 1.7 | 0.57 | 116 | 1.5 | 0.50 | 33 | |
| Caski (10 ng) | 9,020,079 | 113,103.0 | 1.00 | 8,807,582 | 110,605.2 | 1.00 | 5,582,647 | 70,612.8 | 0.99 | 16 | 16 |
| | 37 | 0.4 | 0.16 | 14 | 0.2 | 0.12 | 10 | 0.1 | 0.10 | 45 | |
| SiHa (100 ng) | 134,664 | 1,679.0 | 1.00 | 130,955 | 1,638.0 | 1.00 | 87,797 | 1,110.5 | 0.98 | 16 | 16 |
| | 75 | 0.9 | 0.22 | 65 | 0.8 | 0.19 | 47 | 0.6 | 0.12 | 18 | |
| SiHa (10 ng) | 68,146 | 850.0 | 1.00 | 66,413 | 831.1 | 1.00 | 44,921 | 568.2 | 0.98 | 16 | 16 |
| | 94 | 1.0 | 0.33 | 66 | 0.8 | 0.30 | 59 | 0.7 | 0.29 | 31 | |
| HeLa (100 ng) | 314,969 | 3,953.7 | 0.67 | 298,994 | 3,760.7 | 0.67 | 183,032 | 2,329.5 | 0.64 | 18 | 18 |
| | 1,339 | 16.0 | 1.00 | 1,100 | 13.8 | 1.00 | 659 | 8.3 | 0.90 | 16 | |
| HeLa (10 ng) | 83,489 | 1,049.0 | 0.66 | 78,847 | 992.9 | 0.66 | 48,305 | 614.8 | 0.62 | 18 | 18 |
| | 610 | 7.3 | 0.99 | 409 | 5.1 | 0.98 | 225 | 2.8 | 0.77 | 16 | |
| H2O | 115 | 1.5 | 0.29 | 110 | 1.4 | 0.27 | 69 | 0.9 | 0.18 | 18 | Negative |
| | 12 | 0.1 | 0.13 | 2 | 0.0 | 0.02 | 1 | 0.0 | 0.01 | 16 | |
| Placenta | 1,800 | 21.3 | 1.00 | 1,378 | 17.3 | 1.00 | 861 | 10.9 | 0.88 | 16 | Negative |
| | 46 | 0.5 | 0.26 | 37 | 0.5 | 0.24 | 23 | 0.3 | 0.16 | 18 | |
Three mapping stringencies (L0.5-S0.8, L0.5-S1, and L1-S1) were evaluated using combinations of the parameters L and S, which represent read length (0.5 or 1) and similarity score (0.8 or 1), respectively.
HPV types were assigned using cutoffs of ≥1,000 mapped reads, average coverage of ≥20, and fraction of genome covered of ≥0.5 under the L1-S1 mapping stringency.
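The cutoff rule in the footnote above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the `MappingResult` class, its field names, and the `passes_cutoff` and `assign_types` helpers are hypothetical, and the sketch assumes all three thresholds must be met at once. The example values are the L1-S1 columns for two control samples from the table.

```python
from dataclasses import dataclass

# Thresholds quoted in the table footnote (applied to L1-S1 mapping results).
MIN_READS = 1_000
MIN_AVG_COVERAGE = 20.0
MIN_FRACTION_COVERED = 0.5


@dataclass
class MappingResult:
    """Per-type mapping summary; class and field names are hypothetical."""
    hpv_type: str
    reads_mapped: int
    avg_coverage: float
    fraction_covered: float


def passes_cutoff(result: MappingResult) -> bool:
    """Return True only if all three thresholds are met (an assumption)."""
    return (
        result.reads_mapped >= MIN_READS
        and result.avg_coverage >= MIN_AVG_COVERAGE
        and result.fraction_covered >= MIN_FRACTION_COVERED
    )


def assign_types(results):
    """Return the HPV types that pass the cutoff, or ["Negative"] if none do."""
    called = [r.hpv_type for r in results if passes_cutoff(r)]
    return called or ["Negative"]


# L1-S1 values copied from the control-sample table.
caski_100ng = [
    MappingResult("16", 31_807_177, 402_316.9, 0.99),
    MappingResult("33", 116, 1.5, 0.50),
]
placenta = [
    MappingResult("16", 861, 10.9, 0.88),
    MappingResult("18", 23, 0.3, 0.16),
]

print(assign_types(caski_100ng))  # ['16']
print(assign_types(placenta))     # ['Negative']
```

Under these assumptions the low-level HPV33 reads in the Caski sample and the HPV16 reads in placenta are treated as noise, which matches the "With cutoff" column of the table.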
FIG 3 Mapping results showing the high specificity of the RNA baits: reads are restricted to the portion of the HPV18 genome integrated into the HeLa genome. The reduced fraction of the HPV genome covered by sequenced reads reflects deletion of the central region of the HPV18 genome (bounded by the two dashed vertical lines), which is compatible with a deletion due to integration, and is reproduced with 100 ng (A) and 10 ng (B) of input DNA. The stringent L1-S1 mapping conditions result in small gaps in the consensus sequence due to mismatched reads. (C) Mapping results schematically aligned to the HPV genome, with the locations of the early and late region genes.
TABLE 2 HPV genotype determination in blinded samples based on target enrichment and whole-genome sequencing
| Sample no. | Sample type | No. of reads mapped | Avg coverage | Fraction of genome covered (≥0.5) | HPV type (WGS result) | HPV type (LA result) | Concordance |
|---|---|---|---|---|---|---|---|
| 1 | Plasmid | 282,773 | 3,598.5 | 0.99 | | | Yes |
| 2 | Plasmid | 293,528 | 3,751.6 | 1.00 | | | Yes |
| 3 | Plasmid | 210,381 | 2,659.0 | 1.00 | | | Yes |
| 4 | Plasmid | 190,903 | 2,413.7 | 0.99 | | | Yes |
| 5 | Plasmid | 237,806 | 2,994.3 | 1.00 | | | Yes |
| 6 | Plasmid | 268,460 | 3,357.4 | 1.00 | | | Yes |
| 7 | Plasmid | 95,100 | 1,210.4 | 1.00 | | | Yes |
| 8 | Plasmid | 169,355 | 2,135.4 | 1.00 | | | Yes |
| 17 | Cervicovaginal | 16,120 | 208.7 | 0.16 | 34 | | Yes, partial |
| | | 1,736 | 379 | 0.85 | | | |
| | | | | | | 81 | |
| 18 | Cervicovaginal | 7,873 | 100.9 | 0.99 | | | Yes |
| | | 10,820 | 139.5 | 0.97 | | | |
| | | 8,012 | 101.4 | 0.91 | | | |
| | | 3,967 | 49.4 | 0.99 | | | |
| 19 | Cervicovaginal | 667,344 | 8,554.6 | 0.90 | | | Yes |
| | | 675,774 | 8,535.7 | 0.97 | | | |
| | | 680,615 | 8,425.5 | 0.91 | | | |
| | | 345,249 | 4,372.5 | 0.82 | | | |
| | | 170,609 | 2,105.2 | 0.53 | | | |
| | | 36,981 | 472.7 | 1.00 | | | |
| | | 32,800 | 419.2 | 0.72 | | | |
| | | 13,844 | 176.5 | 0.91 | | | |
| 20 | Cervicovaginal | | | | HPV− | 72, 35, 52, 53, 54, 62, 81 | No |
| 21 | Cervicovaginal | | | | | | Yes |
| 22 | Cervicovaginal | | | | HPV− | 6, 53, 56, 62, 70 | No |
| 23 | Cervicovaginal | 135,138 | 1,711.5 | 0.73 | | | Yes, partial |
| | | 110,107 | 1,392.2 | 0.99 | | | |
| | | 95,442 | 1,193.2 | 0.99 | | | |
| | | 73,386 | 940.7 | 0.84 | | | |
| | | 49,125 | 638.0 | 0.87 | | | |
| | | 36,101 | 452.7 | 0.98 | | | |
| | | 29,721 | 375.9 | 0.97 | | | |
| | | 13,891 | 171.4 | 0.73 | | | |
| | | 9,380 | 119.9 | 0.99 | | | |
| | | 3,091 | 39.5 | 0.99 | | | |
| | | 2,121 | 27.0 | 0.85 | | | |
| | | | | | | 45, 51, 61, 89 | |
| 24 | Cervicovaginal | 2,503,879 | 31,916.9 | 0.99 | | | Yes, partial |
| | | 2,221,727 | 27,974.4 | 1.00 | | | |
| | | 1,054,024 | 13,048.1 | 0.99 | | | |
| | | 598,662 | 7,506.7 | 0.90 | | | |
| | | 266,588 | 3,318.7 | 0.98 | | | |
| | | 213,130 | 2,692.1 | 0.95 | | | |
| | | 50,171 | 628 | 0.66 | | | |
| | | 40,905 | 523.9 | 0.99 | | | |
| | | 31,221 | 390.3 | 0.98 | | | |
| | | | | | | 53, 68b | |
| 25 | Cervicovaginal | | | | HPV− | 66, 18, 31 | No |
| 26 | Cervicovaginal | 10,277,696 | 129,998.7 | 1.00 | | | Yes, partial |
| | | 428,009 | 5,411.7 | 0.97 | | | |
| | | 184,266 | 2,310.5 | 0.86 | | | |
| | | 127,862 | 1,609.9 | 0.99 | | | |
| | | 118,695 | 1,490.0 | 0.83 | | | |
| | | 90,888 | 1,148.0 | 0.99 | | | |
| | | 3,722 | 46.0 | 0.66 | | | |
| | | 2,610 | 33.3 | 0.96 | | | |
| | | | | | | 72, 83 | |
| 27 | Cervicovaginal | | | | | 26, 42, 58, 83, 84 | No |
| 28 | Cervicovaginal | 9,041 | 115.1 | 0.99 | | 35, 53 | No |
| 29 | Cervicovaginal | 23,813 | 295.1 | 0.76 | | 40, 54, 66, 84, 89 | No |
| 30 | Cervicovaginal | 2,102,063 | 26,836.0 | 1.00 | | | Yes |
| | | 1,743,872 | 22,288.8 | 0.99 | | | |
| | | 81,139 | 1,039.2 | 0.85 | | | |
| | | 49,403 | 617.8 | 0.62 | | | |
| | | 12,102 | 152.6 | 0.96 | | | |
| 31 | Cervicovaginal | | | | | | Yes |
| 32 | Plasmid | 270,183 | 3,417.4 | 1.00 | | | Yes |
Sample numbers 1 to 8 were multiplexed with control samples in pool 1, and samples 17 to 32 were multiplexed in pool 2. Sequence information is given only for those that passed the signal/noise cutoff for HPV type determination.
Asterisks indicate HPV types not included in LA but detected by WGS. Boldface indicates types detected by both assays, while italics indicate types not included in the LA assay.
HPV concordance based on WGS and LA results.
HPV64 has been reclassified as a subtype of HPV34. The only reference sequence available for HPV64 is an L1 fragment. On post hoc assessment, results for this sample are assigned to HPV64 based on reads mapped to the L1 fragment with good coverage and reads mapping to HPV34 with low genome coverage.
TABLE 3 Performance of different RNA bait evaluation metrics based on WGS mapping results under L1-S1 from plasmid samples
| Sample no. | HPV type | Bait design coverage (%) | HPV genome coverage by mapped reads (%) | HPV reads mapped to predicted type (%) | Uniformity of coverage (%) |
|---|---|---|---|---|---|
| 1 | 45 | 100 | 99.0 | 99.8 | 96.3 |
| 2 | 58 | 100 | 100 | 99.7 | 96.3 |
| 3 | 31 | 100 | 100 | 99.7 | 96.3 |
| 4 | 33 | 100 | 99.2 | 99.2 | 96.3 |
| 5 | 52 | 100 | 100 | 99.7 | 96.3 |
| 6 | 6 | 99.2 | 100 | 99.8 | 96.3 |
| 7 | 18 | 100 | 100 | 100 | 96.3 |
| 8 | 11 | 100 | 100 | 99.5 | 92.6 |
| 32 | 16 | 100 | 100 | 99.8 | 96.3 |
| Mean | | 99.9 | 99.8 | 99.6 | 95.9 |
HPV type determined by eWGS for the corresponding sample numbers (Table 2).
Proportion of the HPV genome covered by the bait design criteria.
Proportion of the HPV reference genome covered by the mapped reads.
Proportion of reads that mapped to HPV predicted type compared to the total HPV reads.
Uniformity of coverage across the genome was calculated as the percentage of bins with coverage within the average read depth ± 2 SD for all bins (see Materials and Methods for details).
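The uniformity metric described in the last footnote can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions rather than the authors' implementation: the function name `uniformity_of_coverage` is hypothetical, the input is assumed to be one mean read depth per bin (the bin size is defined in Materials and Methods and is not restated here), and the population standard deviation is assumed because the footnote does not say which SD was used.

```python
from statistics import mean, pstdev


def uniformity_of_coverage(bin_depths):
    """Percentage of bins whose depth lies within mean ± 2 SD of all bins.

    bin_depths: one read-depth value per genome bin (hypothetical input format).
    """
    if not bin_depths:
        raise ValueError("at least one bin is required")
    avg = mean(bin_depths)
    sd = pstdev(bin_depths)  # assumption: population SD; the source does not specify
    lower, upper = avg - 2 * sd, avg + 2 * sd
    within = sum(1 for depth in bin_depths if lower <= depth <= upper)
    return 100.0 * within / len(bin_depths)


# Made-up depths: nine evenly covered bins and one strongly over-covered bin.
depths = [100] * 9 + [1000]
print(round(uniformity_of_coverage(depths), 1))  # 90.0
```

In this made-up example the mean is 190 and the SD is 270, so the accepted range is -350 to 730 and only the over-covered bin falls outside it.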
FIG 4 Laboratory workflow for HPV genotyping following RNA bait-based target enrichment and whole-genome sequencing.