Michael Cullen¹, Joseph F Boland¹, Mark Schiffman², Xijun Zhang¹, Nicolas Wentzensen², Qi Yang¹, Zigui Chen³, Kai Yu², Jason Mitchell¹, David Roberson¹, Sara Bass¹, Laurie Burdette¹, Moara Machado⁴, Sarangan Ravichandran⁵, Brian Luke⁵, Mitchell J Machiela², Mark Andersen⁶, Matt Osentoski⁶, Michael Laptewicz⁶, Sholom Wacholder², Ashlie Feldman¹, Tina Raine-Bennett⁷, Thomas Lorey⁷, Philip E Castle⁸, Meredith Yeager¹, Robert D Burk⁹, Lisa Mirabello².
Abstract
For unknown reasons, the risk conferred by different HPV types varies enormously and, remarkably, differs strongly even between closely related variant lineages within a type. HPV16 is a uniquely powerful carcinogenic type, causing approximately half of all cervical cancers and most other HPV-related cancers. To permit large-scale study of HPV genome variability in relation to precancer and cancer, starting with HPV16 and cervical cancer, we developed a high-throughput next-generation sequencing (NGS) whole-genome method. We designed a custom HPV16 AmpliSeq™ panel that generated 47 overlapping amplicons covering 99% of the genome, sequenced on the Ion Torrent Proton platform. After validating the method against Sanger sequencing, the current "gold standard", in 89 specimens (99.9% concordance), we used our NGS method and custom annotation pipeline to sequence 796 HPV16-positive exfoliated cervical cell specimens. The median completion rate per sample was 98.0%. Our method enabled us to discover novel SNPs and large contiguous deletions suggestive of viral integration (OR 27.3, 95% CI 3.3–222, P=0.002), and to sensitively detect variant lineage coinfections. This method represents an innovative high-throughput, ultra-deep coverage technique for HPV genomic sequencing, which, in turn, enables investigation of the role of genetic variation in HPV epidemiology and carcinogenesis.
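The 99.9% Sanger concordance reported in the abstract is a per-base agreement rate. As a minimal sketch, assuming the NGS consensus and the Sanger sequence for a specimen are already aligned to the same length, and that positions with no call ('N') or a gap ('-') in either sequence are excluded, such a rate could be computed as follows. The helper `concordance` and these exclusion rules are illustrative assumptions, not the authors' validation procedure.

```python
def concordance(ngs: str, sanger: str, missing: str = "N-") -> float:
    """Percent per-base agreement between two pre-aligned sequences.

    Hypothetical helper: positions where either sequence has a character
    listed in `missing` (no call or gap) are skipped, so only bases called
    by both methods are compared.
    """
    if len(ngs) != len(sanger):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(ngs.upper(), sanger.upper()):
        if a in missing or b in missing:
            continue  # at least one method made no call here
        compared += 1
        matches += a == b  # True counts as 1
    if compared == 0:
        raise ValueError("no positions were called by both methods")
    return 100.0 * matches / compared


print(round(concordance("ACGTNACGTA", "ACGTAACGTT"), 1))
```

In this toy example the 'N' position is skipped, leaving nine compared bases of which eight agree, so the printed rate is about 88.9%.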
Keywords: HPV epidemiology; HPV genomics; HPV16
Year: 2015 PMID: 26645052 PMCID: PMC4669577 DOI: 10.1016/j.pvr.2015.05.004
Source DB: PubMed Journal: Papillomavirus Res ISSN: 2405-8521
Fig. 1. HPV16 variant lineage coinfection identification. Integrative Genomics Viewer (IGV) [37] screenshot showing an example of coinfection with a European (EUR; A) lineage and a non-European (nonE; D3) lineage. Sequencing reads (forward reads in red, reverse reads in blue) show consistent variants at four nucleotide sites for both the European and the non-European variant lineages. Multiple sites across the HPV16 genome confirm the pattern seen in this small IGV window. For each highlighted variant, the callout gives the HPV16 reference nucleotide position first; the reference sequence is shown along the bottom of the window, below the sequence reads. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
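Fig. 1 identifies a coinfection by two sets of alleles occurring together at several lineage-informative positions. A minimal sketch of that idea, assuming per-site read counts for each lineage's allele are already available (for example, from a pileup), flags a sample as mixed when enough sites carry both alleles at an appreciable fraction. The function name and the thresholds `min_depth`, `min_fraction` and `min_sites` are hypothetical choices, not values taken from the paper.

```python
from statistics import median
from typing import Iterable, Optional, Tuple


def detect_coinfection(
    site_counts: Iterable[Tuple[int, int]],
    min_depth: int = 100,        # hypothetical minimum reads per site
    min_fraction: float = 0.05,  # hypothetical minimum share for each allele
    min_sites: int = 3,          # hypothetical number of mixed sites required
) -> Tuple[Optional[float], bool]:
    """Each tuple holds (reads with lineage-1 allele, reads with lineage-2 allele)
    at one lineage-informative position. Returns the median lineage-2 read
    fraction over sufficiently deep sites and a coinfection flag."""
    fractions = []
    for n_first, n_second in site_counts:
        depth = n_first + n_second
        if depth < min_depth:
            continue  # too few reads to estimate a mixture at this site
        fractions.append(n_second / depth)
    if not fractions:
        return None, False
    # A site counts as mixed when both alleles exceed min_fraction of reads.
    mixed_sites = sum(min_fraction <= f <= 1 - min_fraction for f in fractions)
    return median(fractions), mixed_sites >= min_sites


print(detect_coinfection([(700, 300), (680, 320), (720, 280), (650, 350)]))
```

With four sites each showing roughly 30% of reads carrying the second lineage's allele, the sketch reports a median fraction of about 0.31 and flags a coinfection; a sample whose second-lineage reads are only background noise is not flagged. Using the median rather than the mean keeps a single miscalled site from shifting the estimate.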
Fig. 2. HPV16 sequence depth for 796 samples. (a) Heat map of sequence depth across 48 overlapping amplicons (columns 1–48) for 796 samples (rows 1–796), showing depths >100× (green cells), 15× (white cells) and <2× (red cells). Asterisks mark four poorly performing amplicons. Group I: 638 samples with high-quality sequence that far exceeded 25× coverage. Group II: 26 samples containing large central deletions. Group III: 55 samples that yielded high-quality, high-depth sequence data but showed dropout of specific amplicons. Group IV: 77 samples that performed poorly. The HPV16 reference map aligns each gene with the corresponding amplicon(s); the exaggerated overlap of adjacent HPV16 genes and regions (early genes: E6, E7, E1, E2, E4, E5; late genes: L2, L1; upstream regulatory region, URR; non-coding region, NC) reflects the overlapping design of the amplicons. (b) Heat map of sequence depths >500× (blue cells) and <500× (white cells). (c) Number of samples (y-axis) exceeding 50× (blue), 100× (red) and 200× (green) sequence depth, plotted against sequence coverage (x-axis) from one amplicon (2% of the genome) up to 48 amplicons (100% of the genome). (d) Summary statistics for the number of high-quality group I plus group III samples (n=693) that exceeded 80, 85, 90 and 95 percent sequence coverage at depths greater than 25×, 50×, 100×, 200× and 500×. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
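Panels (c) and (d) count, for each depth threshold, how many samples reach a given fraction of the genome. A minimal sketch of a panel (d)-style count follows, assuming each sample is represented by one mean depth per amplicon and that the fraction of amplicons passing a threshold stands in for percent genome coverage (the paper may instead count individual bases). The helper names `fraction_covered` and `coverage_summary` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def fraction_covered(amplicon_depths: Sequence[float], min_depth: float) -> float:
    """Fraction of amplicons whose mean depth is strictly greater than min_depth."""
    if not amplicon_depths:
        return 0.0
    return sum(d > min_depth for d in amplicon_depths) / len(amplicon_depths)


def coverage_summary(
    samples: List[Sequence[float]],
    depth_thresholds: Sequence[float] = (25, 50, 100, 200, 500),
    coverage_cutoffs: Sequence[float] = (0.80, 0.85, 0.90, 0.95),
) -> Dict[Tuple[float, float], int]:
    """Map each (depth threshold, coverage cutoff) pair to the number of samples
    whose covered fraction at that depth meets the cutoff."""
    summary: Dict[Tuple[float, float], int] = {}
    for depth in depth_thresholds:
        fractions = [fraction_covered(s, depth) for s in samples]
        for cutoff in coverage_cutoffs:
            summary[(depth, cutoff)] = sum(f >= cutoff for f in fractions)
    return summary


# Three synthetic samples with 48 amplicons each.
samples = [[600.0] * 48, [600.0] * 44 + [10.0] * 4, [30.0] * 48]
print(coverage_summary(samples)[(25, 0.90)])
```

Here the second sample loses 4 of 48 amplicons (about 92% covered), so it passes the 90% cutoff but not the 95% one, while the uniformly shallow third sample passes only at the 25× threshold.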
Annotated HPV16 genome SNPs in independent specimens detected by NGS.
| 477 | 53 | 11.1 | 38 | 1 | 71.7 | 15 | 10 | 34 | 25 | 1 | 0 | 3 | 3 | 73.7% | 0.3% | 4.5% | |
| 297 | 19 | 6.4 | 12 | 0 | 63.2 | 13 | 7 | 6 | 5 | 0 | 0 | 0 | 0 | 83.3% | 0.2% | 0.8% | |
| 1949 | 168 | 8.6 | 112 | 1 | 66.7 | 96 | 57 | 72 | 55 | 0 | 0 | 1 | 1 | 76.7% | 1.0% | 22.4% | |
| 1098 | 122 | 11.1 | 77 | 0 | 63.1 | 40 | 22 | 84 | 55 | 0 | 0 | 1 | 1 | 65.9% | 13.6% | 32.4% | |
| 288 | 56 | 19.4 | 31 | 3 | 55.4 | 28 | 12 | 21 | 17 | 0 | 0 | 2 | 2 | 82.6% | 8.5% | 18.6% | |
| 252 | 40 | 15.9 | 27 | 1 | 67.5 | 23 | 15 | 19 | 12 | 0 | 0 | 0 | 0 | 63.2% | 0.0% | 4.7% | |
| 1422 | 253 | 17.8 | 171 | 1 | 67.6 | 115 | 66 | 141 | 105 | 0 | 0 | 1 | 1 | 74.6% | 1.5% | 9.1% | |
| 1596 | 156 | 9.8 | 121 | 0 | 77.6 | 102 | 75 | 58 | 48 | 0 | 0 | 1 | 1 | 83.1% | 0.5% | 6.5% | |
| 831 | 146 | 17.6 | 83 | 3 | 56.8 | – | – | – | – | – | – | – | – | – | 5.5% | 2.5% | |
| 44 | 4 | 9.1 | 1 | 0 | 25.0 | – | – | – | – | – | – | – | – | – | – | – | |
Nonsyn, nonsynonymous; freq, frequency; early genes: E6, E7, E1, E2, E4 (overlaps with the E2 gene region), E5; late genes: L2, L1; upstream regulatory region: URR.
Total number of variable positions; some SNPs had multiple variable alleles that led to multiple changes. † Based on four E2 binding sites (E2BS) in the URR.
SNPs were considered “novel” if they were not present in the 62 reference HPV16 Sanger sequences [22].
The frequency of CpG site changes is shown for women with an HPV16 European variant lineage, relative to the HPV16 European prototype reference sequence.
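In each row of the SNP table above, the third and sixth columns match the second column divided by the first and the fourth divided by the second, expressed as percentages (for the first row, 53/477 gives 11.1 and 38/53 gives 71.7). This is consistent with SNPs per 100 bp and the percentage of SNPs that are novel. A minimal sketch of those two calculations follows, assuming variants are tracked as genome positions and that "novel" means absent from the set of positions variable among the reference sequences. The helper name and the synthetic positions are hypothetical.

```python
from typing import Iterable, Set, Tuple


def region_snp_stats(
    region_length_bp: int,
    variable_positions: Iterable[int],
    reference_positions: Set[int],
) -> Tuple[int, float, int, float]:
    """Return (variable positions, SNPs per 100 bp, novel positions, % novel).

    A position counts as novel if it is absent from `reference_positions`,
    assumed here to be the positions variable among the reference sequences.
    """
    observed = set(variable_positions)  # each variable position counted once
    n_var = len(observed)
    n_novel = len(observed - reference_positions)
    density = 100.0 * n_var / region_length_bp
    pct_novel = 100.0 * n_novel / n_var if n_var else 0.0
    return n_var, density, n_novel, pct_novel


# Synthetic 477 bp region: 53 variable positions, 15 of them already known.
n_var, density, n_novel, pct_novel = region_snp_stats(
    477, range(100, 153), set(range(100, 115))
)
print(n_var, round(density, 1), n_novel, round(pct_novel, 1))
```

The synthetic positions are chosen only so the counts match the first table row, giving 53 variable positions, 11.1 SNPs per 100 bp, 38 novel positions and 71.7% novel.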
HPV16 variant lineage risk associations.
| Group | Non-EUR (B/C/D), N | EUR (A1), N | OR | 95% CI | P |
|---|---|---|---|---|---|
| Control | 23 | 145 | 1.0 | – | – |
| CIN3 | 69 | 314 | 1.4 | 0.8–2.3 | 0.211 |
| Cancer | 23 | 34 | 4.3 | 2.1–8.5 | 4.0×10⁻⁵ |
N, number of women in each lineage.
OR, odds ratio; 95% CI, 95% confidence intervals.
CIN3, cervical intraepithelial neoplasia grade 3; CIN3+, cervical intraepithelial neoplasia grade 3 and cancer; EUR, European.
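The odds ratios in the lineage table can be reproduced from the counts alone. For CIN3, for example, OR = (69 × 145)/(314 × 23) ≈ 1.39. A minimal sketch follows, assuming an unadjusted 2×2 comparison of non-EUR versus EUR lineage in cases versus controls with a Woolf (log-scale) 95% interval. It recovers the tabulated ORs and intervals to the reported precision. The paper's own test is not described here, so the Wald P value from this sketch may differ slightly from the published one. The function name is hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_woolf(
    a: int, b: int, c: int, d: int, z: float = 1.959964
) -> Tuple[float, float, float, float]:
    """Unadjusted odds ratio for a 2x2 table laid out as

                    exposed   unexposed
        cases          a          b
        controls       c          d

    Returns (OR, lower, upper, two-sided Wald P), with the confidence interval
    computed on the log scale (Woolf method)."""
    if 0 in (a, b, c, d):
        raise ValueError("zero cell: Woolf interval needs a continuity correction")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    p = math.erfc(abs(log_or / se) / math.sqrt(2))  # two-sided normal tail
    return math.exp(log_or), lower, upper, p


# Exposed = non-EUR lineage; cases = CIN3, controls as in the table.
or_, lo, hi, p = odds_ratio_woolf(69, 314, 23, 145)
print(f"OR {or_:.1f} (95% CI {lo:.1f}-{hi:.1f}), P={p:.3f}")
```

For the cancer row, `odds_ratio_woolf(23, 34, 23, 145)` likewise returns an OR of about 4.3 with an interval of about 2.1 to 8.5. The same function could be applied to any other unadjusted case-control comparison given its four counts.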
Fig. 3. Coverage depth by nucleotide position for three of the 26 HPV16 samples with large central deletions. Three samples with a large central deletion were chosen at random to illustrate the coverage depth pattern. The three coverage plots (blue, green and red) are overlaid on the ORFs (early genes: E6, E7, E1, E2/E4, E5; late genes: L2, L1) and the upstream regulatory region (URR), with the corresponding nucleotide positions of the HPV16 genome. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
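The deletions in Fig. 3 appear as a long stretch of near-zero depth flanked by deep coverage. A minimal sketch follows that finds the longest such stretch in a per-position depth array. It assumes a depth cut-off of <2× (the lowest bin in Fig. 2) and a hypothetical minimum length of 500 positions. It treats the genome as linear, ignoring circular wrap-around, and cannot by itself distinguish a true deletion from dropout of adjacent amplicons. The function name and thresholds are illustrative, not the authors' pipeline.

```python
from typing import Optional, Sequence, Tuple


def longest_low_coverage_run(
    depth: Sequence[int],
    low_depth: int = 2,     # positions with depth < low_depth count as uncovered
    min_length: int = 500,  # hypothetical minimum run length to report
) -> Optional[Tuple[int, int]]:
    """Return the 1-based, inclusive (start, end) of the longest run of
    consecutive positions with depth < low_depth, or None if no run reaches
    min_length positions. The genome is treated as linear."""
    best: Optional[Tuple[int, int]] = None
    best_len = 0
    run_start: Optional[int] = None
    for i, d in enumerate(depth):
        if d < low_depth:
            if run_start is None:
                run_start = i  # a new low-coverage run begins here
            run_len = i - run_start + 1
            if run_len > best_len:
                best_len, best = run_len, (run_start + 1, i + 1)
        else:
            run_start = None  # coverage recovered; close the current run
    return best if best_len >= min_length else None


# Synthetic profile: deep coverage, a 1,500-position gap, deep coverage again.
profile = [1000] * 3000 + [0] * 1500 + [800] * 3406
print(longest_low_coverage_run(profile))
```

On this synthetic profile the sketch reports the gap as positions 3001 to 4500. A short dip of 50 positions would fall below `min_length` and return None.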