
Multiplex Identification of Human Papillomavirus 16 DNA Integration Sites in Cervical Carcinomas.

Bo Xu, Sasithorn Chotewutmontri, Stephan Wolf, Ursula Klos, Martina Schmitz, Matthias Dürst, Elisabeth Schwarz.

Abstract

Cervical cancer is caused by high-risk human papillomaviruses (HPV), in more than half of the worldwide cases by HPV16. Viral DNA integration into the host genome is a frequent mutation in cervical carcinogenesis. Because integration occurs into different genomic locations, it creates unique viral-cellular DNA junctions in every single case. This singularity complicates the precise identification of HPV integration sites enormously. We report here the development of a novel multiplex strategy for sequence determination of HPV16 DNA integration sites. It includes DNA fragmentation and adapter tagging, PCR enrichment of the HPV16 early region, Illumina next-generation sequencing, data processing, and validation of candidate integration sites by junction-PCR. This strategy was performed with 51 cervical cancer samples (47 primary tumors and 4 cell lines). Altogether 75 HPV16 integration sites (3'-junctions) were identified and assigned to the individual samples. By comparing the DNA junctions with the presence of viral oncogene fusion transcripts, 44 tumors could be classified into four groups: Tumors with one transcriptionally active HPV16 integrate (n = 12), tumors with transcribed and silent DNA junctions (n = 8), tumors carrying episomal HPV16 DNA (n = 10), and tumors with one to six DNA junctions, but without fusion transcripts (n = 14). The 3'-breakpoints of integrated HPV16 DNA show a statistically significant (p<0.05) preferential distribution within the early region segment upstream of the major splice acceptor underscoring the importance of deregulated viral oncogene expression for carcinogenesis. Half of the mapped HPV16 integration sites target cellular genes pointing to a direct influence of HPV integration on host genes (insertional mutagenesis). In summary, the multiplex strategy for HPV16 integration site determination worked very efficiently. 
It will open new avenues for comprehensive mapping of HPV integration sites and for the possible use of HPV integration sites as individualized biomarkers after cancer treatment of patients for the early diagnosis of residual and recurrent disease.

Year:  2013        PMID: 23824673      PMCID: PMC3688939          DOI: 10.1371/journal.pone.0066693

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Persistent infection with carcinogenic human papillomavirus (HPV) is the essential basis for development of cervical cancer [1], one of the most common cancers in women worldwide [2], [3]. Of the twelve mucosotropic high-risk HPV types (hr-HPV) classified as “carcinogenic to humans” [4], HPV16 is by far the most prevalent and most carcinogenic type, responsible for more than 50% of all cervical cancer cases worldwide, followed by HPV18 (about 20% of cervical cancer cases) and less prevalent hr-HPV types [5]–[7]. Most cervical hr-HPV infections are transient and cleared within 1–2 years. Long-term viral persistence is established in about 10% of infections, and only some of the persistent hr-HPV infections will progress to precancer lesions and eventually to cancer [1]. HPV16 is present in the human population in many different molecular variants, which have been grouped into five phylogenetic clusters based on their original geographic distribution [8]. HPV16 variants differ in their carcinogenic potential and other transformation-linked properties [9], [10]. The viral oncogenes E6 and E7 become constitutive components of the host cells by persistent hr-HPV infection. Their protein products inactivate the major cellular tumor suppressors p53 and pRB, and interact in addition with a plethora of other cellular proteins [11]–[13]. The HPV life cycle and viral gene expression patterns are severely disturbed in the course of cervical carcinogenesis [14]. Deregulated constitutive expression of E6 and E7 is the key event for malignant progression, combined with additional alterations of viral and cellular genes and pathways [15], [16]. Integration of hr-HPV DNA into the host genome can be a driver mutation in cervical carcinogenesis, associated with progression and invasiveness [17], [18]. The prevalence of integrated hr-HPV DNA increases substantially with the severity of the lesions, reaching 100% in HPV18-induced cervical cancer cases [19]–[22].
A subset of HPV16-positive invasive cervical carcinomas, however, maintains viral DNA only as episomes indicating that integration-associated and episome-associated pathways of HPV16-induced cervical carcinogenesis might exist [23], [24]. Integration converts the circular HPV genome into a linear truncated DNA, in which the upstream regulatory region (URR) and the E6/E7 oncogenes are always retained intact (Figure 1A) [22], [25]. Besides the integrated monomeric forms, head-to-tail concatemers of full-length HPV genomes flanked by truncated copies also exist, exemplified by the cervical cancer cell line CaSki [26], [27]. Transcription initiated at the HPV early promoter traverses the 3′ integration site into the flanking cellular sequences, giving rise to spliced viral-cellular fusion transcripts that are important for constitutive deregulated expression of the E6/E7 oncogenes (Figure 1B and 1C) [21], [22], [28]–[31].
Figure 1

Genomic structure and transcription of episomal and integrated HPV16 DNA.

(A) The circular HPV16 genome becomes linearized and inserted into the host cell genome upon integration. The breakpoints of integrated HPV16 DNA can be located anywhere in the L2-L1 region (5′ breakpoint) and E1-E2-E5 region (3′ breakpoint), respectively. The breakpoint regions are indicated by dotted underlines. The circular HPV16 genome is reproduced with slight modifications, with permission, from Doorbar (2006), Clinical Science 110, 525–541. © the Biochemical Society. (B) In case of episomal HPV16 DNA, early transcription is initiated at the early promoter P97 and terminated at the early poly-A signal (PAE at pos. 4215). The early transcript shown contains two exons (e) with ORFs for E6, E7, E1^E4 and E5. An intron (i) of 2477 nucleotides is removed by splicing at the indicated donor and acceptor positions. Amplification of HPV16 oncogene transcripts by the APOT assay gives rise to a constant-size RT-PCR amplicon of ∼1 kb [21]. The APOT primers are indicated by the arrows. (C) Early transcription from integrated HPV16 DNA will lead to HPV16-cellular fusion transcripts, because the viral PAE signal is missing and instead a cellular poly-A signal (PA) is adopted. If the 3′ breakpoint is located upstream of the splice acceptor at position 3358, an alternative cellular splice acceptor (Ac) will be used, and the HPV16-cellular DNA junction sequence (red bar) will be spliced out as part of a viral-cellular intron (upper part). If the 3′ breakpoint is located downstream of the splice acceptor 3358, the HPV16-cellular DNA junction sequence (blue bar) will remain as part of a viral-cellular exon, and the DNA and RNA junction sequences are colinear (lower part). The HPV16-cellular fusion transcripts are amplified in APOT assays as RT-PCR products that are shorter or longer than the ∼1-kb amplicon derived from episomal HPV16 [21].

Integrated HPV DNA usually shows disruption or complete deletion of the E1 or E2 gene, with a consequence of functional inactivation.
The E1 gene encodes the HPV-specific helicase essential for initiation of viral DNA replication. The E2 gene encodes a multifunctional regulatory protein involved in regulation of viral transcription, initiation of viral DNA replication and maintenance of the viral DNA episome. Loss of E1/E2 expression abrogates the E2-mediated repression of E6/E7 transcription from integrated HPV DNA [32], increases the efficiency of HPV-induced immortalization of primary human keratinocytes [33], and is associated with poor prognosis of cervical cancer as well as low disease-free survival rate [34], [35]. HPV DNA integration occurs into various regions of the human genome, with certain preferences for transcribed regions and common fragile sites [36]–[38]. Many HPV integration sites are located within known or predicted cellular genes [28]. Inactivation of tumor suppressor genes or activation of proto-oncogenes might be a direct consequence of HPV DNA integration. Examples of such scenarios have been reported, including the MYC proto-oncogene [36]–[42] and the potential tumor suppressor genes ZBTB7C (APM-1) [43] and CASZ1 [44]. It is a matter of ongoing debate whether and to what extent HPV-induced insertional mutagenesis of cellular genes contributes to cervical carcinogenesis [28], [30], [38], [45]. Sequence determination of viral-cellular junctions gives direct proof of HPV integration and allows precise localization of the chromosomal target sites. However, this is also a difficult task because the integration breakpoints in both the viral and the cellular genome differ from sample to sample. In the past, junction sequences were determined in clones isolated from genomic DNA or cDNA libraries [22], [26], [29].
Later, PCR-based approaches for DNA junction analysis were developed including restriction-site PCR (RS-PCR) [37], [46], [47], detection of integrated papillomavirus sequences by ligation-mediated PCR (DIPS-PCR) [48], and the restriction-ligation-inverse PCR (rli-PCR) [49]. However, these PCR methods are laborious and not sensitive enough, which limits their broad application. For RNA junction analysis, the amplification of papillomavirus oncogene transcripts (APOT) RT-PCR assay can distinguish episome-derived viral transcripts (Figure 1B) from integrate-derived viral-cellular fusion transcripts (Figure 1C) [21], [30]. Until now, viral-cellular junctions from more than 300 cervical precancer and cancer samples have been characterized [28], [38], [50]. Development of more efficient methods would facilitate a comprehensive mapping of HPV integration sites to gain deeper insight into cervical carcinogenesis. Furthermore, efficient determination of HPV integration sites would allow their use as individualized markers in cervical cancer screening and in the follow-up of patients for the early detection of recurrent disease and metastasis. Here we report the design and application of an innovative strategy for the simultaneous, nucleotide-level determination of HPV16 DNA integration sites in multiple cervical cancer samples. The multiplex strategy takes advantage of novel methods for sample preparation and next-generation sequencing (NGS). In a pooled mixture of DNA samples from about 50 primary cervical carcinomas and carcinoma-derived cell lines, more than 70 HPV16-cellular junction sequences could be identified in one NGS experiment and assigned by junction-specific PCR to the individual samples. The power of NGS was concomitantly employed to determine the HPV16 E6 variant sequences for all carcinoma samples. The efficient performance opens new avenues for HPV16 integration site analysis in large numbers of cancer and precancer samples.

Results

Identification of HPV DNA integration sites in the human genome is an elaborate task due to the unique combination of viral and cellular breakpoints in every single case. Exploiting the high capacity of NGS, we have designed and applied a multiplex strategy for HPV16 integration site determination, named TEN16 for “Tagging, Enrichment and Next-generation sequencing of HPV16”.

TEN16 Assay Design

The TEN16 strategy is a multistep procedure including as essential steps (i) Nextera™ in vitro transposition for fragmentation and adapter tagging of the genomic DNA of HPV16-positive tumor samples; (ii) HPV16 DNA enrichment by multiplex PCR with HPV16 forward primers and barcoded Nextera adapter as reverse primer; (iii) Illumina NGS; (iv) data processing for sequence sorting and mapping; and (v) validation of HPV16-cellular junction sequences by junction-specific PCR. The complete workflow is outlined in Figure 2A.
Figure 2

The TEN16 strategy for simultaneous determination of HPV16 DNA integration sites in multiple samples.

(A) The TEN16 workflow. Details of the individual steps are given in parentheses. (B) Structure of DNA fragments produced by Nextera reaction and blocking of the DNA 3′-ends. DNA fragmented by Nextera transposition carries a 19-nt universal adapter (striated arrow; referred to as Nextera adapter) covalently coupled to the 5′-end of each strand, whereas a 9-nt gap separates the 3′-end from the complementary oligonucleotide (open arrow) [52]. In a PCR using Nextera DNA fragments as templates, DNA polymerase will repair the 9-nt gap. This will lead to whole-genome amplification if the Nextera adapter is used as a primer. For this reason, a blocking reaction with ddNTP was conceived to make the 3′-OH groups inaccessible for gap repair to reduce the amplification primed by Nextera adapter alone. (C) Enrichment of the HPV16 early region by multiplex PCR. Sixteen HPV16 forward primers assembled in two mixtures (HPM-A and -B) are used to cover the E1-E2-E5 region, where the viral 3′ breakpoints can occur. The approximate primer locations are marked by arrows. (D) Categories of HiSeq2000 read pairs after sorting and mapping. In category 1 both reads of the same pair contain HPV16 sequences (open arrow) only, and in category 2 cellular sequences (filled arrow) only. Category 3 contains pairs with one HPV16 read and a cellular mate. Category 4 contains pairs with one junction read (combined arrow) plus an HPV16, a cellular, or a junction mate. Categories 3 and 4 are important for HPV16 integration site determination. (E) Validation of potential HPV16-cellular junctions by junction-specific PCR. The five DNA samples sharing the same barcode are used individually as PCR templates, and only one of them should be positive for the junction being tested. Shown here is the junction-PCR for the five DNA samples of barcode 08 and the positive result for 3256_DJ1.

Nextera™ in vitro transposition is a novel sample preparation approach for whole-genome NGS [51].
In a single reaction, the sample DNA is randomly fragmented, and the fragments are universally tagged with a specific Nextera adapter sequence (the transposon end) at the 5′-ends (Figure 2B). This feature makes the Nextera technique especially suitable for PCR enrichment of HPV16-cellular junction sequences from the complex human genome. The transposition reaction creates double-stranded DNA fragments with transposon-tagged 5′-ends and free 3′-ends upstream of a 9-nt single-stranded gap [52]. To facilitate HPV16 DNA enrichment and to suppress the unwanted PCR amplification of pure genomic DNA, the standard Nextera workflow was modified by blocking the free 3′-ends with ddNTP after DNA fragmentation/tagging and transposase removal (Figure 2B). The Illumina HiSeq2000 NGS system was employed for DNA sequencing using the paired-end method with read length of 2×100 nucleotides (nt). Before sequencing, the enrichment of HPV16-containing sequences from the whole genomic DNA was achieved by multiplex PCR reactions. The focus was put on the functionally important and potentially transcribed 3′ breakpoints of integrated HPV16 DNA. To detect all possible 3′ breakpoints, 16 forward primers were selected which completely cover the HPV16 E1-E2-E5 region of 3.2 kb in length (Figure 2C). The primer sequences are given in Table S1. With an estimated yield per lane of 100 million read pairs by HiSeq2000 sequencing, it was calculated that the simultaneous analysis of 50 samples in one TEN16 experiment would result in a sequencing depth of 125,000 read pairs per sample per HPV16 primer. This should be sensitive enough to determine viral-cellular junctions against the background of pure HPV16 sequences. To simplify the whole experimental procedure and to reduce the per-sample cost of Nextera, 50 DNA samples were partitioned into ten pools (five samples each) before the Nextera reaction and then processed in parallel. 
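The depth estimate quoted above can be reproduced with a few lines of arithmetic. The following is only a sketch: the figures are the estimates given in the text, the variable and function names are illustrative, and the calculation assumes that read pairs are distributed evenly across samples and primers, which a real run will not achieve exactly.

```python
# Rough sequencing-depth estimate for one pooled run.
# Figures are the estimates quoted in the text; names are illustrative only.
READ_PAIRS_PER_LANE = 100_000_000  # estimated HiSeq2000 yield per lane
N_SAMPLES = 50                     # samples analysed in one experiment
N_PRIMERS = 16                     # HPV16 forward primers covering E1-E2-E5
POOL_SIZE = 5                      # samples mixed before the Nextera reaction


def depth_per_sample_per_primer(read_pairs: int, n_samples: int, n_primers: int) -> float:
    """Expected read pairs per sample per primer, assuming an even distribution."""
    return read_pairs / (n_samples * n_primers)


depth = depth_per_sample_per_primer(READ_PAIRS_PER_LANE, N_SAMPLES, N_PRIMERS)
n_pools = N_SAMPLES // POOL_SIZE

print(f"{depth:,.0f} read pairs per sample per primer")
print(f"{n_pools} pools of {POOL_SIZE} samples")
```

Because amplification efficiency differs between primers and samples, the result is best read as an order-of-magnitude figure rather than a guaranteed depth.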
Multiplex PCR for HPV16 DNA enrichment was performed for each pool in two reactions with two HPV16 forward primer mixtures HPM-A and -B (eight primers each), respectively. The Nextera adapter was used as reverse primer equipped at the 5′-end with a unique 5-nt barcode for each pool (Table S2), necessary for later sequence sorting. Products of all PCR reactions were mixed together and prepared for HiSeq2000 sequencing.
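Sorting reads back to their pool by the barcode in front of the adapter can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the two barcode sequences are invented placeholders (the real ones are listed in Table S2), the adapter string is assumed to be the 19-nt transposon end sequence, and `assign_pool` is a hypothetical helper. Matching is exact, in line with the no-mismatch rule applied during sorting.

```python
from typing import Dict, Optional

# Placeholder barcodes: the real 5-nt barcodes are given in Table S2.
BARCODES: Dict[str, str] = {"ACGTA": "pool_01", "TGCAT": "pool_02"}
# Assumed 19-nt adapter sequence; replace with the adapter actually used.
ADAPTER = "AGATGTGTATAAGAGACAG"


def assign_pool(read_seq: str,
                barcodes: Dict[str, str] = BARCODES,
                adapter: str = ADAPTER,
                barcode_len: int = 5) -> Optional[str]:
    """Return the pool name if the read starts with a known barcode
    immediately followed by the adapter (exact match), else None."""
    prefix = read_seq[:barcode_len]
    if prefix in barcodes and read_seq.startswith(adapter, barcode_len):
        return barcodes[prefix]
    return None


print(assign_pool("ACGTA" + ADAPTER + "GGCTTACC"))  # known barcode and adapter
print(assign_pool("ACGTT" + ADAPTER + "GGCTTACC"))  # one barcode mismatch
```

Exact matching discards reads with sequencing errors in the barcode, trading some yield for a lower risk of assigning a junction to the wrong pool.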

TEN16 Performance of HPV16 Integration Site Determination

In the TEN16 pilot study, 51 DNA samples including 47 fresh-frozen cervical carcinomas and four cervical cancer cell lines (SiHa, CaSki, MRI-H186, MRI-H196) were investigated. Fifty of them were premixed into ten pools as described above. CaSki DNA was prepared separately to assess the sensitivity of TEN16, because every CaSki cell harbors as many as 600 copies of integrated full-length, concatemeric HPV16 DNA [26], [53]. A total of 106.3 million sequence read pairs were generated from a single lane of HiSeq2000. Almost 72 million read pairs were identifiable and sorted by barcodes (no mismatch allowed in the barcode and primer sequences). Over 70 million read pairs represented the TEN16 DNA library, and the rest belonged to the HPV16 E6 variant analysis (described later). After mapping against the HPV16 and human reference sequences, the sorted TEN16 read pairs were split into four categories (Figure 2D). On average per barcode, 53% contained pure HPV16 sequences (category 1), 42% contained pure cellular sequences (category 2), 0.05% were read pairs flanking HPV16-cellular junctions (category 3), 0.25% contained a chimeric sequence in at least one read (category 4), and 4.7% were unmapped sequences. The data processing gave rise to about 200,000 read pairs in categories 3 and 4. Visual inspection of the sequences revealed that many potential junctions were represented by only one or a few read pairs. Because it was unfeasible to validate all the candidates by junction-PCR, a cutoff value of 15 read pairs per junction was chosen. With this criterion, 76 potential junctions were selected as the most promising candidates for validation. Among them, 67 junctions were confirmed to be authentic by junction-PCR and Sanger sequencing, and were assigned to the individual tumors or cell lines (Figure 2E). Nine junctions could not be assigned to any sample in the respective barcode.
They were further analyzed individually with all DNA pools to check for the possibility of wrong barcode assignment. Since all reactions were negative (data not shown), it was concluded that the nine junctions are most likely PCR artifacts generated during HPV16 DNA enrichment. For CaSki, three novel junctions were identified by TEN16 with >15 read pairs each. The previously known junction [54] was detected by searching again through the TEN16 sequence library, but only with three read pairs (CS_DJ1 in Table 1). Additional junctions were discovered later while comparing DNA junction/RNA junction sequences (see next section). Taken together, 75 HPV16-cellular DNA junctions were identified and validated in the TEN16 study (Table 1 and Table S3). Distribution of the read pair numbers for all junctions is shown in Figure S1.
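The sorting of read pairs into the four categories and the read-pair cutoff can be illustrated with a short sketch. It assumes that each read has already been labeled by an upstream mapping step as 'hpv', 'cell', 'junction' (chimeric) or 'unmapped'; these labels, the function names and the junction identifiers are hypothetical. Treating the cutoff of 15 as inclusive is also an assumption, since the text does not state how a junction with exactly 15 read pairs was handled.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

MIN_READ_PAIRS = 15  # cutoff quoted in the text; inclusive comparison is an assumption


def categorize_pair(read1: str, read2: str) -> Optional[int]:
    """Return the category (1-4) of a read pair, or None if it fits none.

    Labels are hypothetical: 'hpv', 'cell', 'junction' or 'unmapped'.
    """
    labels = {read1, read2}
    if "unmapped" in labels:
        return None
    if labels == {"hpv"}:
        return 1  # both reads pure HPV16
    if labels == {"cell"}:
        return 2  # both reads pure cellular
    if labels == {"hpv", "cell"}:
        return 3  # pair flanks a junction
    if "junction" in labels and labels <= {"hpv", "cell", "junction"}:
        return 4  # at least one chimeric read
    return None


def select_candidates(junction_ids: Iterable[str],
                      min_pairs: int = MIN_READ_PAIRS) -> Dict[str, int]:
    """Count supporting read pairs per junction and keep those at or above the cutoff."""
    counts = Counter(junction_ids)
    return {j: n for j, n in counts.items() if n >= min_pairs}


pairs = [("hpv", "hpv"), ("cell", "cell"), ("hpv", "cell"), ("junction", "cell")]
print([categorize_pair(a, b) for a, b in pairs])

support = ["junction_A"] * 15 + ["junction_B"] * 3  # one ID per category 3/4 pair
print(select_candidates(support))
```

Only pairs in categories 3 and 4 would feed the counting step; candidates that pass the cutoff still require junction-PCR, since the text reports that some of them turned out to be PCR artifacts.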
Table 1

HPV16-cellular DNA junctions validated by junction-PCR.

DJ 1) | Sample | HPV16 breakpoint (ORF) | Cellular breakpoint | Cellular map | Cellular strand | Cellular u/r 2) | Gene name | Gene t/d 3) | Gene orientation 4) | Accession Number
2319_DJ3 a) | T2319 | 910 (E1) | 10697545 | 1p36.22 |  | u | CASZ1 | t | same | HE984526
2319_DJ2 a) | T2319 | 3115 (E2) | 10699282 | 1p36.22 |  | u | CASZ1 | t | same | HE984525
2319_DJ1 a) | T2319 | 4338 (L2) | 10707431 | 1p36.22 | + | u | CASZ1 | t | opposite | HE984524
0892_DJ2 b) | T892 | 2886 (E2) | 27869153 | 2p23.2 |  | r (AluSx) | GPN1 | t | opposite | HE984511
0892_DJ1 | T892 | 2970 (E2) | 27869352 | 2p23.2 | + | u | GPN1 | t | same | HE984510
3719_DJ1 | T3719 | 2823 (E2) | 34896821 | 2p22.3 |  | r (L1M1_5) | no gene |  |  | HE984545
2882_DJ1 | T2882 | 3596 (E2) | 146512049 | 2q22.3 |  | u | no gene |  |  | HE984537
2085_DJ1 | T2085 | 2774 (E1/E2) | 149157969 | 2q23.1 | + | u | MBD5 | t | same | HE984518
1875_DJ1 | T1875 | 3010 (E2) | 201823720 | 2q33.1 | + | u | ORC2 | t | opposite | HE984516
0841_DJ1 | T841 | 2412 (E1) | 206103492 | 2q33.3 |  | r (AluSq) | PARD3B | t | opposite | HE984505
3987_DJ1 | T3987 | 3489 (E2) | 212251450 | 2q34 | + | u | ERBB4 | t | opposite | HE984548
2317_DJ1 | T2317 | 3667 (E2) | 230045578 | 2q36.3 | + | u | PID1 | t | opposite | HE984522
5234_DJ1 | T5234 | 2574 (E1) | 49319411 | 3p21.31 |  | r (AluSc) | USP4 | t | same | HE984565
2548_DJ1 | T2548 | 2868 (E2) | 60486966 | 3p14.2 |  | u | FHIT | t | same | HE984528
2548_DJ2 | T2548 | 1921 (E1) | 60535200 | 3p14.2 | + | u | FHIT | t | opposite | HE984529
0018_DJ1 | T18 | 3048 (E2) | 169306956 | 3q26.2 | + | u | MECOM | t | opposite | HE984501
3256_DJ1 | T3256 | 1642 (E1) | 182088476 | 3q26.33 | + | u | ATP11B | d | same | HE984539
4426_DJ1 c) | T4426 | 2080 (E1) | 190056682 | 3q28 |  | u | CLDN1 | d | same | HE984552
3966_DJ1 d) | T3966 | 2990 (E2) | 74540049 | 4q13.3 | + | u | IL8 | d | same | HE984547
3719_DJ2 | T3719 | 2615 (E1) | 78848346 | 4q21.1 |  | u | MRPL1 | t | opposite | HE984546
4024_DJ2 c) | T4024 | 1841 (E1) | 100450200 | 4q23 |  | u | C4orf17 | t | opposite | HE984550
2231_DJ1 e) | T2231 | 1030 (E1) | 191040400 | 4q35.2 |  | r (MER31_I) | DUX4L2 | d | opposite | HE984521
2592_DJ1 | T2592 | 1948 (E1) | 4583408 | 5p15.32 |  | u | no gene |  |  | HE984534
5189_DJ1 | T5189 | 3286 (E2) | 25574739 | 5p14.1 | + | u | no gene |  |  | HE984562
3576_DJ1 | T3576 | 2350 (E1) | 44184762 | 6p21.1 | + | u | SLC29A1 | d | same | HE984544
CS_DJ1 f) | CaSki | 3729 (E2) | 45659125 | 6p12.3 |  | u | RUNX2 | d | opposite | HE984566
4977_DJ1 | T4977 | 2143 (E1) | 126782491 | 6q22.32 |  | r (LTR) | CENPW | d | opposite | HE984559
4977_DJ2 | T4977 | 3059 (E2) | 28835005 | 7p15.1 |  | u | CREB5 | t | opposite | HE984560
5066_DJ2 | T5066 | 1813 (E1) | 61969363 | 7q11.21 |  | r (ALR1) | no gene |  |  | HF559481
5189_DJ2 | T5189 | 2627 (E1) | 111272354 | 8q23.2 | + | r (L1) | no gene |  |  | HE984563
1509_DJ1 | T1509 | 2686 (E1) | 113482518 | 8q23.3 | + | u | CSMD3 | t | opposite | HE984513
5189_DJ3 | T5189 | 2353 (E1) | 128396523 | 8q24.21 | + | r (AluY) | POU5F1B | d | same | HE984564
MH186_DJ1 | MRI-H186 | 1224 (E1) | 128675817 | 8q24.21 | + | u | MYC | d | same | HE984570
MH186_DJ2 | MRI-H186 | 2754 (E1) | 128746603 | 8q24.21 | + | u | MYC | d | same | HE984571
4601_DJ1 | T4601 | 999 (E1) | 13858043 | 9p23 | + | u | LINC00583 | d | same | HE984553
0186_DJ1 | T186e | 2590 (E1) | 26947928 | 9p21.2 |  | u | IFT74 | t | opposite | HE984503
1509_DJ2 | T1509 | 1228 (E1) | 33980798 | 9p13.3 | + | r (AluJb) | UBAP2 | t | opposite | HE984514
4793_DJ3 g) | T4793 | 1226 (E1) | 126291635 | 9q33.3 | + | u | DENND1A | t | opposite | HE999548
CS_DJ2 | CaSki | 1973 (E1) | 11742450 | 10p14 |  | u | USP6NL | d | same | HE984567
3256_DJ2 | T3256 | 2344 (E1) | 100058840 | 10q24.2 |  | u | LOXL4 | d | same | HE984540
3256_DJ3 | T3256 | 2612 (E1) | 100058865 | 10q24.2 | + | u | PYROXD2 | d | opposite | HE984541
MH196_DJ1 | MRI-H196 | 3858 (E5) | 47967861 | 11p11.2 | + | u | PTPRJ | d | same | HE984572
3427_DJ1 | T3427 | 1794 (E1) | 12233422 | 12p13.2 |  | r (AluSg) | BCL2L14 | t | opposite | HE984542
0841_DJ2 | T841 | 1953 (E1) | 73677253 | 13q22.1 |  | u | KLF5 | d | opposite | HE984506
4046_DJ1 | T4046 | 2146 (E1) | 73999476 | 13q22.1 |  | u | KLF5 | d | opposite | HE984551
SH_DJ1 | SiHa | 3133 (E2) | 74087562 | 13q22.1 |  | u | KLF5 | d | opposite | HE984573
0841_DJ3 | T841 | 3281 (E2) | 74226231 | 13q22.2 | + | u | KLF12 | d | opposite | HE984507
2209_DJ1 | T2209 | 2637 (E1) | 74255979 | 13q22.2 | + | u | KLF12 | d | opposite | HE984520
1907_DJ1 | T1907U | 2412 (E1) | 62135042 | 14q23.2 | + | u | HIF1A | d | same | HE984517
0182_DJ1 | T182e | 2725 (E1) | 58836257 | 15q22.2 | + | u | LIPC | t | same | HE984502
2317_DJ2 | T2317 | 1028 (E1) | 90729527 | 15q26.1 |  | u | SEMA4B | t | opposite | HE984523
2085_DJ2 | T2085 | 2198 (E1) | 11918118 | 16p13.13 |  | u | BCAR4 | t | same | HE984519
3427_DJ2 | T3427 | 2461 (E1) | 19657318 | 17p11.2 | + | u | ULK2 | d | opposite | HE984543
2707_DJ1 | T2707 | 3564 (E2) | 37818061 | 17q21.31 | + | u | STARD3 | t | same | HE984535
2707_DJ2 | T2707 | 2970 (E2) | 37862307 | 17q21.31 |  | u | ERBB2 | t | opposite | HE984536
1686_DJ1 | T1686 | 1354 (E1) | 57908866 | 17q23.2 |  | u | VMP1 | t | opposite | HE984515
4749_DJ1 | T4749 | 2564 (E1) | 1493334 | 18p11.32 | + | r (L1) | no gene |  |  | HE984554
4749_DJ2 | T4749 | 1027 (E1) | 1506143 | 18p11.32 |  | u | LINC00470 | d | same | HE984555
4749_DJ3 | T4749 | 3337 (E2) | 34242009 | 18q12.2 | + | u | FHOD3 | t | same | HE984556
2548_DJ3 | T2548 | 2019 (E1) | 963052 | 19p13.3 |  | u | ARID3A | t | opposite | HE984530
2548_DJ4 | T2548 | 3890 (E5) | 2080380 | 19p13.3 |  | u | MOB3A | t | same | HE984531
2967_DJ1 | T2967 | 2132 (E1) | 19610972 | 19p13.11 | + | u | GATAD2A | t | same | HE984538
0186_DJ2 | T186e | 3897 (E5) | 30498221 | 19q11 | + | u | URI1 | t | same | HE984504
5066_DJ1 | T5066 | 2470 (E1) | 11159613 | 20p12.2 | + | u | no gene |  |  | HE984561
2548_DJ5 | T2548 | 2406 (E1) | 14886856 | 20p12.1 | + | r (AluSz) | MACROD2 | t | same | HE984532
2548_DJ6 | T2548 | 3888 (E5) | 32116640 | 20q11.22 | + | u | CBFA2T2 | t | same | HE984533
2349_DJ1 | T2349 | 2949 (E2) | 7521372 | Xp22.31 |  | u | STS | d | opposite | HE984527
4024_DJ1 | T4024 | 3018 (E2) | 17517026 | Xp22.13 | + | u | NHS | t | same | HE984549
0841_DJ4 | T841 | 3413 (E2) | 24333382 | Xp22.11 |  | u | FAM48B2 | d | same | HE984508
0841_DJ5 | T841 | 1703 (E1) | 24564835 | Xp22.11 |  | u | PDK3 | d | opposite | HE984509
0940_DJ1 | T940 | 1529 (E1) | 124850317 | Xq25 | + | u | DCAF12L2 | d | opposite | HE984512
CS_DJ3 | CaSki | 975 (E1) | 144778296 | Xq27.3 | + | u | SLITRK2 | d | same | HE984568
CS_DJ4 | CaSki | 2987 (E2) | 144789749 | Xq27.3 | + | u | SLITRK2 | d | same | HE984569
4793_DJ1 h) | T4793 | 1932 (E1) |  |  |  | r (GGAAT) |  |  |  | HE984557
4793_DJ2 h) | T4793 | 3881 (E5) |  |  |  | r (GGAAT) |  |  |  | HE984558

1) The viral-cellular DNA junctions (DJ) are sorted by chromosomal map position of the cellular sequences (fifth column).

2) u/r: u = unique cellular sequence; r = repetitive cellular sequence; the names of repeat sequences are given in parentheses.

3) t/d: t = gene directly targeted by HPV16 integration; d = the first gene located within 500 kb downstream of the DNA junction.

4) Orientation of the cellular gene with regard to the early region of integrated HPV16 DNA.

a) DNA junctions 2319_DJ1, DJ2 and DJ3 were discovered in the TEN16 sequence library by searching for DNA junctions located upstream of the identified RNA junction 2319_RJ (Table 2 and Figure 3). DJ2 contains a 65-bp sequence of chromosome 15 between HPV16 and chromosome 1 sequences. DJ1 and DJ3 were below the cutoff of 15 read pairs. DJ1 has also been identified by DIPS-PCR [44].

b) DNA junction 0892_DJ2 contains repetitive cellular sequence (AluSx), but could be assigned to chromosome 2 by taking into account the other DNA junction (0892_DJ1) and the RNA junction (0892_RJ; Table 2 and Figure 3).

c) DNA junctions 4426_DJ1 and 4024_DJ2 with <15 read pairs each were discovered by searching in the TEN16 sequence library for DNA junctions located upstream of the identified RNA junctions (Table 2).

d) DNA junction 3966_DJ1 was identified in the TEN16 sequence library after the second-round sequence-sorting.

e) DNA junction 2231_DJ1 was first mapped to two chromosome regions 4p16.3 and 4q35.2 with long identical sequences downstream of the junction. The mapping was then refined by long-range PCR to be on 4q35.2.

f) DNA junction CS_DJ1 with <15 read pairs was discovered by searching in the TEN16 sequence library for the known junction sequence [54].

g) DNA junction 4793_DJ3 with <15 read pairs was discovered by searching in the TEN16 sequence library for additional DNA junctions in the respective barcode.

h) Chromosome mapping of 4793_DJ1 and DJ2 was impossible because the flanking cellular sequences are composed mainly of the simple repeat sequence GGAAT.

Table 2

Comparison of HPV16-cellular DNA junctions and RNA junctions.

TA-group | Tumor | DNA junction (TEN16) ID | DNA junction HPV16/cellular | RNA junction (APOT) & ID | RNA junction HPV16/cellular | Chr. | Distance DJ/RJ § (bp) | Distance Do/Ac $ (bp)
1 (1) (n = 12) | T18 | 0018_DJ1 | 3048/169306956 | 0018_RJ | 0880/169309182 | 3 | 2226 | 4394
 | T182e | 0182_DJ1 | 2725/58836257 | 0182_RJ | 0880/58837941 | 15 | 1684 | 3529 a)
 | T1875 | 1875_DJ1 | 3010/201823720 | 1875_RJ | 0880/201823767 | 2 | 47 | 2177
 | T2882 | 2882_DJ1 | 3596/146512049 | 2882_RJ | 3596/146512049 | 2 | 0 | n.a.
 | T2967 | 2967_DJ1 | 2132/19610972 | 2967_RJ | 0880/19611943 | 19 | 971 | 2223 a)
 | T3576 | 3576_DJ1 | 2350/44184762 | 3576_RJ | 0880/44185187 | 6 | 425 | 1895
 | T3966 | 3966_DJ1 | 2990/74540049 | 3966_RJ | 0880/74542601 | 4 | 2552 | 4662
 | T3987 | 3987_DJ1 | 3489/212251450 | 3987_RJ | 3489/212251450 | 2 | 0 | n.a.
 | T4046 | 4046_DJ1 | 2146/73999476 | 4046_RJ | 0880/73987773 | 13 | 11703 | 12969
 | T4426 | 4426_DJ1 # | 2080/190056682 | 4426_RJ | 0880/190044409 | 3 | 12273 | 13473
 | T4601 | 4601_DJ1 | 0999/13858043 | 4601_RJ | 0880/13860211 | 9 | 2168 | 2287
 | T5234 | 5234_DJ1 | 2574/49319411 | 5234_RJ | 0880/49318280 | 3 | 1131 | 2825 a)
2 (2) (n = 8) | T892 | 0892_DJ1 | 2970/27869352 | no corresponding RJ |  | 2 | n.a. | n.a.
 |  | 0892_DJ2 | 2886/27869153 | 0892_RJ | 0880/27865759 | 2 | 3394 | 5400
 | T1509 | 1509_DJ1 | 2686/113482518 | no corresponding RJ |  | 8 | n.a. | n.a.
 |  | 1509_DJ2 | 1228/33980798 | 1509_RJ | 0880/33981987 | 9 | 1189 | 1537
 | T2317 | 2317_DJ1 | 3667/230045578 | no corresponding RJ |  | 2 | n.a. | n.a.
 |  | 2317_DJ2 | 1028/90729527 | 2317_RJ | 1028/90729527 | 15 | 0 | n.a.
 | T2319 | 2319_DJ1 # | 4338/10707431 | no corresponding RJ |  | 1 | n.a. | n.a.
 |  | 2319_DJ2 # | 3115/10699282 | 2319_RJ | 0880/10697465 | 1 | 1817 | 4052
 |  | 2319_DJ3 # | 0910/10697545 | 2319_RJ | 0880/10697465 | 1 | 80 | 110
 | T3256 | 3256_DJ1 | 1642/182088476 | no corresponding RJ |  | 3 | n.a. | n.a.
 |  | 3256_DJ2 | 2344/100058840 | no corresponding RJ |  | 10 | n.a. | n.a.
 |  | 3256_DJ3 | 2612/100058865 | 3256_RJ | 0880/100059016 | 10 | 151 | 1883
 | T3427 | 3427_DJ1 | 1794/12233422 | no corresponding RJ |  | 12 | n.a. | n.a.
 |  | 3427_DJ2 | 2461/19657318 | 3427_RJ | 0880/19658664 | 17 | 1346 | 2927
 | T4024 | 4024_DJ1 | 3018/17517026 | no corresponding RJ |  | X | n.a. | n.a.
 |  | 4024_DJ2 # | 1841/100450200 | 4024_RJ | 0880/100437501 | 4 | 12699 | 13660
 | T5189 | 5189_DJ1 | 3286/25574739 | no corresponding RJ |  | 5 | n.a. | n.a.
 |  | 5189_DJ2 | 2627/111272354 | no corresponding RJ |  | 8 | n.a. | n.a.
 |  | 5189_DJ3 | 2353/128396523 | 5189_RJ | 0880/128404749 | 8 | 8226 | 9699
3 (3) (n = 2) | T186e * | 0186_DJ1 | 2590/26947928 | no corresponding RJ |  | 9 | n.a. | n.a.
 |  | 0186_DJ2 | 3897/30498221 | no corresponding RJ |  | 19 | n.a. | n.a.
 |  | no corresponding DJ |  | 0186_RJ | 0880/85394794 | 9 | n.a. | n.a.
 | T5066 | 5066_DJ1 | 2470/11159613 | no corresponding RJ |  | 20 | n.a. | n.a.
 |  | 5066_DJ2 | 1813/61969363 | no corresponding RJ |  | 7 | n.a. | n.a.
 |  | no corresponding DJ |  | 5066_RJ | 0880/510867612 |  | n.a. | n.a.

TA = TEN16/APOT comparison; DJ = DNA junction; RJ = RNA junction; Chr. = chromosome; n.a. = not applicable; Do = splice donor; Ac = splice acceptor.

TA-group 1: samples with one DJ and a corresponding RJ.

TA-group 2: samples with one corresponding DJ/RJ pair and additional DJs without RJ counterpart.

TA-group 3: samples without corresponding DJ/RJ.

& The 22 RNA junctions (APOT) are part of a previous study [50] in which information on chromosomal locations, cellular genes and splicing is given, but without the exact position numbers of the cellular breakpoints.

§ Genomic distance between the cellular breakpoints of DJ and RJ.

$ Distance from the HPV16 splice donor (position 880) to the cellular splice acceptor (see Figure 1).

# Discovered by searching in the TEN16 sequence library for DJs located within 1 Mb upstream of the respective RJs.

* For sample T186e, the cellular sequences of DJ1 and RJ were both mapped to chromosome 9, but in opposite orientation to each other.

a) In the fusion transcript, the viral E6/E7 exon is spliced to the next downstream exon of the cellular gene (see Figure 3).

Figure 3

Examples of intragenic HPV16 DNA integration sites.

(A) Tumor T182e has one HPV16 integration site 0182_DJ1, which is transcriptionally active (TA-group 1). The integrated HPV16 DNA is located in the intron (i) between exons (e) 5 and 6 of the cellular gene LIPC (transcript 003, Ensembl ID: ENST00000414170, plus strand, 10 exons), and has the same orientation as LIPC. APOT analysis identified an HPV16-cellular fusion transcript (0182_RJ) in which the viral exon is spliced to the downstream LIPC exon 6. (B) Tumor T892 (TA-group 2) has two HPV16 integration sites (0892_DJ1 and 0892_DJ2), which are both located in the intron between exons 12 and 13 of the cellular gene GPN1 (transcript 001, Ensembl ID: ENST00000264718, plus strand, 14 exons). While 0892_DJ1 has the same orientation as GPN1, the transcriptionally active 0892_DJ2 has the opposite orientation. In the transcript 0892_RJ the viral exon is spliced to an alternative cellular exon. (C) In tumor T2319 (TA-group 2), all three identified HPV16 integration sites are located within the cellular gene CASZ1 (transcript 003, Ensembl ID: ENST00000377022, 21 exons). Since the CASZ1 gene is located on the minus strand, the sense orientation of the gene is from right to left. Junction 2319_DJ1 is located in an intron in opposite direction to CASZ1. Junctions 2319_DJ2 and 2319_DJ3 are located in the terminal exon 21 in the same direction as the CASZ1 gene, DJ2 in the terminal part of the translated region and DJ3 in the 3′ untranslated region (3′-UTR). Both are possible templates for the HPV16-cellular fusion transcript 2319_RJ. – In the DJs and RJs, the open boxes denote the HPV16 part and the black boxes/arrows the fused cellular part. The arrow of the DJs indicates the sense orientation of the HPV16 oncogenes. Transcribed DJs and the RJs are shown in red, non-transcribed DJs in blue letters.

DNA junction 0892_DJ2 contains repetitive cellular sequence (AluSx), but could be assigned to chromosome 2 by taking into account the other DNA junction (0892_DJ1) and the RNA junction (0892_RJ; Table 2 and Figure 3). DNA junctions 4426_DJ1 and 4024_DJ2 with <15 read pairs each were discovered by searching in the TEN16 sequence library for DNA junctions located upstream of the identified RNA junctions (Table 2). DNA junction 3966_DJ1 was identified in the TEN16 sequence library after the second-round sequence-sorting. DNA junction 2231_DJ1 was first mapped to two chromosome regions 4p16.3 and 4q35.2 with long identical sequences downstream of the junction. The mapping was then refined by long-range PCR to be on 4q35.2.

Comparison between HPV16-cellular DNA and RNA Junctions

APOT assay was performed for 46 of the 47 carcinomas analyzed by TEN16, the one exception being T1907U for which no RNA was available. The APOT RT-PCR technique can discriminate the episome-derived HPV oncogene transcript with a constant size of about 1 kb from integrate-derived HPV-cellular fusion transcripts with variable sizes (off-size transcripts) [21], as depicted in Figure 1B and 1C. From the 46 carcinomas, 22 HPV16-cellular fusion transcripts were determined in 22 samples giving one RNA junction per sample. Their junction features are summarized in Table 2 (for additional information see [50]). Comparison with the RNA junction sequences was performed to see which DNA junctions represent transcriptionally active HPV16 integrates. TEN16 DNA junctions corresponding to APOT RNA junctions were observed first in 16 samples. 
For the other six RNA junctions, the TEN16 sequence library was inspected again by searching for DNA junction sequences located within 1 Mb upstream of the RNA junction. Five DNA junctions from three samples could be identified (2319_DJs 1–3, 4024_DJ2 and 4426_DJ1), and all but 2319_DJ2 fell below the cutoff of 15 read pairs. For sample T3966, the DNA junction corresponding to the RNA junction was detected only after a second sequence-sorting under less stringent conditions (one mismatch per eight nucleotides allowed in the Nextera adapter). No corresponding DNA junctions, however, could be detected for the RNA junctions of tumors T186e and T5066. Based on these results, the tumor samples with DNA/RNA junctions were classified into three TEN16/APOT (TA-) groups (see Table 2). TA-group 1 includes the samples (n = 12) specified by one DNA junction and a corresponding RNA junction (as an example for this group, the genomic integration site and fusion transcript structure of tumor T182e are shown in Figure 3A). Samples in TA-group 2 (n = 8) are characterized by one DJ/RJ pair and additional DNA junctions (tumor T892 is shown as an example in Figure 3B). Sample T2319 is a unique case in which two possible DNA counterparts (DJ2 and DJ3) for the same RNA junction were identified (Figure 3C). The two samples with unmatched DNA/RNA junctions constitute TA-group 3. Comparison between cellular DNA breakpoints and RNA exon boundaries revealed that in most cases (n = 17) the RNA junctions were created by splicing events, in which the viral donor (pos. 880) is fused to a cellular acceptor (depicted in Figure 1C, upper part). In three samples the DNA/RNA junctions are colinear (T2882, T3987 and T2317; see Table 2). T2882 and T3987 are cases in which the HPV16 breakpoints are located downstream of the splice acceptor at position 3358, and hence the viral-cellular junctions are maintained in the fusion transcripts (see Figure 1C, lower part). 
In 2317_DJ2 the HPV16 breakpoint is located at pos. 1028 indicating that the splice donor 880 was not used.
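
The pairing of DNA and RNA junctions described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Junction` type and the `match_dj` function are hypothetical names, the search is simplified to "same chromosome, within 1 Mb" without checking the upstream direction, and when several candidates exist the closest one is returned. The coordinates are taken from Table 2 (T18 and T186e).

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Junction(NamedTuple):
    name: str
    chrom: str
    cell_pos: int  # cellular breakpoint (genome coordinate)
    hpv_pos: int   # HPV16 breakpoint (viral genome position)


def match_dj(rj: Junction, djs: Sequence[Junction],
             window: int = 1_000_000) -> Optional[Tuple[Junction, int]]:
    """Return the closest DJ on the RJ's chromosome within `window` bp,
    together with the DJ/RJ distance, or None if there is no candidate.
    Simplification: strand and upstream/downstream direction are ignored."""
    hits = [(abs(rj.cell_pos - dj.cell_pos), dj)
            for dj in djs if dj.chrom == rj.chrom]
    hits = [h for h in hits if h[0] <= window]
    if not hits:
        return None
    dist, dj = min(hits, key=lambda h: h[0])
    return dj, dist


# T18 (TA-group 1): one DJ with a corresponding RJ.
t18_dj = Junction("0018_DJ1", "3", 169306956, 3048)
t18_rj = Junction("0018_RJ", "3", 169309182, 880)
dj, dj_rj_distance = match_dj(t18_rj, [t18_dj])

# For this row, the tabulated Do/Ac distance equals the DJ/RJ distance plus
# the HPV16 stretch between the splice donor (880) and the viral breakpoint.
do_ac_distance = dj_rj_distance + (dj.hpv_pos - 880)

# T186e (TA-group 3): DJ1 and RJ are both on chromosome 9 but far apart.
t186e_djs = [Junction("0186_DJ1", "9", 26947928, 2590),
             Junction("0186_DJ2", "19", 30498221, 3897)]
t186e_rj = Junction("0186_RJ", "9", 85394794, 880)
no_match = match_dj(t186e_rj, t186e_djs)

print(dj.name, dj_rj_distance, do_ac_distance, no_match)
```

For T18 the sketch reproduces the two distance columns of Table 2 (2226 bp and 4394 bp); for T186e no DJ lies within the 1-Mb window, in line with its assignment to TA-group 3.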

In 24 of the 46 carcinomas, the APOT analysis showed only the constant-size 1-kb HPV16 transcript assumed to be episome-derived (see Figure 1B). 
Nevertheless, 31 authentic DNA junctions of integrated HPV16 could be determined by TEN16 in 14 of the 24 samples. These samples were categorized as TA-group 4. This group includes six samples with one HPV16 integration site and eight samples with multiple integration sites ranging from two to six (Table 3). In ten other samples with constant-size HPV16 mRNA, no viral integration site could be identified by TEN16 (TA-group 5). Most likely, these cervical carcinomas contain truly episomal HPV16 DNA.
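
The five TA-groups can be summarized as a small decision rule. The following is a sketch under stated assumptions rather than the authors' code: the function name `ta_group` and its three inputs are hypothetical, and each sample is reduced to the number of validated DNA junctions, whether APOT detected a fusion transcript, and whether one of the DNA junctions corresponds to that transcript.

```python
def ta_group(n_dj: int, has_fusion_transcript: bool, dj_matches_rj: bool) -> int:
    """Assign a TEN16/APOT (TA) group from three per-sample observations.

    n_dj                  -- number of validated DNA junctions (TEN16)
    has_fusion_transcript -- APOT detected a viral-cellular RNA junction
    dj_matches_rj         -- one DNA junction corresponds to the RNA junction
    """
    if has_fusion_transcript:
        if not dj_matches_rj:
            return 3               # RJ without corresponding DJ
        return 1 if n_dj == 1 else 2   # single DJ/RJ pair vs. pair plus extra DJs
    # Only the constant-size, presumably episome-derived transcript was seen.
    return 4 if n_dj > 0 else 5


# Examples taken from Tables 2 and 3; the last one is a generic episomal case.
examples = {
    "T182e": ta_group(1, True, True),     # group 1
    "T892": ta_group(2, True, True),      # group 2
    "T186e": ta_group(2, True, False),    # group 3
    "T2548": ta_group(6, False, False),   # group 4
    "episomal": ta_group(0, False, False) # group 5
}
print(examples)
```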
Table 3

Tumors with integrated HPV16 DNA, but without detected APOT fusion transcripts (TA-group 4).

| Number of HPV16 integration sites (DJs) per tumor | Number of tumors (n = 14) | Tumor ID |
|---|---|---|
| 1 | 6 | T940, T1686, T2209, T2231, T2349, T2592 |
| 2 | 4 | T2085, T2707, T3719, T4977 |
| 3 | 2 | T4749, T4793 |
| 5 | 1 | T841 |
| 6 | 1 | T2548 |

The HPV16 3′-breakpoints in the viral-cellular DNA junctions were analyzed for their distribution within different segments of the early region. The segment named E1-PAE, extending from the E1 start at pos. 865 to the poly(A) signal PAE at pos. 4215 (3351 bp), was taken as the reference region. Seventy-four 3′-breakpoints are located within the E1-PAE region; the single outlier lies in the L2 gene. The relative frequency of HPV16 3′-breakpoints within each segment was compared to the relative length of the segment. Statistical significance was calculated using the exact two-tailed one-sample binomial test. Comparison of the individual genes E1, E2 and E5 to the complete E1-PAE region revealed a frequency distribution of the 3′-breakpoints that is approximately proportional to the relative gene length (p>0.05 for difference). The E1-PAE region was then split into the two segments E1-Ac (2493 bp) and Ac-PAE (858 bp) located upstream and downstream, respectively, of the splice acceptor at position 3358. Here, a statistically significant (p<0.05) preferential distribution of the 3′-breakpoints in the E1-Ac segment became apparent, both for all 74 DNA junctions and for the subset of 21 transcribed DNA junctions. The data are shown in Figure 4 and Table S4.
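
The logic of the test can be sketched as follows. This is an illustrative implementation of an exact two-tailed one-sample binomial test (summing the probabilities of all outcomes no more likely than the observed one), which is one common definition and an assumption here, since the text does not state which two-tailed convention was used. The segment lengths come from the text; the breakpoint count `hypothetical_hits` is invented for illustration, because the observed counts are given only in Table S4.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_test_two_sided(k: int, n: int, p: float) -> float:
    """Exact two-tailed p-value: total probability of all outcomes that are
    no more likely than the observed count k under the null proportion p."""
    observed = binom_pmf(k, n, p)
    tol = 1e-9  # relative tolerance so that ties are counted despite rounding
    total = sum(q for q in (binom_pmf(i, n, p) for i in range(n + 1))
                if q <= observed * (1 + tol))
    return min(1.0, total)


E1_PAE_BP = 3351  # reference region, pos. 865-4215
E1_AC_BP = 2493   # segment upstream of the splice acceptor at pos. 3358

# Null hypothesis: breakpoints fall into a segment in proportion to its length.
expected = E1_AC_BP / E1_PAE_BP  # about 0.744

hypothetical_hits = 66  # NOT from the paper; see Table S4 for the real counts
p_value = binom_test_two_sided(hypothetical_hits, 74, expected)
print(f"expected proportion = {expected:.3f}, p = {p_value:.4g}")
```

Under the null hypothesis roughly three quarters of the breakpoints are expected in E1-Ac simply because that segment is three times longer than Ac-PAE; the test asks whether the observed share exceeds this length-based expectation by more than chance would allow.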
Figure 4

Frequency distribution of HPV16 3′-breakpoints in different segments of the HPV16 early region.

The distribution of HPV16 3′-breakpoints of viral-cellular DNA junctions (n = 74) was analyzed within the five different segments E1, E2, E5, E1-Ac and Ac-PAE of the HPV16 early region. The positions of each segment in the HPV16 genome are given in parentheses. The E1-PAE segment of the HPV16 early region (pos. 865–4215, 3351 bp) was taken as reference. The relative length of each segment is shown by the white bars. The relative frequency of HPV16 3′-breakpoints within each segment is shown by the grey bars for all DNA junctions (DJ_all, n = 74, middle-grey bar), the transcribed DNA junctions (DJ_tr., n = 21, light-grey bar) and the non-transcribed DNA junctions (DJ_n.tr., n = 53, dark-grey bar). The exact two-tailed one-sample binomial test was used for statistical analysis by comparing the relative frequency of HPV16 3′-breakpoints in each segment to the relative segment length. Bars marked with asterisks indicate statistically significant results (P<0.05). Data are given in Table S4.

Genomic Context of Integrated HPV16 DNA

For 73 of the 75 authentic DNA junctions, unequivocal mapping of the cellular sequence part to a specific chromosomal locus was possible (see Table 1). HPV16 DNA integration into a unique cellular sequence was observed in 60 cases. For the other 13 junctions, despite their repetitive nature the cellular parts could be assigned to specific chromosome regions based on sequence similarity. One exception was the repetitive cellular part of 0892_DJ2, for which the precise chromosome localization could only be unraveled by taking into account the sequence information of 0892_DJ1 and 0892_RJ (see Figure 3B). In this case, the localization was assured by additional junction-PCR and Sanger sequencing. The cellular sequence of 2231_DJ1 was first mapped to two potential locations 4p16.3 and 4q35.2, which are 99% identical to each other in a length of 6.7 kb immediately downstream of the junction. Long-range PCR with specific primers for each of the two regions identified 4q35.2 as the true integration site. For 4793_DJ1 and DJ2, mapping to a specific chromosome region was impossible because the cellular parts are composed mainly of GGAAT simple repeats. For 36 DNA junctions in 22 samples, cellular genes were directly targeted by HPV16 DNA integration (see Table 1, column t/d). Three examples of intragenic HPV16 integration sites are shown in Figure 3. The targeted genes are in the same orientation as the integrated HPV16 E6/E7 in 16 cases, and in the opposite orientation in 20 cases (Table S5). The cellular DNA breakpoints are located in introns in 34 cases (see T182e and T892 in Figure 3A and 3B). The two exceptions with exon location are 2319_DJ2 and DJ3 (Figure 3C). The same orientation can lead to fusion mRNA in which the HPV16 E6/E7 exon is spliced to a genuine exon of the cellular gene. 
This was indeed the case for the three samples with known RNA junction sequences (T182e, T2967 and T5234), in which the HPV16 exon was spliced to the next downstream cellular exon (Figure 3A). For tumor T2319 two novel DNA junctions were determined by TEN16, in addition to a DNA junction and the one RNA junction reported earlier [44]. The novel junctions DJ2 and DJ3 as well as the RNA junction are all located in sense orientation in the long 3′-terminal exon of the CASZ1 gene, whereas the already known DNA junction DJ1 is located in antisense orientation in an intron (Figure 3C). Junction 2319_DJ2 has the peculiar structure that an intervening sequence of 65 bp mapping to chromosome 15 is located between HPV16 and the CASZ1 sequence on chromosome 1. The tumor T892 is another noteworthy example. The two DNA junctions are both located in an intron of gene GPN1, but in opposite orientation to each other. The RNA junction (0892_RJ) corresponds to the DNA junction (0892_DJ2) with antisense orientation to GPN1 (Figure 3B). For the remaining 37 DNA junctions (of 26 samples) without directly targeted cellular gene, the genomic regions covering 500 kb downstream of the HPV16 integration sites were inspected. Examples of intergenic HPV16 integration sites are compiled in Figure 5. Cellular genes in the same orientation as HPV16 E6/E7 were found in 16 cases and in the opposite orientation in 13 cases (Table S6). In eight cases no downstream gene was detected in the 500-kb region (Table 1). In five of the 16 cases with identical orientation, RNA junction sequences have been determined. In all the five cases, splicing occurred to cellular sequences located before the downstream gene (T3966 is shown as an example in Figure 5A). Splicing to an exon of the downstream cellular gene was not observed.
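
The t/d annotation used in Table 1 (gene directly targeted, or first gene within 500 kb downstream of the junction) can be expressed as a short routine. This is a hedged sketch, not the authors' procedure: `Gene`, `annotate_junction` and all coordinates below are hypothetical, all genes are assumed to lie on the junction's chromosome, and "downstream" is taken along the reading direction of the integrated HPV16 early region.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int
    strand: int  # +1 = plus strand, -1 = minus strand


def annotate_junction(pos: int, hpv_dir: int, genes: Sequence[Gene],
                      window: int = 500_000) -> Optional[Tuple[str, str, str]]:
    """Return (kind, gene name, orientation) for a junction at `pos`.

    kind is "t" if a gene is directly targeted, "d" for the first gene within
    `window` bp downstream; None if neither exists. `hpv_dir` is +1 if the
    HPV16 early region reads toward increasing coordinates, -1 otherwise.
    """
    def orientation(g: Gene) -> str:
        return "same" if g.strand == hpv_dir else "opposite"

    for g in genes:
        if g.start <= pos <= g.end:
            return ("t", g.name, orientation(g))

    best = None
    for g in genes:
        # Gap between the junction and the nearest gene boundary downstream.
        gap = g.start - pos if hpv_dir == 1 else pos - g.end
        if 0 < gap <= window and (best is None or gap < best[0]):
            best = (gap, g)
    if best is None:
        return None
    return ("d", best[1].name, orientation(best[1]))


# Hypothetical gene models on a single chromosome.
genes = [Gene("GENE_A", 100_000, 150_000, +1),
         Gene("GENE_B", 400_000, 450_000, -1),
         Gene("GENE_C", 2_000_000, 2_050_000, +1)]

print(annotate_junction(120_000, -1, genes))   # intragenic, opposite orientation
print(annotate_junction(200_000, +1, genes))   # GENE_B is 200 kb downstream
print(annotate_junction(1_000_000, +1, genes)) # no gene within 500 kb
```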
Figure 5

Examples of intergenic HPV16 DNA integration sites.

(A) Tumor T3966 has one transcriptionally active HPV16 integration site 3966_DJ1 (TA-group 1), which is located between the cellular genes RASSF6 and IL8, and has the same orientation as the downstream gene IL8. In the fusion transcript 3966_RJ, the viral exon is spliced to an intergenic alternative cellular exon. (B) Tumor T3256 (TA-group 2) has three HPV16 integration sites, one (3256_DJ1) on chromosome 3 (not shown) and two (DJ2 and DJ3) on chromosome 10q24.2 between the cellular genes LOXL4 and PYROXD2. Only 3256_DJ3 is transcribed. The fusion transcript 3256_RJ is in opposite orientation to the downstream gene PYROXD2. (C) In the chromosome region 13q22.1-2, five HPV16 integration sites identified in four independent DNA samples are located in the large intergenic region between the cellular genes KLF5 and KLF12. Only one integrated HPV16 DNA (4046_DJ1) is transcribed.

Multiple HPV16 integration sites were determined in 20 samples (see Table S3). Concerning the samples with two integration sites (n = 12), the two DNA junctions are located in the same chromosome regions in three cases (T892 shown in Figure 3B, T2707 and MRI-H186) and on different chromosomes in nine cases. In the five samples with three integration sites, only one (T2319) has all the three cellular breakpoints close to each other (Figure 3C). From the four DNA junctions identified in CaSki, two are located on chromosome Xq27.3 at a distance of only 11.5 kb. In sample T841, the five integration sites are distributed over three chromosomes. And in sample T2548, the six integration sites are pairwise located on three different chromosomes (Figure 6). The most recurrent integration locus was a region of about 600 kb on chromosome 13q22.1-2 that harbors five HPV16 integration sites identified in four samples (SiHa, T841, T2209, T4046; see Figure 5C).
Figure 6

Tumor T2548 with six HPV16 DNA integration sites.

The six integration sites are distributed on the chromosomes 3, 19 and 20 (two DJs on each). All six integration sites are intragenic, but none of them is transcriptionally active. The cellular genes directly targeted by HPV16 DNA integration in T2548 include FHIT (transcript 002, Ensembl ID: ENST00000468189, minus strand, 9 exons), ARID3A (transcript 001, Ensembl ID: ENST00000263620, plus strand, 9 exons), MOB3A (transcript 001, Ensembl ID: ENST00000357066, minus strand, 5 exons), MACROD2 (transcript 010, Ensembl ID: ENST00000217246, plus strand, 17 exons) and CBFA2T2 (transcript 003, Ensembl ID: ENST00000375279, plus strand, 12 exons).

HPV16 Variant Identification by E6 Sequencing

The HPV16 integration site analysis was combined with determination of the HPV16 variants present in the carcinoma samples. HPV16 variants can be classified based on nucleotide sequences of different genes [8], [55]–[57]. In this study the E6 gene was analyzed. An E6 segment of altogether 482 bp was amplified by two PCR reactions (pos. 111–383, pos. 111–592) and tagged with sample-specific 5-nt barcodes. All E6-PCR products (47×2 = 94) were pooled and added to the TEN16 multiplex-PCR products for HiSeq2000 sequencing. After sorting by barcodes, the E6 sequences of all samples were compared to the published E6 sequences of different variants [8], [56], [57]. Among the 47 carcinomas, 23 harbor the European prototype (E-p; T350) and 22 the European T350G variant (E-T350G). The variants North-American 1 (NA1) and Asian-American (AA) were identified in one sample each. The results are summarized in Table 4. Within the European lineage, the E-T350G variant (E-L83V on the amino acid level) has been reported to be more prevalent than the E-p prototype (E-L83) in invasive cervical carcinoma and in cervical disease progression, depending on the analyzed population [58], [59]. In the present study, the distribution of tumors containing either integrated or episomal HPV16 DNA was compared between E-p and E-T350G. In the E-p group (n = 23), 20 cases contained integrated HPV16 DNA and 3 cases episomal HPV16 DNA (i.e. no viral-cellular DNA junctions were identified by TEN16 analysis; see Table 4). In the E-T350G group (n = 22), 16 cases contained integrated HPV16 DNA and 6 cases episomal HPV16 DNA. This difference, however, was not statistically significant (two-tailed P value = 0.28; Fisher’s exact test). The results indicate that the E-p and E-T350G variants of HPV16 are similar in their propensity to integrate.
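
The comparison of the two variant groups can be retraced with a short, self-contained sketch of a two-tailed Fisher's exact test. The function name `fisher_exact_two_sided` is hypothetical, and the two-tailed rule used here (summing all tables no more probable than the observed one) is an assumption about the convention; the counts are those given in the text.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of every table that is
    no more probable than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    tol = 1e-9  # relative tolerance for ties
    return sum(q for q in (prob(x) for x in range(lo, hi + 1))
               if q <= p_obs * (1 + tol))


# Rows: E-p (n = 23), E-T350G (n = 22); columns: integrated, episomal.
p = fisher_exact_two_sided(20, 3, 16, 6)
print(f"two-tailed P = {p:.2f}")  # about 0.28, matching the reported value
```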
Table 4

HPV16 E6 variants.

| Lineage/sublineage | E6 nucleotide positions: 131, 132, 137, 143, 145, 173, 178, 256, 286, 289, 335, 345, 350, 403, 532, 534 | Tumors* |
|---|---|---|
| HPV16 REF | A G T C G C T C T A C A T A A G | |
| E-p | | T18, T186e, T892, T1686, T2317, T2548, T2967, T3042, T3315, T3427, T3576, T3966, T4024, T4046, T4426, T4601, T4793, T4977, T5189, T5446 |
| | G | T2231 |
| | T | T5066 |
| | C | T2592 |
| E-T350G | G | T182e, T707, T739, T940, T966, T1509, T1875, T1907U, T2085, T2209, T2319, T2620, T2882, T3256, T3719, T3799, T4749, T4755, T5234 |
| | G G | T2707 |
| | T G | T2349 |
| | A/G G | T841 |
| NA1 | T A G T G | T1520 |
| AA | T A G T G G | T3987 |
| As | G | none |
| Af1 | A/G G/C G T A G T G | none |
| Af2 | T G T A G T G | none |

E-p = European prototype; E-T350G = European T>G at position 350; NA1 = North-American type 1; AA = Asian-American; As = Asian; Af1 = African type 1; Af2 = African type 2.

Underlined are those tumors which seem to contain episomal HPV16 DNA, because no DNA junctions of integrated HPV16 DNA could be identified by the TEN16 analysis.

Discussion

In this study we present the development of a novel multiplex strategy, TEN16, for analysis of HPV16 integration sites. The TEN16 procedure is a special form of targeted sequencing and exploits the high capacity of next-generation DNA sequencing. With this strategy, it was possible to identify 75 HPV16 integration sites (3′ DNA junctions) in a pooled analysis of DNA samples from 47 cervical carcinomas and 4 cell lines. High-quality nucleotide sequences of the DNA junctions were obtained. Sample pooling was conducted twice during the whole TEN16 procedure. The first optional pooling reduced the number of Nextera transposition reactions from 50 to 10, and the second obligatory pooling combined all multiplex HPV16 PCR products into one mixture for Illumina HiSeq2000 paired-end sequencing. After data processing, a cutoff value of 15 read pairs per junction was introduced to pre-select the most promising junction candidates out of about 200,000 viral-cellular junction sequences for validation by junction-PCR. This measure proved to be very useful because it covered most validated DNA junctions (see Figure S1). For seven junctions with fewer than 15 read pairs each, identification was possible due to additional information.

The sensitivity of TEN16 can be estimated from the results obtained for the cervical carcinoma cell lines SiHa and CaSki. SiHa cells harbor a diploid set of one integrated HPV16 DNA copy, a situation similar to single-copy cellular genes [26], [27], [60]. SiHa DNA was mixed with four other carcinomas in the first pooling and thus made up 1.8% of the final DNA mixture. The known 3′ junction (SH_DJ1) was present with 16 read pairs in the junction sequence library. CaSki cells contain an extremely high copy number (∼600) of integrated HPV16 full-length DNA arranged in head-to-tail concatemers and flanked by truncated terminal copies [26], [53]. 
This situation is similar to the presence of a small number of integrated viral genomes coexisting with a large number of episomes. Clusters of integrated HPV16 DNA with different copy numbers have been detected by in situ hybridization at 11 to 16 chromosomal sites in CaSki [27], [61], but there is only one transcriptionally active HPV16 integrate present at single-to-low copy number [27]. CaSki DNA was not subjected to the first pooling and thus made up 9.1% (1/11th) of the final DNA mixture. Nonetheless, the known CaSki DNA junction, which is identical to the RNA junction [54], was present in the TEN16 library with only three read pairs (CS_DJ1) and would have escaped detection without the previous sequence information. Three novel CaSki DNA junctions were identified in our study. If CaSki DNA had been subjected to the first pooling, two of them (with read pair numbers of 28 and 42, respectively) might also have escaped detection. The three novel DNA junctions probably originate from non-transcribed concatemeric HPV16 integrates.

The overall outcome of the TEN16 study demonstrates that the concomitant analysis of HPV16 integration sites in a mixture of about 50 tumor samples is feasible. The high number of 75 validated HPV16 integration sites demonstrates the effective performance of this strategy. Nevertheless, for a sensitive detection of low-copy HPV16 integration sites, in particular against a high background of full-length HPV16 DNA, it seems appropriate for future TEN16 experiments to reduce the total number of DNA samples and, regarding the first pooling, either to waive it or to reduce the number of pooled samples.

Concerning the cervical carcinomas, 67 DNA junctions could be assigned to 37 of 47 samples (79%), while for ten samples no junction could be identified (21%). These percentages are in good agreement with the estimated ∼80% of HPV16-positive cervical carcinomas harboring integrated viral DNA [23]. 
Furthermore, 18 of the 37 carcinomas (49%) were found to contain more than one DNA junction per sample. This is a high percentage not seen in previous studies, which reported multiple HPV16 integration sites in 15–20% of analyzed carcinomas [37], [45], [62]. These results might indicate the superior performance of TEN16 compared to the RS-PCR and DIPS-PCR methods for HPV16 integration site analysis.

Comparison of the DNA junctions with the 22 RNA junctions determined by APOT analysis allowed identification of the transcriptionally active HPV16 integration sites in 20 carcinomas (see Table 2). Two RNA junctions remained without corresponding DNA junctions for unknown reasons. Twelve of the 20 carcinomas (60%) were found to contain a single transcriptionally active HPV16 integrate. The other 8 tumors (40%) are characterized by a transcriptionally active HPV16 integrate together with one or two probably silent HPV16 integrates. Strikingly, 31 DNA junctions could be identified in a subset of 14 tumor samples, for which the APOT analysis did not reveal any viral-cellular fusion transcript, but only the purely viral oncogene mRNA (depicted in Figure 1B; see Table 3). These carcinomas would have been classified as containing episomal HPV16 DNA if the APOT data alone were taken into consideration. The validated DNA junctions contradict this interpretation. Our findings differ from an earlier study, in which the comparison of DIPS and APOT data led to the conclusion that most HPV-cellular junctions are actively transcribed [45]. Further transcriptional analysis in the 14 tumor samples based on the TEN16 data will clarify whether the 31 HPV16 integrates are indeed transcriptionally silent or transcribed into fusion transcripts not detected by APOT. 
The HPV16 3′-breakpoints in the viral-cellular DNA junctions showed a statistically significant (p<0.05) preferential distribution within the early region segment that is located upstream of the splice acceptor at position 3358 and covers the complete E1 gene and the 5′-terminal half of the E2 gene (Figure 1, Figure 4 and Table S4). This uneven distribution supports the notion that one important aspect of integration-induced disruption of the HPV early region is to abolish production of the spliced purely viral early mRNAs and to replace them by viral-cellular fusion transcripts for deregulated E6/E7 oncogene expression [31]. It should be noted that 3′-breakpoint locations between the splice acceptor and the PAE signal also lead to viral-cellular fusion transcripts (see Figure 1C). Since the E2 gene is either decoupled or disrupted in all these cases, the results might also reflect the importance of E2 inactivation to release the integrated viral early promoter from E2-mediated repression [23]. Another possibility is that opening of the circular HPV genome for integration might occur with some predisposition within that region for the 3′-breakpoints. These different interpretations are not mutually exclusive. HPV16 integration sites discovered in this study are located on almost all chromosomes, similar to the observations from other reports [28], [30], [37], [38], [50], [63]. Clustering of HPV16 integration sites was not particularly evident in our sample collection. The most recurrent integration locus was 13q22.1-2 with five integration sites of four samples (see Figure 5C). This chromosomal region contains the common fragile site FRA13C and has been identified as hotspot for HPV integration also in previous studies [28], [37]. Another known hotspot for HPV integration is region 8q24.21 in which the proto-oncogene MYC is located [41], [42]. The cell line MRI-H186 contains two DNA junctions and the carcinoma T5189 one DNA junction in this region. 
Cellular genes are targeted by HPV16 integration in 36 of 73 cases (49%; see Table S5). These results support the previous observation that HPV integration has a preference for transcribed regions [28]. The integrated viral oncogenes and the targeted cellular genes have identical orientation in 16 cases. Fourteen of the 16 DNA junctions are located in introns. Preferential integration into introns, also observed by others [37], [45], [64], is most likely due to the much larger sizes of introns compared to exons. HPV integration into an intron will probably disturb expression of the targeted cellular gene. In a recent study, expression analysis was performed for ten tumors with cellular genes directly targeted by HPV DNA integration in the same orientation [44], including four samples (T182e, T2319, T2967 and T5234) also analyzed by TEN16 in the present study. In two of the ten cases (T2319 and T182e), the normal transcript of the affected cellular genes, CASZ1 and LIPC, respectively, could not be detected. At least for sample T2319, the complete loss of CASZ1 expression could be attributed to HPV16-induced insertional mutagenesis along with absence of the other allele not disrupted by integration [44]. Noteworthy, T2319 is the only example in our collection of carcinoma samples in which two HPV16 DNA junctions are located in an exon, here in the long 3′-terminal exon of the CASZ1 gene (see Figure 3C). Altogether these examples support the assumption that insertional mutagenesis of cellular genes by HPV integration contributes to cervical carcinogenesis. For another subset of HPV16 DNA junctions (n = 29), cellular genes could be found within 500 kb downstream of the HPV16 integration sites either in the same (n = 16) or in opposite (n = 13) orientation (see Table S6). In such cases it is not known whether and to which extent the integrated HPV16 DNA might affect expression of the flanking cellular genes. 
Since enhancers can activate genes located as far as 2–3 Mb away [65], influence of the HPV16 enhancer elements on neighboring cellular genes is possible. Many cellular genes identified at or downstream of the 75 HPV16 integration sites are cancer genes, such as MYC, ERBB2, FHIT, MECOM (EVI1) and BCAR4. The search for cancer genes, using different approaches including sequencing of whole cancer genomes, has so far identified more than 400 human genes that are mutated in different types of cancer [66] (www.sanger.ac.uk/genetics/CGP/Census/). Applying the TEN16 integration site analysis to large series of additional cervical cancer and precancer samples will lead to a comprehensive mapping of HPV16 integration sites, far beyond the currently available data. This will allow the hotspots of HPV16 integration, the affected cellular genes and the pathways in which they are involved to be identified and cataloged precisely. A database of viral integration sites related to human disease has recently been implemented [67], which will be helpful for a thorough interpretation of the data. HPV integration is a particular type of cancer mutation able to alter substantially the structure, expression and function of cellular genes. The identification of HPV integration sites in cervical carcinomas and other types of HPV-induced cancer therefore adds a special aspect to the cancer genome projects. The TEN16 procedure, here established for identification of the potentially transcribed 3′ junctions of integrated HPV16 DNA, can be easily expanded to the determination of the 5′ junctions by selecting appropriate primers in the viral L1–L2 region. Since the cellular target sites of HPV integration are often affected by deletions and rearrangements [37], the determination of both the 3′ and 5′ junctions will convey a more complete picture of the integration effects on the viral and cellular genomes.
Additional aspects of further development include the integration site analysis of other high-risk HPV types that will only need the selection of type-specific primer sets. The high capacity of NGS can be used further to address additional questions, like sequence variations in the HPV genome. As a prototypic example, the HPV16 E6 variants of the carcinoma samples have been determined concomitantly in the present study. Although recurrent HPV insertion loci at the chromosomal level exist, the precise nucleotide sequences at the viral-cellular junctions are always different. Due to this feature, the viral-cellular chimeric sequences of integrated HPV genomes are unique fingerprints for every tumor. Therefore, the HPV integration sites have the potential to be used as personalized tumor biomarkers in the diagnosis, treatment monitoring and follow-up assessment of cervical carcinomas and high-grade precursor lesions in the individual patients. Molecular biomarkers for early detection of residual and recurrent disease will probably prove beneficial to improve treatment strategies and to increase survival [68]. A recent report demonstrates that viral-cellular junction sequences are specific markers which can be amplified from the circulating tumor DNA in cervical cancer patients [69]. The TEN16 strategy presented in this paper offers a new tool for efficient determination of HPV16 integration sites with precise nucleotide sequences that will help to exploit the potential of these unique molecular markers for prognostic evaluation and treatment of cervical cancer patients.

Materials and Methods

Ethics Statement

All patients provided written informed consent to use their biopsy material for further molecular analyses to be conducted in the Jena University Hospital and in collaboration with academic partners. This study was approved by the ethics committee of the Friedrich-Schiller University Jena (reference numbers 0175-02/00 and 2174-12/07).

Clinical Samples and Cell Lines

All cervical carcinoma biopsies were taken from patients at the Department of Gynecology of Jena University Hospital, Germany, between 1995 and 2008. HPV genotype was determined by performing a multiplex real-time PCR detecting the seven most common high-risk HPV types [70]. Forty-seven HPV16-positive cervical carcinomas were identified for further analysis. The average age of these patients was 47 years within a range of 31 to 77 years. The clinical parameters were: 100% squamous cell carcinoma, 91% stage T1/T2, 60% N0 and 64% grade G1/G2. HPV16-positive human cervical carcinoma cell lines SiHa and CaSki were obtained from the American Type Culture Collection (ATCC), MRI-H186 and MRI-H196 from the DKFZ Tumorbank.

DNA Reference Sequences

For the human genome, the latest major release GRCh37 was used as reference assembly (see http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/). For HPV16, the HPV16REF sequence (7906 bp) was used which is based on the originally published sequence data [71] (GenBank accession number K02718; 7904 bp) plus revisions published by the Los Alamos National Laboratory from 1995 to 1997 in a compendium called “Human papillomaviruses: A compilation and analysis of nucleic acid and amino acid sequences”. The information is available in the Papillomavirus Episteme database [72]. HPV16REF differs from the NCBI reference sequence for HPV16 (accession number NC_001526.2; 7905 bp) in several positions. In the early region, HPV16REF has a G at position 1139 that is absent in NC_001526.2 and K02718.

Nucleic Acid Isolation

High-molecular-weight genomic DNA was isolated from cell lines and clinical samples using the phenol-chloroform extraction method. The isolated DNA was roughly quantified by comparison of staining intensities with the λ/HindIII fragments (Invitrogen) in ethidium bromide-stained agarose gel. For RNA isolation, the tumor tissues were homogenized by using 0.55 mm-diameter injection needles. Total RNA was isolated using the NucleoSpin RNA II kit (Macherey-Nagel) with DNase treatment according to the protocol for tissue samples. RNA was quantified with the NanoDrop 1000 spectrophotometer.

Amplification of Papillomavirus Oncogene Transcripts (APOT)

The APOT assay was performed basically as described [44], [50]. The gel-extracted RT-PCR products were sequenced at Seqlab (Göttingen, Germany).

Nextera DNA Fragmentation and Tagging

First, five DNA samples of 10 ng each were pooled. The simultaneous DNA fragmentation and adapter tagging was performed with the Illumina-compatible Nextera DNA Sample Prep kit (Epicentre) according to the manufacturer’s instruction. Briefly, the pooled DNA of total 50 ng was mixed with 4 µl of 5× HMW buffer, 1 µl Nextera enzyme mix and water to a volume of 20 µl. After incubation at 55°C for 5 min, the Nextera fragments were purified with DNA clean & concentrator-5 spin column (Zymo) and eluted in 10 µl water. During column purification, transposase complexes were detached from DNA targets to expose the 9-nt single-stranded gaps accessible for subsequent modification.
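The pooling step determines how strongly each sample is represented in the sequenced library. The following is a minimal sketch of that arithmetic, not part of the published protocol. It assumes, as inferred from the 51 samples and the 11 TEN16 barcodes, that 50 samples were combined into ten pools of five and that CaSki was processed alone, and it further assumes that every barcoded library contributes equally to the final mixture. The function name `sample_shares` is hypothetical.

```python
from fractions import Fraction


def sample_shares(pool_sizes):
    """Return the share of the final mixture held by one sample of each pool.

    pool_sizes: number of DNA samples combined under each barcode.
    Assumption: every barcoded library contributes equally to the final
    mixture, and every sample contributes equally within its pool.
    """
    if not pool_sizes or any(size < 1 for size in pool_sizes):
        raise ValueError("each pool needs at least one sample")
    n_pools = len(pool_sizes)
    return [Fraction(1, n_pools * size) for size in pool_sizes]


# Assumed layout: ten pools of five samples plus CaSki on its own.
layout = [5] * 10 + [1]
shares = sample_shares(layout)

print(f"samples in total: {sum(layout)}")
print(f"pooled sample: {float(shares[0]):.1%} of the final mixture")
print(f"unpooled CaSki: {float(shares[-1]):.1%} of the final mixture")
```

Under these assumptions an unpooled sample holds 1/11 of the mixture and a pooled sample 1/55, which illustrates why reducing the number of pooled samples would favor detection of low-copy junctions.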

Blocking of the DNA 3′-ends

To minimize the whole-genome amplification of Nextera-treated DNA during PCR, the free 3′-OH ends upstream of the 9-nt gaps were blocked by enzymatic incorporation of ddNTP (see Figure 2B). Reaction components including 10 µl of purified Nextera fragments, 2.5 µl of 10× Klenow buffer, 1 µl ddNTP (1.25 mM each) and 0.5 µl Klenow exo- (5 U/µl; Fermentas) were mixed with water to 25 µl total, and incubated at 37°C for 15 min. The blocking reaction was heated at 90°C for 3 min to dissociate the 19-nt pMENTS oligonucleotide from the transposon ends, followed by incubation at 75°C for 5 min to renature the 5′-tagged dsDNA and then chilled at 4°C. The single-stranded pMENTS and impurities were eliminated by spin-column purification with DNA clean & concentrator-5. The purified DNA was eluted in 10 µl water.

HPV16 DNA Enrichment by Multiplex PCR

The enrichment of HPV16-containing DNA was performed with the Multiplex PCR Plus kit (Qiagen) according to the user manual. For every Nextera-treated DNA pool, two multiplex PCR reactions were carried out in parallel. Briefly, 5 µl of purified 3′-blocked Nextera fragments were mixed with 25 µl of 2× Multiplex PCR master mix, 5 µl Q-solution, 2 µl of HPV16 primer mixture HPM-A or -B (5 µM of each primer; see Table S1), 2 µl of barcoded Nextera adapter (10 µM; see Table S2) and water to 50 µl total. Cycling conditions were: initial denaturation/activation at 95°C (5 min), 30 cycles including denaturation at 95°C (30 sec), annealing at 57°C (90 sec) and elongation at 72°C (40 sec). Equal volumes of all multiplex-PCR products were pooled and purified with illustra MicroSpin S-400 HR column (GE Healthcare).
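The reaction setup above implies a water volume and final primer concentrations that are not stated explicitly. The short sketch below derives them from the listed volumes and stock concentrations; the helper names `water_volume` and `final_concentration` are hypothetical and serve only to illustrate the arithmetic.

```python
def water_volume(total_ul, components_ul):
    """Volume of water needed to bring the listed components to total_ul."""
    water = total_ul - sum(components_ul.values())
    if water < 0:
        raise ValueError("components exceed the total reaction volume")
    return water


def final_concentration(stock_um, volume_ul, total_ul):
    """Final concentration (µM) after diluting a stock into the reaction."""
    return stock_um * volume_ul / total_ul


TOTAL_UL = 50

# Component volumes (µl) as listed for one multiplex PCR reaction.
multiplex_pcr = {
    "3'-blocked Nextera fragments": 5,
    "2x Multiplex PCR master mix": 25,
    "Q-solution": 5,
    "HPV16 primer mixture (HPM-A or HPM-B)": 2,
    "barcoded Nextera adapter": 2,
}

water = water_volume(TOTAL_UL, multiplex_pcr)
primer_um = final_concentration(5, 2, TOTAL_UL)    # 5 µM of each primer in the mixture
adapter_um = final_concentration(10, 2, TOTAL_UL)  # 10 µM adapter stock

print(f"water: {water} µl")
print(f"each HPV16 primer: {primer_um:.2f} µM")
print(f"barcoded adapter: {adapter_um:.2f} µM")
```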

HPV16 E6 PCR

The E6 gene was used to classify HPV16 variants present in the clinical samples. To differentiate among the variants, nucleotide sequences at positions 131, 132, 143, 145, 178, 286, 289, 335, 350 and 532 of HPV16 are required [8], [56]. Two E6 PCR targets were designed to cover all the required positions, using a barcoded forward primer 5′-(5-nt barcode) AGGACCCACAGGAGCGAC-3′ (pos. 111–128) in combination with two reverse primers for each sample (5′-TGTTGTATTGCTGTTCTAATGTTG-3′, pos. 383–360; 5′-ATTCATGCAATGTAGGTGTATCTC-3′, pos. 592–569). PCR was performed with the FastStart High Fidelity kit (Roche). In a 50-µl reaction volume, 50 ng template DNA was mixed with 1× buffer, 2 mM MgCl2, 0.2 mM each dNTP, 0.4 µM forward primer, 0.4 µM reverse primer and 2.5 U enzyme mix. Cycling conditions were: initial denaturation at 95°C (2 min), 35 cycles including denaturation at 94°C (30 sec), annealing at 60°C (30 sec), and elongation at 72°C (30 sec). PCR products were purified with QIAquick PCR purification kit (Qiagen). Equal amounts of all reactions were pooled and 10 ng was mixed with the TEN16 PCR products (1.4 µg) for HiSeq2000 sequencing.
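As a quick consistency check on the primer design, the sketch below computes the length of the two E6 amplicons from the primer coordinates given above and lists which variant-defining positions lie between the primers of each amplicon. The helper names are hypothetical; positions inside a primer-binding site are treated as uninformative because they are primer-derived, which is an assumption of this sketch rather than a statement from the protocol.

```python
VARIANT_POSITIONS = (131, 132, 143, 145, 178, 286, 289, 335, 350, 532)
FORWARD_PRIMER = (111, 128)          # HPV16 positions covered by the forward primer
REVERSE_PRIMERS = {                  # binding sites given as (lowest, highest) position
    "short": (360, 383),
    "long": (569, 592),
}
BARCODE_NT = 5                       # 5-nt barcode on the forward primer


def amplicon_length(forward, reverse, barcode_nt=0):
    """Amplicon length in nt, optionally including the 5' barcode."""
    return reverse[1] - forward[0] + 1 + barcode_nt


def positions_between_primers(forward, reverse, positions):
    """Variant positions located strictly between the two primer sites."""
    return [p for p in positions if forward[1] < p < reverse[0]]


for name, reverse in REVERSE_PRIMERS.items():
    viral = amplicon_length(FORWARD_PRIMER, reverse)
    tagged = amplicon_length(FORWARD_PRIMER, reverse, BARCODE_NT)
    covered = positions_between_primers(FORWARD_PRIMER, reverse, VARIANT_POSITIONS)
    print(f"{name}: {viral} nt viral sequence, {tagged} nt with barcode, covers {covered}")
```

With these coordinates, position 532 lies only within the longer amplicon, while the other nine positions fall between the primers of both amplicons.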

Next-generation Sequencing

High-throughput DNA sequencing was performed using the Illumina HiSeq2000 NGS technology. One microgram of the combined “TEN16+E6” PCR products was processed with the TruSeq DNA sample preparation kit (Illumina). The E-Gel SizeSelect system (Invitrogen) was used to collect adapter-ligated DNA fragments with insert sizes of 200–500 bp. The recovered DNA was loaded onto one lane of the HiSeq flow cell, at a concentration of 6.5 pM. A PhiX control v3 library (Illumina) was loaded onto another lane of the same run. Clonal amplification was done in cBot (Illumina) using the TruSeq paired-end v3 cluster generation chemistry (Illumina). For HiSeq2000 sequencing, the 200-cycle TruSeq-v3-SBS chemistry was used and 2×105 cycles of sequencing were carried out. Base-calling was conducted with Illumina’s RTA software version 1.10.36.

Data Processing

Sequence read pairs were sorted into 58 barcodes based on the barcoded Nextera adapter primers for TEN16 (barcodes 01–11) and the barcoded E6 forward primers (barcodes 12–58) for variant analysis, using a custom script. In the first-round analysis, no mismatch was allowed. In the second-round analysis, one mismatch per eight nucleotides was allowed in the non-barcode area of the primers. To identify potential HPV16-cellular junctions, the TEN16 read pairs were aligned to the combined “HPV16REF” and “GRCh37” human reference sequence first using BWA version 0.5.9-r16 [73] with default parameters. Mapped pairs with at least one read mapped to HPV16REF were extracted. Since BWA is unable to efficiently identify HPV16-cellular chimeric reads (see http://bio-bwa.sourceforge.net), many of them were reported as unmapped. Therefore, reads were re-mapped using BWA-SW version 0.5.9-r16 [74] with default parameters to identify chimeric reads containing both HPV16 and human sequences. Using custom scripts, four categories of mapped read pairs were identified from BWA and BWA-SW outputs: (1) pure HPV16 pairs, (2) pure cellular pairs, (3) pairs flanking HPV16-cellular junctions, and (4) pairs with at least one read containing the HPV16-cellular junction sequence (see Figure 2D). From categories 3 and 4, potential HPV16-cellular junctions were identified by visual inspection using a cutoff value of 15 read pairs. To determine HPV16 variants, E6 read pairs were aligned to HPV16REF using BWA with default parameters. Only uniquely and properly mapped read pairs were selected for further analysis. “Properly mapped” were pairs with one read mapped to the sense strand and its mate mapped to the antisense strand of E6. Nucleotide polymorphisms in the E6 gene of individual tumor samples were called using SAMtools version 0.1.16-r963:234 [75], and corrected manually by viewing the mapped reads in IGV [76]. Different HPV16 variants were assigned to the samples by unique nucleotide polymorphism patterns as reported [8], [56].
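The four read-pair categories and the read-pair cutoff described above can be illustrated with a small sketch. This is not the custom script used in the study, which is not reproduced here. It assumes that each read has already been reduced to the set of references it aligns to (the labels `"HPV16"` and `"human"` are hypothetical), that a read hitting both references is chimeric, and that a candidate junction passes when it is supported by at least 15 pairs from categories 3 and 4. In the study the final selection was made by visual inspection; the automated filter here is a stand-in.

```python
from collections import Counter
from enum import Enum


class PairCategory(Enum):
    UNCLASSIFIED = 0        # at least one read unmapped, no chimeric read
    PURE_HPV16 = 1
    PURE_CELLULAR = 2
    FLANKING_JUNCTION = 3   # one read in HPV16, its mate in the human genome
    CONTAINS_JUNCTION = 4   # at least one read spans the junction itself


HPV, HUMAN = "HPV16", "human"


def classify_pair(read1_refs, read2_refs):
    """Classify a read pair from the references each read aligns to."""
    r1, r2 = set(read1_refs), set(read2_refs)
    chimeric = {HPV, HUMAN}
    if r1 == chimeric or r2 == chimeric:
        return PairCategory.CONTAINS_JUNCTION
    if not r1 or not r2:
        return PairCategory.UNCLASSIFIED
    if r1 == r2 == {HPV}:
        return PairCategory.PURE_HPV16
    if r1 == r2 == {HUMAN}:
        return PairCategory.PURE_CELLULAR
    return PairCategory.FLANKING_JUNCTION


def candidate_junctions(pairs, cutoff=15):
    """Count supporting pairs (categories 3 and 4) per junction and apply the cutoff.

    pairs: iterable of (junction_id, PairCategory) tuples.
    """
    supporting = (PairCategory.FLANKING_JUNCTION, PairCategory.CONTAINS_JUNCTION)
    support = Counter(jid for jid, category in pairs if category in supporting)
    return {jid: n for jid, n in support.items() if n >= cutoff}


# Hypothetical input: junction "J_A" has 20 supporting pairs, "J_B" only 3.
demo = [("J_A", classify_pair({HPV}, {HUMAN}))] * 12
demo += [("J_A", classify_pair({HPV, HUMAN}, {HUMAN}))] * 8
demo += [("J_B", classify_pair({HPV}, {HPV, HUMAN}))] * 3
print(candidate_junctions(demo))
```

A fixed cutoff of this kind trades sensitivity for specificity, which is why low-copy junctions such as the known CaSki junction can fall below it.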

Junction-PCR

Each junction candidate was validated by PCR using an HPV16 primer and a cellular primer specific for that junction. The primer combinations are shown in Table S7. For every selected junction candidate, all five DNA samples sharing the respective barcode were tested individually. The CaSki junctions were checked only with CaSki DNA. Junction-PCR was performed with the HotStarTaq Plus Master Mix kit (Qiagen). The 20-µl reaction components comprised 10 µl of 2× master mix, 1 µl HPV16 forward primer (5 µM), 1 µl cellular primer (5 µM), 10 ng template DNA, 2 µl of 10× CoralLoad and water. Cycling conditions were: initial denaturation/activation at 95°C (5 min), 35 cycles including denaturation at 94°C (30 sec), annealing at 59°C (30 sec), and elongation at 72°C (1 min). PCR products were purified with QIAquick PCR purification kit, and sequenced at GATC Biotech (Konstanz, Germany).

Genomic Context Analysis

The junction sequences were subject to database searching for chromosome localization with the NCBI BLAST web interface (see http://blast.ncbi.nlm.nih.gov/Blast.cgi) using the MegaBLAST algorithm. The NCBI Map Viewer (see http://www.ncbi.nlm.nih.gov/mapview/) was used to map cellular breakpoints into specific chromosome regions, and to find out cellular genes located at or near the HPV16 integration sites. Cellular repetitive sequences were identified and classified by using the CENSOR tool [77]. The exon-intron structure of cellular genes was analyzed with the Ensembl genome browser (see http://www.ensembl.org/index.html).

Statistical Analysis

Fisher’s exact test was used to analyze the association between HPV16 E6 variants E-p and E-T350G and the presence or absence of integrated HPV16 DNA in the tumors. The exact two-tailed one-sample binomial test was applied to analyze the frequency distribution of HPV16 3′-breakpoints in different segments of the HPV16 early region, always compared to the relative length of the segments. P values of 0.05 and lower were considered to be statistically significant.
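To make the breakpoint analysis concrete, the sketch below implements an exact two-tailed one-sample binomial test using only the Python standard library. The two-tailed p-value is computed by summing the probabilities of all outcomes that are no more likely than the observed one, a convention also used by common statistics packages; whether the original analysis used exactly this convention is an assumption. The expected proportion corresponds to the relative length of a segment, and the counts in the example are hypothetical, not values from Table S4.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_test_two_tailed(k, n, p):
    """Exact two-tailed one-sample binomial test.

    Sums the probabilities of all outcomes that are at most as likely as
    the observed count k under the null proportion p.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    observed = binom_pmf(k, n, p)
    threshold = observed * (1 + 1e-7)  # tolerance for floating-point ties
    total = sum(
        pmf for pmf in (binom_pmf(i, n, p) for i in range(n + 1)) if pmf <= threshold
    )
    return min(1.0, total)


# Hypothetical example: 45 of 75 breakpoints fall into a segment that
# makes up 40% of the analyzed region.
p_value = binom_test_two_tailed(45, 75, 0.40)
print(f"p = {p_value:.4g}, significant at 0.05: {p_value <= 0.05}")
```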

Accession Numbers

The dataset of Illumina HiSeq2000 read pairs has been deposited in the Sequence Read Archive (SRA) under the study accession number ERP002370. The nucleotide sequences of the 75 viral-cellular DNA junctions have been submitted to the European Nucleotide Archive (ENA) under the accession numbers HE984501-HE984573, HE999548, and HF559481.

Supporting Information

Figure S1. Distribution of read pair numbers for the TEN16 junctions tested by junction-PCR. Altogether 84 junctions (75 authentic as filled bars and 9 false-positive as open bars) are shown arranged by increasing numbers of read pairs, and for each junction a serial number was designated accordingly. The read pair numbers are shown in log scale. The dashed line indicates the cutoff level of 15 read pairs. Identification of the seven junctions below the cutoff is explained in the main text. For DNA junction 3966_DJ1 (serial number 1, labeled with asterisk), the first-round data analysis did not produce any read pair. In the second analysis at low stringency (see Materials and Methods), 155 read pairs of this junction were detected. (TIF)

Table S1. HPV16 forward primers for TEN16. (DOC)

Table S2. Barcoded Nextera adapter sequence. (DOC)

Table S3. HPV16-cellular DNA junctions validated by junction-PCR (sorted by sample ID). (DOC)

Table S4. Frequency distribution of HPV16 3′-breakpoints of viral-cellular DNA junctions in different segments of the HPV16 early region. (XLS)

Table S5. Cellular genes directly targeted by HPV16 DNA integration. (DOC)

Table S6. First cellular genes within 500 kb downstream of integrated HPV16. (DOC)

Table S7. Primer combinations for junction-specific PCR. (DOC)