Anne Van Arsdale1, Nicole E Patterson2, Elaine C Maggi2, Lorenzo Agoni3, Koenraad Van Doorslaer4, Bryan Harmon5, Nicole Nevadunsky1, Dennis Y S Kuo1, Mark H Einstein6, Jack Lenz2, Cristina Montagna2,5. 1. Department of Obstetrics & Gynecology and Women's Health, Albert Einstein College of Medicine, Yeshiva University, Bronx, New York. 2. Department of Genetics, Albert Einstein College of Medicine, Yeshiva University, Bronx, New York. 3. Department of Women's and Children's Health, Obstetrics & Gynecology Unit, Fondazione Poliambulanza Istituto Ospedaliero, Brescia, Italy. 4. School of Animal and Comparative Biomedical Sciences, College of Agriculture and Life Sciences BIO5 Institute University of Arizona, Tuscon, Arizona, USA. 5. Department of Pathology, Albert Einstein College of Medicine, Yeshiva University, Bronx, New York. 6. Department of Obstetrics, Gynecology, and Women's Health, Rutgers New Jersey Medical School, Newark, New Jersey.
Abstract
Cervical carcinogenesis, the second leading cause of cancer death in women worldwide, is caused by multiple types of human papillomaviruses (HPVs). To investigate a possible role for HPV in a cervical carcinoma that was HPV-negative by PCR testing, we performed HPV DNA hybridization capture plus massively parallel sequencing. This detected a subgenomic, URR-E6-E7-E1 segment of HPV70 DNA, a type not generally associated with cervical cancer, inserted in an intron of the B-cell lymphoma/leukemia 11B (BCL11B) gene in the human genome. Long range DNA sequencing confirmed the virus and flanking BCL11B DNA structures including both insertion junctions. Global transcriptomic analysis detected multiple, alternatively spliced, HPV70-BCL11B, fusion transcripts with fused open reading frames. The insertion and fusion transcripts were present in an intraepithelial precursor phase of tumorigenesis. These results suggest oncogenicity of HPV70, identify novel BCL11B variants with potential oncogenic implications, and underscore the advantages of thorough genomic analyses to elucidate insights into HPV-associated tumorigenesis.
Cervical carcinogenesis, the second leading cause of cancer death in women worldwide, is caused by multiple types of human papillomaviruses (HPVs). To investigate a possible role for HPV in a cervical carcinoma that was HPV-negative by PCR testing, we performed HPV DNA hybridization capture plus massively parallel sequencing. This detected a subgenomic, URR-E6-E7-E1 segment of HPV70 DNA, a type not generally associated with cervical cancer, inserted in an intron of the B-cell lymphoma/leukemia 11B (BCL11B) gene in the human genome. Long range DNA sequencing confirmed the virus and flanking BCL11B DNA structures including both insertion junctions. Global transcriptomic analysis detected multiple, alternatively spliced, HPV70-BCL11B, fusion transcripts with fused open reading frames. The insertion and fusion transcripts were present in an intraepithelial precursor phase of tumorigenesis. These results suggest oncogenicity of HPV70, identify novel BCL11B variants with potential oncogenic implications, and underscore the advantages of thorough genomic analyses to elucidate insights into HPV-associated tumorigenesis.
Infections with human papillomavirus (HPV) types that are classified as high‐risk by epidemiological criteria (hrHPVs) underlie the vast majority of invasive cervical carcinomas (ICCs).1 ICCs are responsible for 4.5% of cancer deaths in women worldwide and comprise a substantial fraction of the estimated 12% of cancers worldwide caused by various viruses.2 Multiple large studies showed hrHPV types to be present in at least 90% of ICCs using established, PCR‐ or hybridization‐based methods for viral DNA detection.3, 4 HPV types categorized as group 1 known carcinogens by the International Agency for Research on Cancer working group are phylogenetically classified within the high‐risk Alphapapillomavirus‐7 and Alphapapillomavirus‐9 clades of the Alphapapillomavirus genus and include the HPV types HPV16, HPV18, HPV31, HPV33, HPV35, HPV39, HPV45, HPV51, HPV52, HPV56, HPV58, and HPV59, with HPV16 and HPV18 being the most frequently detected types in ICC.5 Despite also belonging to the species Alphapapillomavirus 7 and Alphapapillomavirus 9, other less frequent HPV types also belonging to these clades (HPV26, HPV30, HPV34, HPV53, HPV66, HPV67, HPV69, HPV70, HPV73, HPV82, and HPV85), do not fulfill the rigorous epidemiological criteria required to be categorized as group 1 carcinogens.5 At least 40 HPV types are known to infect the urogenital tract.1, 6, 7 Up to ~10% of invasive cervical cancers are currently considered HPV‐negative.3, 4 This might be because of tissue sampling errors, the portion of the HPV genome targeted by the assay was deleted, or the tumor is associated with a low prevalence HPV type.8 An additional formal possibility is that some small fraction has a non‐HPV etiology.Population screening for women for cervical cancer historically has been accomplished by cytology (Papanicolaou test), but multiple clinical trials have provided evidence of improved sensitivity of hrHPV testing for detection of cervical cancer precursor lesions compared to cytology alone over multiple rounds of screening and follow‐up.9, 10, 11, 12, 13 The HPV types included in the screening assays are based on epidemiological prevalence data that balance sensitivity and specificity so as not to over‐triage women to secondary screening such as colposcopy, and they currently include at most the 14 hrHPV types. Exclusion of less prevalent types from HPV screening reduces possible iatrogenic morbidity from subsequent colposcopy and excisional procedures following detection of viral types of uncertain pathological significance, and also reduces the costs involved in more extensive testing.14 Unlike population‐based screening, diagnostic HPV testing of identified preinvasive and invasive cervical lesions is applicable to research driven assays.Although at least half of sexually active women incur a genital tract HPV infection during their lifetime,15 cervical tumorigenesis occurs in only a very small fraction of infectedwomen. Tumorigenesis proceeds in a small fraction of women through a series of histopathologically defined, cervical intraepithelial neoplasia (CIN) lesions of progressively decreasing likelihood, CIN1, CIN2, and CIN3. The vast majority of HPV infections resolve spontaneously, with even most CIN lesions resolving spontaneously and only 20% to 30% of CIN3 lesions progressing to life‐threatening ICC.HPV infection proceeds with the ~8 kbp viral DNA genome replicating as a circular episome initially in the basal epithelial cell layer. A key step in HPV‐associated tumorigenesis is integration of viral DNA into the human genome.16, 17, 18, 19 At least part of the viral genome is inserted into human DNA in most HPV‐induced ICCs, and the integration process is thought to be mediated by host DNA repair mechanisms. Integration causes stable association of the virally encoded oncogenes with a host cell, triggers human genome rearrangements, and drives expression of the human oncogenes that flank the sites of integration.20, 21, 22, 23, 24, 25 In most ICCs, the HPV genome is inserted near one allele of a human, dominant‐acting oncogene, and many human oncogenes have been identified as recurring HPV insertion sites in tumors.22, 26, 27, 28, 29, 30 The presence of viral DNA adjacent to such human oncogenes is thought to be a consequence of integration occurring at a stochastic position within the human genome, followed by selectively advantageous, clonal proliferation of cells that suffered an integration near such oncogenes. Insertional oncogenesis by activation of flanking oncogenes is a long studied and thoroughly established mechanism of tumorigenesis by various retroviruses in experimental and naturally occurring animal systems in which insertion of viral DNA containing strong transcriptional elements, particularly enhancers, alters the transcription of the flanking gene.31, 32, 33, 34, 35 This is also the causal mechanism in the tumors that occurred during gene therapy studies in humans.36, 37 HPV activation of nearby oncogene transcription may entail a variety of mechanisms including fusion transcripts. These initiate at viral promoters and proceed into the host genes. Viral enhancers may also activate flanking oncogene promoters.38, 39, 40Approaches to detect integrated viral DNA have included RNA sequencing (RNAseq) to detect fusion transcripts, and DNA sequencing strategies including PCR or hybridization capture to identify the junctions between inserted HPV and human genome DNA.20, 21, 22, 23 However, in most instances only one of the two junctions between the linear HPV DNA fragment and human DNA was identified, and structural characterization of the inserted viral DNA was incomplete or inferred. Here we describe the use of a multiple HPV type, hybridization capture approach yielding sufficient DNA recovery and massively parallel sequencing depth to identify both junctions of an inserted HPV DNA. Combined with additional molecular genetic techniques, this allows comprehensive, robust characterization of HPV cervical tumorigenesis.
METHODS
Sample collection and nucleic acid preparation
Tumor and matching pretumor (CIN3) samples from a single patient were collected as part of an ongoing tissue collection protocol of the Department of Obstetrics & Gynecology and Women's Health at the Albert Einstein College of Medicine, and all collection and experimental procedures were approved by the Internal Review Board of the Albert Einstein College of Medicine (IRB#2009‐265, IRB#2018‐9256). The fresh tumor biopsy was transported in sterile medium from the operating room to the Department of Pathology where sufficient tissue was procured for histopathologic diagnosis. The remaining tumor tissue was snap frozen in liquid nitrogen and transferred immediately to −80°C for storage until use. Ten micrometer sections of the frozen tissue were cut for H&E confirmation of the presence of tumor tissue by one of us who is a trained gynecologic pathologist (B.H.). DNA and RNA were extracted from sections of the frozen tissue using the QIAamp DNA Mini Kit and RNeasy Mini Kit (Qiagen), respectively, and stored at −80°C until use. CIN3 DNA and RNA were prepared using 10 μm sections from the patient's archived CIN3 FFPE block and the RNeasy Plus Universal Kit (Qiagen), respectively. DNA and RNA concentrations were quantified using the Qubit Fluorometric Quantification method (Thermo Fisher Scientific) and their integrity assessed using the Bioanalyzer capillary electrophoresis system (Agilent).
HPV DNA hybridization capture and short read Illumina sequencing
Targeted enrichment of HPV DNA was performed using hybridization capture probes homologous to the full‐length genomes of 15 different HPV types (6, 11, 16, 18, 31, 33, 35, 39, 45, 52, 56, 58, 59, 68, 69) that were custom‐designed using the Roche Nimblegen SeqCap EZ System (Roche, Basel, Switzerland). Biotinylated oligonucleotide probes specific for each HPV type were designed at ~150 bp intervals encompassing the positive strands of the complete ~8 kb double strand viral genomes to achieve 100% coverage. For library preparation, tumor genomic DNA was mechanically fragmented to 200 bp (Covaris, Woburn, MA), and Illumina adaptors were ligated at each end. Libraries were then hybridized to the custom HPV capture probes for 72 h using the Roche target enrichment protocol following manufacture instructions and sequenced on one Illumina HiSeq 2500 lane (Illumina, San Diego, CA) using the paired end 150 bp sequencing mode. After sequencing, we aligned adaptor‐cleaned, QC‐passed, de‐duplicated, paired end reads to a custom human (GRCh37/hg19) plus HPV reference genome containing 143 alpha genus HPV types from the Papillomavirus Episteme48 using Burrows‐Wheeler Aligner (BWA).49 Junction fragments were computationally identified using a combination of programs; Delly50 and SplazerS,51 both of which specify a combined read pair discordancy and split read analysis.
Long sequencing reads using the Oxford Nanopore Technologies MinION
Tumor‐derived gDNA was purified and concentrated using the genomic DNA Clean and Concentrator‐10 (gDCC‐10) kit following manufacturer instructions (Zymo Research); concentration was assessed by Qubit and quality was assessed by Nanodrop (Thermo Fisher Scientific). The purified gDNA was then sheared by g‐TUBE (Covaris, Woburn, MA) to 10 kb. Genomic libraries were prepared using the Oxford Nanopore 1D ligation library prep kit SQK‐LSK108 following manufacturer instructions. Two independently prepared tumor DNA libraries were loaded onto two R9.4 flow cells and sequenced using a MinION Mk1b device (Oxford Nanopore Technologies [ONT]) using the standard 48‐h scripts. Postsequencing base calling and FASTQ extraction was performed using Albacore v2.0 (ONT), whereby raw voltage channel data is translated into canonical nucleotides. All libraries and quality filtered pass reads (Q ≥ 7) were used for the subsequent analysis. Library‐specific adaptors were trimmed and possible internal adaptors split using Porechop41 with default parameters for adaptor identification (90% identity), end trimming (75% identity), and internal splitting (85% identity). Sequence reads were aligned to our custom combined reference genome using Ngmlr,52 a structural variant caller that uses a structural variant aware k‐mer search to approximate alignments followed by a banded Smith–Waterman final alignment, along with a convex gap cost model to account for higher sequencing error frequencies associated with long reads. Structural variants were called using Sniffles52 with parameter adjustment for expected low coverage.
Detection of chromosome structural alterations, aneuploidy, and copy number variation by whole genome sequencing and single nucleotide polymorphism analysis
For whole genome sequencing (WGS), 1 μg of genomic DNA from the tumor was used as input for library preparation using the NEBNext DNA Library Prep Kit (New England Biolabs). DNA was mechanically fragmented to 350 bp (Covaris, Woburn, MA) and the DNA fragments were end‐polished, A‐tailed, adaptor ligated, PCR amplified by P5 and indexed P7 oligos, and purified (AMPure XP system). The libraries were purified with AMPure XP (Beckman Coulter). QC passed libraries were sequenced on 2 Illumina HiSeq 2500 lanes to achieve 60X genome coverage on the paired end 150 bp mode. Reads with adaptor contamination or low quality (Phred Q30 < 80%) were removed. After sequencing, paired end reads were aligned to our custom human (GRCh37/hg19) plus HPV70 reference genome using BWA.49 Files were mapped in BAM format, sorted using SAMtools,53 and duplicates removed using Picard.54 Genomic variant detection was accomplished for single nucleotide polymorphisms (SNPs)/small insertions and deletions (indels) using GATK v3.8,55 structural variant (SV) detection using Delly v0.7.3,50 and copy number variants detection using control‐FREEC.56 Following genomic variant detection, variants were annotated using the software ANNOVAR.57 Tumor genomic copy number was also assessed using the Genome‐Wide Human SNP array 6.0 (Thermo Fisher Scientific). DNA prehandling and array hybridization were performed according to the manufacturer's instructions (Affymetrix, Santa Clara, CA) and scanned in an Affymetrix GeneChip Scanner 3000. Data analysis and visualization was performed in Chromosomal Analysis Suite v4.0 (ChAS, Thermo Fisher Scientific) with a threshold of minimum 100 kb and 50 markers using NCBI build GRCh37/hg19.
Global transcriptomic profiling of tumor RNA
Total RNA (2 μg) isolated from six, 10 μm, tumor sections with an RNA integrity number of 9.3 (Agilent 1000) and subjected to oligoT magnetic bead enrichment. Libraries were prepared following the NEB standard protocol as follows: cDNA was synthesized using random hexamer primers and M‐MuLV RT (RNaseH‐) followed by second strand synthesis with DNA polymerase I and RNaseH. The double stranded cDNA was purified using AMPure XP beads (Beckman Coulter), end tail repaired, adaptor ligated, size‐selected, and PCR amplified before Illumina sequencing. A total of 150 bp insert cDNA libraries were sequenced to a depth of 60 million reads on one Illumina HiSeq 2500 lane (Illumina, Inc., San Diego, CA) using the paired end 150 bp mode. Raw image data were transformed to sequenced reads by CASAVA and stored in FASTQ format. Raw reads were then filtered to remove adaptors, reads containing N > 10% (N representing undetermined bases), and reads of Qscore (quality value) of >50% of bases ≤5. Of approximately 144 million reads, 97.8% passed filtering with >96% having Phred scores >30. Gene model annotation files were downloaded from the genome website browser (NCBI/UCSC/Ensembl) for (GRCh37/hg19) and in custom format from Papillomavirus Episteme58 for HPV70. Indexes of the custom reference genome were built, and paired‐end clean reads were aligned to the reference using STAR v2.5.42 HTSeq v0.6.159 was used to count the read numbers mapped of each gene. As there was no tumor‐free matched control for this patient, differential expression analysis was not performed. As our main interest was in potential fusion transcripts between BCL11B and HPV70, STAR‐Fusion 0.8.060 was used for the detection of fusion transcripts.
Validation of HPV DNA integration junctions and HPV70‐BCL11B fusion transcripts
To detect the integrated HPV70 DNA segment, PCR primers were designed to flank each side of the HPV70 genome—human genome junctions obtained from HC + NGS split reads. The chr14:99683287—HPV70 L1:6306 primers were 5′‐TTCCAAAAGTGTCTGGCAAA and 5′ GTGTGTGAATGTGGGGGTGT producing a 216 bp amplicon. The chr14:99683253—HPV70 E1:2380 primer pair was 5′‐TACAGGGACCACCAAACACA and 5′‐CTGGACTGCACACAGACACA producing a 168 bp amplicon, and PCR reactions were performed using GoTaq Green (Promega) using 30 cycles of 94°C for 30 s, 60°C for 30 s, and 72°C for 45 s. To detect the HPV70‐BCL11B fusion transcripts, primers were designed to flank each side of the fusion splice junction from the HPV70 E1 5′ss at genome position 943 to the BCL11B exon 4 3′ss at GRCh37/hg19 genome position chr14:99642532 obtained from RNAseq split reads. The HPV70 primer was 5′‐GAAGAACCACAGCGTCACAA and the BCL11B exon 4 primer was 5′‐TGCAAATGTAGCTGGAAGGC with a 214 bp amplicon. First‐strand cDNA synthesis was performed on 5 μg of total RNA from either fresh tumor tissue or archival FFPE CIN3 tissue using the exon 4 primer and SuperScriptII RT (Thermo Fisher Scientific) following manufacture's protocol. The subsequent PCR to detect the presence of the fusion transcripts was performed using 35 cycles of 95°C for 15 s, 60°C for 30 s, and 72°C for 45 s. The resultant PCR products were resolved on 0.8% or 2.0% agarose gels, respectively, and the bands were excised using the Monarch DNA Gel Extraction Kit (New England Biolabs) for Sanger sequencing analysis.
In situ visualization of BCL11B locus
Fluorescence in situ hybridization (FISH) was used to visualize the BCL11B at the single cell level on histopathologic sections of the tumor. BAC RP‐11‐179F4 spanning the BCL11B locus was obtained from the BACPAC Resources Center at the Children's Hospital Oakland Research Institute. Eight micrometer sections were cut from the diagnostic FFPE tumor block from the patient, and mounted on positively charged slides. Slides were incubated overnight at 56°C prior to FISH hybridization that was performed as previously described.61 Briefly, the slides were deparaffinized in Hemo‐De at room temperature for 10 min × 2, dehydrated in 100% ethanol for 5 min × 2 and placed on a 50°C slide warmer for 5 min. The slides were then pretreated for 24 min using the Vysis Paraffin Pretreatment Reagent Kit (Abbott Molecular), and fixed in 10% buffered formalin as per the protocol. The FISH probe was labeled by nick translation using DY‐415‐aadUTP (Dyomics, Jena, GE) as previously described61 and hybridized overnight. The slides were then washed in 0.4X SSC prewarmed to 74°C, followed by 4X SSC/0.1% Tween. Images were acquired with a manual inverted fluorescence microscope (Axiovert 200, Zeiss) with fine focusing oil immersion lens (×40, NA 1.3). The resulting emissions were collected using 350–470 nm (for DAPI) and 470–540 nm (for DY‐495‐dUTP) filters. The microscope was equipped with a Camera Hall 100 and the Applied Spectral Imaging software. Images representing 90 nuclei were randomly acquired and saved as. tiff composite files. Single cell copy numbers of the BCL11B locus were manually counted and categorized as described in the figures.
RESULTS
HPV DNA analysis was applied to a cervical carcinoma chosen specifically based on its HPV‐negative, clinical testing status at the time of cancer diagnosis. A 46‐year‐old woman presented at the emergency room of our institution with International Federation of Gynecology and Obstetrics stage IIB, squamous cell ICC. HPV testing using the cobas test (Roche Molecular Systems), which detects 14 hrHPV types by PCR of a viral L1 gene segment followed by probe hybridization, was negative. Tumor DNA was subjected to hybridization capture enrichment of HPV DNA followed by massively parallel, next generation, DNA sequencing (HC + NGS) to search for any HPV DNA present in the lesion and, if so, determine whether it was inserted into the human genome. Targeted enrichment was accomplished using hybridization to a custom designed capture probe set containing DNA from 15 different hrHPV types (HPV 6, HPV11, HPV16, HPV18, HPV31, HPV33, HPV35, HPV39, HPV45, HPV52, HPV56, HPV58, HPV59, HPV68, HPV69). Each probe consists of a biotinylated, roughly 150 nucleotide long DNA strand specific for each HPV type (about 55 probes per HPV genome with probe overlap) encompassing the positive strands of the full ~8kbp HPV genomes to allow capture along each entire viral genome.HC + NGS of the tumor DNA yielded 155 832 unique paired end reads that mapped to the HPV70 genome (Figure 1), with average coverage depth of 1054×. All reads mapped to one 3980 bp segment encompassing about half of the viral genome from positions 6306 to 2380 including the entire upstream regulatory region (URR), E6 and E7 segments plus parts of the E1 and L1 genes. The segment contained 55 single nucleotide substitutions (1.4%) relative to the HPV70 reference sequence (Supporting Information Figure S1). Although HPV70 was not specifically targeted by the probes used, the HPV70 DNA fragments were presumably recovered by hybridization to related species Alphapapillomavirus 7 types in the set, most likely HPV39 and HPV68. Forced alignment to a secondary custom genome containing only the HPV types included in the hybridization probe set was attempted; however, alignment in this setting was both lower in unique reads (72 681 aligned read pairs to HPV type 68) and identity (<90%). More importantly, detection of HPV70 DNA demonstrated that this clinically HPV‐negative tumor in fact did contain HPV, albeit from an uncommon type currently not classified as high risk.
Figure 1
HPV70 genome coverage of sequence reads from viral DNA hybridization capture plus Illumina short read sequencing. The outer circle shows the circular episomal form of the HPV70 genome. Defined human papillomavirus oncogenes are depicted in red, other early genes in green, late genes in yellow, and the upstream regulatory region (URR) in blue. The inner circle shows a histogram of sequencing coverage ranging from of 0 to 4200 reads. All reads corresponded to the segment from genome positions 6906 to 2380 containing the URR E6 and E7, with no reads detected in the remaining half of the viral genome
HPV70 genome coverage of sequence reads from viral DNA hybridization capture plus Illumina short read sequencing. The outer circle shows the circular episomal form of the HPV70 genome. Defined human papillomavirus oncogenes are depicted in red, other early genes in green, late genes in yellow, and the upstream regulatory region (URR) in blue. The inner circle shows a histogram of sequencing coverage ranging from of 0 to 4200 reads. All reads corresponded to the segment from genome positions 6906 to 2380 containing the URR E6 and E7, with no reads detected in the remaining half of the viral genomeThe HC + NGS analysis detected junctions between the ends of the viral segment and sequences in the last intron of the humanBCL11B gene between the third and fourth exons (Figure 2A). BCL11B encodes a Kruppel‐like zinc finger protein that was previously identified as an integration site of HPV16 in at least two ICCs43, 62 and is mutated in other humancancers, most notably T‐lymphocyte tumors.44, 45, 63, 64, 65 The occurrence of the two junctions at nearby genomic positions in BCL11B suggested that the 3980 bp HPV70 segment comprised a single insertion at this site (Figure 2A). To confirm the implied structure of HPV70 in a BCL11B allele, the tumor was subjected to intensive molecular genetic characterization by long range DNA sequencing, short range WGS, Affymetrix Genome‐Wide Human SNP Array 6.0 analysis, and RNA sequencing (RNAseq).
Figure 2
DNA sequencing analysis of the HPV70 DNA‐containing tumor. A, Genetic diagram of the inferred structure of the HPV70 DNA segment insertion into an allele of the BCL11B gene based on hybridization capture and Illumina sequencing of HPV70 DNA. HPV70 components and genome positions are depicted in blue. BCL11B components and positions on chromosome 14 (hg19) are in green. Red shows the positions and sequences of microhomology at the junctions between human DNA and the viral DNA insert. Approximate distances of components are depicted by the dotted arrows at the top. Green rectangles and numbers indicate BCL11B exons with the 5′ and 3′ untranslated regions at lower height. Exons and HPV70 DNA are drawn larger than the scale of the full‐length BCL11B region. B, Genome coverage and approximate sizes of the Oxford Nanopore MinION sequence reads encompassing HPV70 DNA confirming the single ~4 kbp insert and the structure of the immediately surrounding BCL11B sequences. C, Illumina short read, whole genome sequence coverage of the BCL11B gene on chromosome 14. The full‐length BCL11B gene is at the bottom. An enlarged view of the HPV70 segment and immediately surrounding human DNA showing the lower coverage of the HPV70 segment is presented above. D, Gingkos plot of whole genome sequencing depth across the human genome from chromosome 1 through chromosome X. Red dots indicate chromosomal loss and blue dots indicates chromosomal gain. E, Human genome‐wide Affymetrix SNP array coverage on each human chromosome. Short horizontal blue lines indicate segments where copy number ≥ 3 was detected. F, Fluorescence in situ hybridization of a BAC probe encompassing the BCL11B gene to a 8 μM histological section from the tumor showing the BCL11B gene copy number variation among the tumor cells. Red is the fluorescently labeled BAC DNA probe comprising the BCL11B region. The top and bottom panels show two different fields. DAPI staining of nuclei in each is shown in the left panels and was filtered from the right panels. G, Hematoxylin and eosin stained histological section from the tumor showing several large cells and their variable sizes. H, Quantification of the number of BCL11B loci in individual tumor cells. The number of fluorescent signals in each of 90 cells as in panel (F) was counted
DNA sequencing analysis of the HPV70 DNA‐containing tumor. A, Genetic diagram of the inferred structure of the HPV70 DNA segment insertion into an allele of the BCL11B gene based on hybridization capture and Illumina sequencing of HPV70 DNA. HPV70 components and genome positions are depicted in blue. BCL11B components and positions on chromosome 14 (hg19) are in green. Red shows the positions and sequences of microhomology at the junctions between human DNA and the viral DNA insert. Approximate distances of components are depicted by the dotted arrows at the top. Green rectangles and numbers indicate BCL11B exons with the 5′ and 3′ untranslated regions at lower height. Exons and HPV70 DNA are drawn larger than the scale of the full‐length BCL11B region. B, Genome coverage and approximate sizes of the Oxford Nanopore MinION sequence reads encompassing HPV70 DNA confirming the single ~4 kbp insert and the structure of the immediately surrounding BCL11B sequences. C, Illumina short read, whole genome sequence coverage of the BCL11B gene on chromosome 14. The full‐length BCL11B gene is at the bottom. An enlarged view of the HPV70 segment and immediately surrounding human DNA showing the lower coverage of the HPV70 segment is presented above. D, Gingkos plot of whole genome sequencing depth across the human genome from chromosome 1 through chromosome X. Red dots indicate chromosomal loss and blue dots indicates chromosomal gain. E, Human genome‐wide Affymetrix SNP array coverage on each human chromosome. Short horizontal blue lines indicate segments where copy number ≥ 3 was detected. F, Fluorescence in situ hybridization of a BAC probe encompassing the BCL11B gene to a 8 μM histological section from the tumor showing the BCL11B gene copy number variation among the tumor cells. Red is the fluorescently labeled BAC DNA probe comprising the BCL11B region. The top and bottom panels show two different fields. DAPI staining of nuclei in each is shown in the left panels and was filtered from the right panels. G, Hematoxylin and eosin stained histological section from the tumor showing several large cells and their variable sizes. H, Quantification of the number of BCL11B loci in individual tumor cells. The number of fluorescent signals in each of 90 cells as in panel (F) was countedLong range DNA sequencing utilizing an Oxford Nanopore MinION flowcell yielded 3.56X haploid genome coverage including three reads encompassing the HPV70 DNA insertion in BCL11B (Figure 2B). This unambiguously confirmed the structure of the insertion. WGS at 60X average genome coverage also confirmed both virus‐human junctions, the sequence of the HPV70 segment detected by HC + NGS, and the absence of the remainder of HPV70 (Figure 2C). The number of WGS read counts across HPV70 was about one‐third the number across BCL11B, indicating that HPV70 DNA was present in one allele of the BCL11B gene, and that one or more normal BCL11B alleles were also present, which is consistent with the insertion allele acting dominantly in tumorigenesis. Human DNA copy number was assessed genome‐wide by generating a Ginko plot of WGS reads across the human genome (Figure 2D). Gain of the entire chromosome 3q arm was detected, along with copy number increases on human chromosomes 1q, 5p, 8, 12q, and 14q, the latter including the BCL11B locus. 3q trisomy, which is highly recurrent in ICC, has been proposed to define transition from cervical dysplasia to invasive carcinoma through effects of the telomerase reverse transcriptase encoding TERT gene.66, 67 The 1q, 5p, 8, 12q, and 14q copy number variations were also detected by genome‐wide SNP analysis (Figure 2E), thus confirming their presence in the tumor. The observation that they were not as pronounced as the 3q trisomy indicated that they were less uniform among the tumor cells and likely involved multiple segments of the cognate chromosomes.To investigate the extent of variability of the BCL11B copy number among the tumor cells, fluorescent in situ hybridization (FISH) was performed on histological sections of the tumor using a BAC probe encompassing the BCL11B gene (Figure 2F). Quantification showed that most cells contained between two and 10 BCL11B loci, with modal number three (Figure 2H). About 4% of cells had 50 or more copies (Figure 2F,H). These cells were very large compared to the other tumor cells (Figure 2F) and were evident in standard, hematoxylin plus eosin stained, histopathology sections (Figure 2G). In summary, there was extensive variability in BCL11B copy number among the tumor cells.The only HPV70 DNA detected in the tumor was the 3980 bp insert in BCL11B. No full‐length, circular viral genomes were detectable. The two viral‐human DNA junctions encompassed short stretches of sequence identity or microhomology, 12/13 bp at one end and 2 bp at the other (Figure 2A). Insertion of HPV70 DNA was accompanied by deletion of 34 bp relative to the reference human genome. The viral DNA insertion allele comprised a minority fraction of all the BCL11B loci present in the tumor cell population (Figure 2C), with the BCL11B copy number being highly nonuniform among the tumor cells (Figure 2H).Intronic insertion of the HPV70 viral segment including the URR and full length E6 and E7 genes in the same transcriptional orientation as BCL11B raised questions of whether both viral oncogenes and BCL11B were transcribed, and whether HPV70‐BCL11B fusion transcripts were present. Therefore, RNAseq was performed on tumor RNA, and reads were aligned with the HPV70 segment‐containing BCL11B gene (Figure 3A). HPV70 transcripts derived almost entirely from a segment encompassing E6, E7, and E1 precisely up to the standard E1^E4 5′ splice site (5′ss; Figure 3B). E6 and E7 are normally translated from transcripts containing both open reading frames (ORFs),68 and about 40% of the transcripts in this tumor were consistent with excision of the standard HPV E6* intron (Figure 3B). However, only few of the transcripts included the AUG start codon for E6 (Figure 3C), indicating that only a fraction of transcripts could contain the full‐length E6 or E6* ORFs, and those that did had very short 5′ untranslated regions, implying that transcription initiated downstream of the standard early transcription start site used during early phase of the HPV replicative cycle, and suggesting that little if any E6 or E6* translation may have occurred in this tumor. In summary, the HPV RNAseq reads were consistent with most transcription initiating slightly downstream of where viral early transcription normally initiates during an HPV infection. Outside of the E6, E7, and E1 segment up to the E1^E4 5′ss, almost no HPV70 transcripts were detected (Figure 3B). E7, which is normally expressed from mRNAs containing both E6 and E7 ORFs, was the only viral ORF entirely covered by abundant transcripts. Moreover, the E6* splices that were detected must have occurred mostly in what is the 5′ UTR of E7 ORF‐containing mRNAs in this tumor. During infections, HPVs encode the E4 ORF from transcripts with the E6 and E7 ORFs upstream of the E1^E4 5′ss that normally fuses the 5′ end of the E1 ORF to that of E4. The abrupt absence of viral transcripts downstream of the standard E1^E4 5′ss strongly suggested efficient splicing at this junction, but to a nonviral downstream 3′ss in this tumor.
Figure 3
RNAseq analysis of HPV70 and BCL11B transcripts in the tumor. A, Genetic diagram of the BCL11B gene allele and the HPV70 DNA insert, with exons and HPV70 are drawn larger than scale of the full‐length BCL11B gene segment. The 5′ to 3′ transcriptional orientation of both BCL11B and HPV70 is from left to right. Unspliced primary transcripts inferred from the 150 nucleotide RNAseq reads are shown below the genomic DNA with introns depicted as dotted segments. Positions of key splicing components are labeled, specifically the E1^E4 5′ss, the BCL11B exon 4 3′ss, and the E6* intron. ORFs encoded by the spliced mRNAs that were supported by split reads are diagramed below the primary transcripts. The top two are HPV70‐BCL11B fusions, while the bottom three are BCL11B transcripts. HPV70 sequences are in blue, and BCL11B sequences are in green with variably spliced exon 3B in green outline and white center. The alternative ORF of exon 4 that is read from exon 3B‐containing transcripts (ORF 2) is shown in orange. B, RNAseq coverage of the HPV70 segment showing that virtually all reads located to the region encompassing E6, E7 and the portion of E1 up to the E1^E4 5′ss. A decreased number of reads was present over the E6* intron indicating removal of that intron in a fraction of the transcripts. Colored vertical bars are nucleotides that differ from the HPV70 reference sequence. C, Enlarged view of RNAseq reads near the start codon for HPV70 E6 gene ORF showing the limited number of reads near the ATG start codon. The E6 ORF from the ATG start codon is shown in red at the bottom. D, Sashimi plot showing split read support for splice junctions. The diagram below the plot shows the positions of HPV70 DNA (blue) and the BCL11B exons (green), and represents the custom‐generated genomic segment containing the HPV70 DNA insertion that was used to obtain the plot. E, Sashimi plot of BCL11B exon splice junctions obtained using the BCL11B genomic segment without the HPV70 DNA insertion demonstrating the inclusion of alternative transcripts with exon 3B in this tumor, including split reads supporting splicing from exon 2 to exon 3B. F, The amino acid sequence encoded by the HPV70 E1—BCL11B exon 4 fusion ORF. HPV70 E1 encoded amino acids are highlighted in blue. The six C2H2 zinc fingers within the exon 4 encoded segment are highlighted in yellow with the cysteine residues in purple and the histidines in red. Sixty‐six split reads supported the E1 to exon 4 splice junction, five of which are presented in Supporting Information Figure S2. G, The amino acid sequence encoded by the HPV70 E1 to BCL11B exon 3B and exon 4 fusion ORF. HPV70 E1 encoded amino acids are highlighted in blue. Exon 3B encoded amino acids are highlighted in gray. Exon 4 ORF2 encoded amino acids are shown in orange. The nucleotide sequence of exon 3B including intronic splicing signals along with the four split reads that support E1 to exon 3B splicing are shown in Supporting Information Figure S3. H, Amino acid sequence of BCL11B encoded by exons 1, 2, 3, and 4 is shown for comparison. Exon 1 sequences are highlighted in green. Exon 2 sequences are highlighted in purple. Exon 3 sequences are highlighted in blue. The six zinc fingers encoded in exon 4 are highlighted as in panel (F). I, Inferred amino acid sequence encoded by BCL11B mRNA with alternative exon 3B. Segment colors match panels (G) and (H). The two underlined amino acids (proline and glutamate) at the beginning of exon 2 in panels (H) and (I) show amino acids that are excluded by a known alternative splice junction between exons 1 and 2 of BCL11B
RNAseq analysis of HPV70 and BCL11B transcripts in the tumor. A, Genetic diagram of the BCL11B gene allele and the HPV70 DNA insert, with exons and HPV70 are drawn larger than scale of the full‐length BCL11B gene segment. The 5′ to 3′ transcriptional orientation of both BCL11B and HPV70 is from left to right. Unspliced primary transcripts inferred from the 150 nucleotide RNAseq reads are shown below the genomic DNA with introns depicted as dotted segments. Positions of key splicing components are labeled, specifically the E1^E4 5′ss, the BCL11B exon 4 3′ss, and the E6* intron. ORFs encoded by the spliced mRNAs that were supported by split reads are diagramed below the primary transcripts. The top two are HPV70‐BCL11B fusions, while the bottom three are BCL11B transcripts. HPV70 sequences are in blue, and BCL11B sequences are in green with variably spliced exon 3B in green outline and white center. The alternative ORF of exon 4 that is read from exon 3B‐containing transcripts (ORF 2) is shown in orange. B, RNAseq coverage of the HPV70 segment showing that virtually all reads located to the region encompassing E6, E7 and the portion of E1 up to the E1^E4 5′ss. A decreased number of reads was present over the E6* intron indicating removal of that intron in a fraction of the transcripts. Colored vertical bars are nucleotides that differ from the HPV70 reference sequence. C, Enlarged view of RNAseq reads near the start codon for HPV70 E6 gene ORF showing the limited number of reads near the ATG start codon. The E6 ORF from the ATG start codon is shown in red at the bottom. D, Sashimi plot showing split read support for splice junctions. The diagram below the plot shows the positions of HPV70 DNA (blue) and the BCL11B exons (green), and represents the custom‐generated genomic segment containing the HPV70 DNA insertion that was used to obtain the plot. E, Sashimi plot of BCL11B exon splice junctions obtained using the BCL11B genomic segment without the HPV70 DNA insertion demonstrating the inclusion of alternative transcripts with exon 3B in this tumor, including split reads supporting splicing from exon 2 to exon 3B. F, The amino acid sequence encoded by the HPV70 E1—BCL11B exon 4 fusion ORF. HPV70 E1 encoded amino acids are highlighted in blue. The six C2H2 zinc fingers within the exon 4 encoded segment are highlighted in yellow with the cysteine residues in purple and the histidines in red. Sixty‐six split reads supported the E1 to exon 4 splice junction, five of which are presented in Supporting Information Figure S2. G, The amino acid sequence encoded by the HPV70 E1 to BCL11B exon 3B and exon 4 fusion ORF. HPV70 E1 encoded amino acids are highlighted in blue. Exon 3B encoded amino acids are highlighted in gray. Exon 4 ORF2 encoded amino acids are shown in orange. The nucleotide sequence of exon 3B including intronic splicing signals along with the four split reads that support E1 to exon 3B splicing are shown in Supporting Information Figure S3. H, Amino acid sequence of BCL11B encoded by exons 1, 2, 3, and 4 is shown for comparison. Exon 1 sequences are highlighted in green. Exon 2 sequences are highlighted in purple. Exon 3 sequences are highlighted in blue. The six zinc fingers encoded in exon 4 are highlighted as in panel (F). I, Inferred amino acid sequence encoded by BCL11B mRNA with alternative exon 3B. Segment colors match panels (G) and (H). The two underlined amino acids (proline and glutamate) at the beginning of exon 2 in panels (H) and (I) show amino acids that are excluded by a known alternative splice junction between exons 1 and 2 of BCL11BSuch splice junctions from HPV70 to BCL11B (Figure 3D) were identified in RNAseq split reads (Supporting Information Figure S2). Most of these joined the viral E1^E4 5′ss to the 3′ss for BCL11B exon 4, showing that viral‐host fusion transcripts were synthesized in this tumor. In addition, about 4% of E1^E4 5′ss's were joined to a cryptic BCL11B exon, here termed exon 3B, between the viral insertion and exon 4 (Figure 3D). Cryptic exon 3B was spliced to exon 4 downstream, and both the 5′ and 3′ intronic sequences immediately flanking the cryptic exon had standard splice signals (Supporting Information Figure S3). Alignment of the RNAseq reads with a normal BCL11B allele sequence without HPV70 identified all the standard splice junctions of the host gene, including variable inclusion of exon 3 (Figure 3E), indicating expression of normal BCL11B transcripts in the tumor as well, possibly from normal BCL11B alleles. In addition, splicing of a variant BCL11B transcript was identified in which the cryptic exon 3B was present between exons 2 and 4 (Figure 3E).Transcripts with the most frequently detected splice junction (E1^E4 5′ss to 3′ss exon 4) encoded a fusion protein with the first five amino acids of HPV70 E1 fused to the portion of BCL11B encoding all six of its zinc fingers (Figure 3F). The E1^E4 splice to cryptic exon 3B joined the first five HPV70 E1 amino acids in frame to an ORF that completely spanned exon 3B, which in turn was spliced to exon 4 (Figure 3G). However, the splice junction from exon 3B to exon 4 caused exon 4 to be read in a different ORF than normal and encompassing 66 amino acids and no zinc fingers (Figure 3G). Similarly, while transcripts encompassing the standard BCL11B exons 1, 2, 3, and 4 encoded the standard six Zn‐finger protein (Figure 3H), splicing of BCL11B transcripts from exon 2 to exon 3B fused the ORFs in frame, but the junction from exon 3B to exon 4 caused the same frameshift as in the viral fusion transcripts (Figure 3I). In summary, the HPV70 DNA insertion resulted mainly in HPV70‐BCL11B fusion transcripts encoding an unusual, N‐terminally truncated form of the BCL11B oncogene with five HPV70 E1 amino acids at the N‐terminus (Figure 3F). In addition, less abundant transcripts containing cryptic exon 3B that causes exon 4 of BCL11B to be read in a different ORF were detected with either viral or BCL11B encoded amino acids at the N‐termini (Figure 3G,I).To verify that the HPV70‐BCL11B fusion RNAs were present, RT‐PCR was performed on tumor RNA using one primer in virus sequences and the other in exon 4 (Figure 4A). This confirmed the presence of the fusion transcripts (Figure 4B). In addition to RNA from the tumor, a formalin fixed, paraffin embedded tissue fragment was obtained from a histopathology specimen from the same patient archived 3 years before tumor resection. The patient was diagnosed with CIN3 at that time, but was lost to follow‐up. Both RNA and DNA were both prepared from the archived CIN3 sample. RT‐PCR detected the presence of the fusion transcripts at both the tumor and the CIN3 stage (Figure 4B). Veracities of the RT‐PCR products were confirmed by sequencing of excised bands from the tumor RT‐PCR. The more prominent, faster migrating band confirmed splicing from HPV70 E1 to BCL11B exon 4 (Supporting Information Figure S4). The faint upper band was also recovered from the tumor RT‐PCR and sequenced thus confirming the structure of HPV70 E1 spliced to exon 3B, and exon 3B spliced to exon 4 of BCL11B in the same RNA (Supporting Information Figure S4). Presence of the 3980 bp HPV70 DNA insertion into BCL11B was also confirmed by PCR testing for the HPV70 L1 junction in the CIN3 biopsy as well as in the tumor (Figure 4C). Thus, the HPV70 DNA insertion and fusion transcripts were present in the lesion by the time of CIN3 diagnosis as well as in the eventual tumor.
Figure 4
Detection of HPV70—BCL11B RNA splicing and DNA insertion junctions in the tumor as well as in a CIN3 sample collected 3 years earlier from the same patient. A, Schematic diagram of the HPV70 DNA insert in BCL11B and the inferred transcripts with HPV70 in blue and BCL11B in green. Positions of the primers used for the junction PCR and the reverse transcriptase PCR (RT‐PCR) are shown, along with the expected amplified products. B, RT‐PCR confirming the HPV70—exon 4 RNA splice junction in the tumor and detecting it in the CIN3 sample. Negative controls were H2O or PBMC RNA collected from a healthy donor. C, Junction PCR confirming the HPV70 DNA insertion in the tumor and detecting it in the CIN3 sample with comparable negative controls as in panel (B). Sequences of the RT‐PCR and junction PCR products are presented in Supporting Information Figure S4
Detection of HPV70—BCL11B RNA splicing and DNA insertion junctions in the tumor as well as in a CIN3 sample collected 3 years earlier from the same patient. A, Schematic diagram of the HPV70 DNA insert in BCL11B and the inferred transcripts with HPV70 in blue and BCL11B in green. Positions of the primers used for the junction PCR and the reverse transcriptase PCR (RT‐PCR) are shown, along with the expected amplified products. B, RT‐PCR confirming the HPV70—exon 4 RNA splice junction in the tumor and detecting it in the CIN3 sample. Negative controls were H2O or PBMC RNA collected from a healthy donor. C, Junction PCR confirming the HPV70 DNA insertion in the tumor and detecting it in the CIN3 sample with comparable negative controls as in panel (B). Sequences of the RT‐PCR and junction PCR products are presented in Supporting Information Figure S4
DISCUSSION
Application of the HC + NGS approach to a cervical tumor identified the presence of HPV70 DNA and discerned the precise fraction of the viral genome present in the tumor. It also revealed that the viral DNA was integrated into one allele of BCL11B in the human genome, a gene that has been previously implicated in human tumorigenesis.44, 45, 63, 64, 65 Additional molecular genetic analyses identified the viral transcripts in the tumor and ascertained that the HPV70 DNA insertion resulted in fusion transcripts between the virus and portions of the BCL11B gene, including a novel cryptic exon. The finding of the viral insertion and the HPV70‐BCL11B exon 4 fusion transcript in the CIN3 sample from the same patient indicated that these events had occurred by the intraepithelial precursor stage of disease and persisted during the invasive carcinoma stage. These results strongly imply that the HPV70 insertion and activation of BCL11B fusion ORF transcription were key events during malignant transformation. HPV70 was previously detected in ICCs and oral cancers,69, 70 and BCL11B was previously identified as a site of HPV16 integration in at least two cervical cancers.43, 62 Tumorigenesis by insertion of viral DNA near oncogenes is a thoroughly established oncogenic mechanism.31, 32, 33, 34, 35 The frequent detection of HPV DNA near oncogenes in virally induced human tumors indicates that insertional oncogenesis is a key mechanism in HPV‐induced tumors, and that it may act in concert with HPV‐encoded oncogenes, perhaps even replacing them in some instances.40 Detection of HPV70 DNA integrated into BCL11B and transcriptional activation of it, along with HPV E7 transcripts, strongly implicate HPV70 in this tumor.Hybridization capture plus massively parallel sequencing has been used and proven to be reliable for identifying human genome integration sites of HPV DNA in tumors.20, 21, 24, 46 As demonstrated here, this approach is best applied by use of capture DNA probes encompassing the complete viral genomes of multiple HPV types. It allowed identification of the 3980 bp integrated HPV70 segment present in the tumor that comprised the URR‐E6‐E7‐5′E1 region, indicating that this segment is the sole viral component responsible at least for maintaining the tumor state. Other viral components were required during at least the viral infection phase of tumorigenesis. HC + NGS also discerned the absence of the remainder of the viral genome in the tumor, including the 3′ portion of E1 and the entire E2 gene. Full‐length viral E1 and E2 ORFs are frequently disrupted in ICCs.47 Precise mapping of the junctions at both ends of the integrated viral DNA, which can be achieved by HC + NGS, is required for full understanding of the consequences of an HPV DNA insertion. This includes the 34 bp human genome deletion observed here, and the presence of microhomologies at both ends of the inserted HPV70 segment suggesting that viral DNA integration presumably occurred by microhomology mediated repair.20 While the use of multiple analyses here revealed additional molecular insights, HC + NGS was sufficient to allow the identification of the HPV type in the tumor, inference of the viral DNA components present along with their organization, the exact location of the viral DNA in the human genome, and the inference of a specific human oncogene involved (BCL11B).The WGS and SNP array analyses indicated that the HPV70 segment was integrated into one allele of BCL11B. Transcripts from the HPV70 segment encompassed the full‐length E7 ORF of the viral E7 protein. This suggests that E7, BCL11B, and the chromosome 3q amplification were all likely important genetic components that function together at least to maintain this tumor. Surprisingly, the RNAseq results revealed that only a small fraction of transcripts encompassed the start codon of the E6 ORF, with most initiating further downstream in the E6 ORF. These findings suggest that only low levels of E6 might be translated in this tumor. In addition to binding to TP53, E6 also interacts with a number of antiapoptotic proteins and PDZ domain‐containing proteins.58, 71, 72, 73, 74, 75, 76 It also activates the TERT promoter.77, 78, 79, 80 Perhaps TERT gene‐containing chromosome 3q amplification in part compensated for the low levels of E6. The HPV70‐BCL11B fusions that were detected in this tumor are strong candidates to function as dominant oncogenes in tumorigenesis. BCL11B encodes a Kruppel‐like transcription factor with six C2H2‐type zinc fingers. It has been implicated in both pediatric and adult T‐cell acute lymphoblastic leukemia, where a high incidence of the t(5;14) chromosomal translocation and an association with expression of homeobox transcription factor TLX3 have been noted.44, 45, 63, 64, 65 BCL11B has been reported to be a component of the BAF (BRG1/BRM‐associated factor) ATP‐dependent chromatin remodeling complex. Genes encoding the various components of BAF are mutated in at least 20% of humancancers making it an attractive target for novel antitumor therapies such as inhibitors of the bromodomains that occur in other BAF components.81, 82 Most of the fusion ORFs identified here (Figure 3) encode an N‐terminally truncated form of BCL11B that retains the six zinc finger domains. A small fraction encoded a form with a cryptic BCL11B exon that causes exon 4 to be read out of frame. It can be speculated that these altered forms of BCL11B perhaps have novel gain‐of‐function or dominant‐negative function that alters transcription in a manner that drives tumorigenesis.HPV tumorigenesis triggers genomic instability.21, 22 In addition to the chromosome 3q trisomy that recurrently occurs in ICC, copy number variation of multiple chromosome subregions was observed. Most striking was the BCL11B region of chromosome 14, with individual tumor cells displaying from two to over 50 copies. This observation stresses the perhaps surprising extent of genetic variability among individual tumor cells, and raises the question of what role it might play in tumorigenesis.Population screening for HPV is based on demanding sensitivity and specificity criteria that balance clinical benefit with increased morbidity risk, emotional duress, and monetary costs of additional follow‐up screening of individuals who test positive. While the limitations of screening tests for detection of HPV types that cause tumors of infrequent or unknown incidence have long been recognized, and broadly based HPV analyses of tumors have suggested roles for non‐hrHPVs in oncogenesis in a minority of tumors broader HPV type screening would create substantial burdens.1, 8, 83, 84, 85 Nonetheless, it is important to be aware that screening based solely on hrHPV detection may miss a small fraction of at risk individuals such as the patient described here, where the L1 gene PCR target may be deleted or may lack sufficient sequence homology for recognition. Expanded PCR specificity for HPV analysis is a subject of current interest.86HPV typing of ICCs currently offers little or no direct benefit to affected patients. The robust insights offered by the HC + NGS approach of simultaneous detection of a broad variety of HPV types, elucidation whether the viral DNA is integrated into the human genome, inference of the components of the viral genome that are present, and determination of whether the viral DNA is integrated adjacent to known human oncogenes suggest that it has the potential to provide deeper insights about individual tumors than simply assessing the presence of HPV DNA. One potential advantage of HC + NGS integration site analysis of humanCIN lesions and tumors is determining whether HPV DNA integration in precancerous lesions has implications for progression to invasive carcinomas, including whether CIN lesions having HPV DNA integrations near known oncogenes warrant more intensive follow‐up. A second is the identification of the full spectrum of human oncogenes where HPV DNAs are integrated in human tumors, perhaps in some instances providing actionable targets. A third is to provide a more complete global assessment of the frequencies of integrated DNA of HPV types currently considered to be of probable, possible, or low risk for tumorigenesis within tumors in various patient populations. Future studies could test if the mechanistic oncogenic insights gained from HC + NGS analyses have the ability to inform ongoing studies of disease management and perhaps lead to improved care.
AUTHOR CONTRIBUTIONS
A.V.A. performed and supervised the experiments, interpreted the results, contributed to writing the manuscript. N.E.P. and E.C.M. performed the experiments, contributed to writing the manuscript. L.A. and K.V.D. contributed intellectually to the experimental design, contributed to writing the manuscript. B.H. performed the histopathological analysis, contributed to writing the manuscript. N.N., D.Y.S.K., and M.H.E. contributed intellectually to the project design, contributed to writing the manuscript. J.L., and C.M. interpreted the results, supervised the entire project, contributed to writing the manuscript.Supplementary Figure 1 Sequence of the HPV70 reference genome is presented. *'s indicate the nucleotide was present in the viral DNA insertion in the tumor, and that the nucleotide was identical. Positions where the tumor nucleotides differed are highlighted in yellow and the change is shown. The gray highlighted segment is the URR between the L1 and E6 ORFs.Click here for additional data file.Supplementary Figure 2 Five of the split reads crossing the HPV70 E1^E4 5′ss to BCL11B exon 4 splice junction. HPV70 sequences are highlighted in blue. Human genome BCL11B sequences are highlighted in green.Click here for additional data file.Supplementary Figure 3 DNA sequence of BCL11B exon 3B (upper case letters, hg19 position chr14:99665867 to 99 665 710) with immediately flanking intron sequences (lower case letters). Upstream intron sequences include 3′ss consensus sequence (bold), t‐rich segment, and potential lariat site (ac). Downstream intron sequences include 5′ss consensus (bold).Click here for additional data file.Supplementary Figure 4 Sequences of RT‐PCR products shown in Figure 4B highlighting the HPV70 (blue), BCL11B exon 3B (gray), and BCL11B exon 4 (green) sequences of the PCR products generated from tumor RNA.Click here for additional data file.
Authors: Tobias Rausch; Thomas Zichner; Andreas Schlattl; Adrian M Stütz; Vladimir Benes; Jan O Korbel Journal: Bioinformatics Date: 2012-09-15 Impact factor: 6.937