Janet M Doolittle-Hall, Danielle L Cunningham Glasspoole, William T Seaman, Jennifer Webster-Cyriaque.
Abstract
Oncoviruses cause tremendous global cancer burden. For several DNA tumor viruses, human genome integration is consistently associated with cancer development. However, genomic features associated with tumor viral integration are poorly understood. We sought to define genomic determinants for 1897 loci prone to hosting human papillomavirus (HPV), hepatitis B virus (HBV) or Merkel cell polyomavirus (MCPyV). These were compared to HIV, whose enzyme-mediated integration is well understood. A comprehensive catalog of integration sites was constructed from the literature and experimentally-determined HPV integration sites. Features were scored in eight categories (genes, expression, open chromatin, histone modifications, methylation, protein binding, chromatin segmentation and repeats) and compared to random loci. Random forest models determined loci classification and feature selection. HPV and HBV integrants were not fragile site associated. MCPyV preferred integration near sensory perception genes. Unique signatures of integration-associated predictive genomic features were detected. Importantly, repeats, actively-transcribed regions and histone modifications were common tumor viral integration signatures.
Keywords: HBV; HIV; HPV; MCPyV; viral integration
Year: 2015 PMID: 26569308 PMCID: PMC4695887 DOI: 10.3390/cancers7040887
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.639
Figure 1. Location of viral integration sites in the human genome. Human chromosomes (1–22, X, Y) are arranged around the circle. The inner-most ring shows viral integration sites, stacking multiple events that occurred at the same location. (a) HPV integration sites; (b) HBV integration sites; (c) MCPyV integration sites; (d) HIV integration sites.
Figure 2. GO biological process term enrichment of genes near viral integration sites. GO terms that were significant after the Fisher exact test with Bonferroni multiple testing correction (p < 0.05) are shown. For HIV, only the terms with the 20 lowest p-values are shown.
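The enrichment test named in the Figure 2 caption can be illustrated with a minimal sketch. It assumes a one-sided (over-representation) Fisher exact test, which the caption does not specify, and all term names and gene counts below are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher exact p-value for over-representation.

    k: genes near integration sites annotated with the GO term
    n: genes near integration sites
    K: background genes annotated with the GO term
    N: background genes
    Assumption: one-sided test; the caption does not say which tail was used.
    """
    upper = min(n, K)
    # P(X >= k) under the hypergeometric null distribution
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return {term: min(1.0, p * m) for term, p in p_values.items()}


# Hypothetical counts: (k, n, K, N) per GO term
toy_counts = {
    "hypothetical term A": (12, 40, 200, 20000),
    "hypothetical term B": (1, 40, 500, 20000),
}
raw_p = {term: fisher_enrichment_p(*c) for term, c in toy_counts.items()}
adjusted_p = bonferroni_adjust(raw_p)
significant_terms = [t for t, p in adjusted_p.items() if p < 0.05]
```

Exact integer arithmetic via `math.comb` avoids a dependency on a statistics library; a production analysis would more likely call an established implementation.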
Figure 3. Genomic features near integration sites. (a) Categories of genomic features in the context of chromatin; (b) windows of four sizes are defined around viral integration sites, and features present in the human genome within each window are scored. Integration sites may be precisely mapped or be broader regions.
Summary of genomic features. All genomic feature scores were normalized for the length of the search region. * From ENCODE.
| Category | Gene Presence | Gene Expression | Open Chromatin | Histone Modifications | DNA Methylation | TF and Other Protein Binding | Chromatin Segmentation | Repeats |
|---|---|---|---|---|---|---|---|---|
| Data Type | GENCODE, COSMIC Cancer Gene Census | RNA-seq | DNase-seq, FAIRE-seq | ChIP-seq | Methyl-RRBS | ChIP-seq | Hoffman | UCSC repeat masker |
| Data Source | hg19 | HeLa *, SiHa, NHEK *, HepG2 *, GM12878 * | HeLa *, NHEK *, HepG2 *, GM12878 * | HeLa *, NHEK *, HepG2 *, GM12878 * | HeLa *, HepG2 *, GM12878 * | HeLa *, NHEK *, HepG2 *, GM12878 * | HeLa *, HepG2 *, GM12878 * | hg19 |
| Scoring Method | Number of genes | Sum of RPKM | Number of peaks | Number of peaks | Percent Methylated | Number of peaks | Length of segment | Length of repeat |
| Number of Features | 2 | 5 | 8 | 44 | 3 | 178 | 21 | 16 |
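The window-based scoring described in Figure 3 and the table above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: coordinates are half-open intervals on a single chromosome, the flank size is a hypothetical value (the four window sizes are not given here), and only two of the scoring methods (number of peaks, length of repeat) are shown.

```python
def search_region(site_start, site_end, flank):
    """Extend an integration site (a point or a broader region) by `flank` bp per side."""
    return max(0, site_start - flank), site_end + flank


def count_score(region, features):
    """Number of features overlapping the region, normalized by region length."""
    start, end = region
    hits = sum(1 for f_start, f_end in features if f_start < end and f_end > start)
    return hits / (end - start)


def length_score(region, features):
    """Bases of the region covered by features, normalized by region length.

    Assumes the features do not overlap one another.
    """
    start, end = region
    covered = sum(
        max(0, min(end, f_end) - max(start, f_start)) for f_start, f_end in features
    )
    return covered / (end - start)


# Hypothetical example: a precisely mapped site with a 500-bp flank
region = search_region(10_000, 10_001, flank=500)
toy_features = [(9_600, 9_700), (10_400, 10_600), (12_000, 12_100)]
peak_score = count_score(region, toy_features)
repeat_score = length_score(region, toy_features)
```

Dividing by the region length is what makes scores comparable between precisely mapped sites and broader regions, which have longer search regions for the same flank.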
Figure 4. Significant differences were detected between viral integration sites and random sites. (a) HPV; (b) HBV; (c) HIV. Significance was determined using a two-sided Mann–Whitney U-test with Bonferroni correction, α < 0.05. Comparisons using the gene constraint set are indicated with GC. No significant differences were found for MCPyV. Only the features from the most relevant cell lines were considered for each virus.
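The per-feature comparison named in the Figure 4 caption can be sketched in plain Python. This version uses the normal approximation with a tie correction and no continuity correction, so its p-values will differ slightly from exact or continuity-corrected implementations. The feature names come from the text, but the scores are hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2) / sqrt(var)
    return u, erfc(abs(z) / sqrt(2))


# Hypothetical feature scores at integration sites and at random sites
integration = {"H3K36me3": [5, 7, 8, 9, 12], "repeats": [1, 2, 3, 4, 5]}
random_sites = {"H3K36me3": [1, 2, 2, 3, 4], "repeats": [1, 2, 3, 4, 6]}

alpha = 0.05
m = len(integration)  # number of features tested
significant = [
    name
    for name in integration
    if mann_whitney_u(integration[name], random_sites[name])[1] * m < alpha
]
```

Multiplying each p-value by the number of features tested is the Bonferroni step; with hundreds of features, as in the table above, this threshold becomes correspondingly strict.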
Figure 5. Predictive genomic features for each DNA tumor virus. Random forest models were developed for each virus and window size, using either the background set or the gene constraint set as the negative class. Starting from only the genomic features that were considered relevant to each virus, feature elimination was used to select the smallest set of features that gave an ROC within 2% of the best model using three-fold cross-validation repeated 10 times on the training set. The optimal model was then used to classify a held-out test set (75% of data for training, 25% for testing). The entire process was repeated 10 times, once for each of the randomly-selected background sets. The number of times each feature was selected for inclusion in the optimal model is shown (white: zero, black: 10). Only features selected at least five times for at least one window size are shown. (a) Features predictive of HPV integration. (b) ChIP-qPCR of two histone marks predictive of HPV integration, H3K36me3 and H3K4me3. The cartoon shows the locations of primers designed to tile across an approximately ±500-bp window around the two identical HPV-16 integrants at 13q22.1 in SiHa cells. The graph shows the mean and standard deviation of two replicates of qPCR, and a representative gel of the products is shown at the right. All primer pairs produced bands at the expected sizes, but 5′-300 showed additional bands (the arrow indicates the expected size). qPCR quantification showed high fold enrichment for 5′-300, some of which may be due to non-specific amplification. However, a band is clearly present at the expected size (arrow), suggesting the presence of H3K36me3 and H3K4me3 near the integration site. Satellite region 2 (SAT2) and total H3 were used as positive controls. (c) Features predictive of HPV integration. (d) Features predictive of HPV integration. Comparisons using the gene constraint set are indicated with GC.
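Two bookkeeping steps from the Figure 5 caption can be sketched without a machine-learning library: picking the smallest feature set whose ROC score is within 2% of the best, and counting how often each feature is selected across the 10 background sets. The sketch assumes the 2% tolerance is relative to the best score (the caption does not say whether it is relative or absolute), covers a single window size, and uses hypothetical scores and selections. Model fitting and cross-validation are omitted.

```python
from collections import Counter


def smallest_within_tolerance(auc_by_size, tolerance=0.02):
    """Smallest feature-set size whose AUC is within `tolerance` of the best AUC.

    Assumption: the tolerance is relative, i.e. auc >= best * (1 - tolerance).
    """
    best = max(auc_by_size.values())
    eligible = [s for s, auc in auc_by_size.items() if auc >= best * (1 - tolerance)]
    return min(eligible)


def selection_counts(selected_per_run):
    """Count in how many runs each feature was part of the optimal model."""
    counts = Counter()
    for features in selected_per_run:
        counts.update(set(features))
    return counts


def frequently_selected(counts, minimum=5):
    """Features selected in at least `minimum` runs."""
    return sorted(f for f, c in counts.items() if c >= minimum)


# Hypothetical cross-validated AUC for each candidate feature-set size
toy_auc = {2: 0.70, 4: 0.78, 8: 0.795, 16: 0.80}
chosen_size = smallest_within_tolerance(toy_auc)

# Hypothetical selections from 10 runs, one per random background set
toy_runs = (
    [["H3K36me3", "H3K4me3"]] * 6
    + [["H3K36me3", "repeats"]] * 3
    + [["other feature"]]
)
shown = frequently_selected(selection_counts(toy_runs))
```

Preferring the smallest set within a tolerance, rather than the single best-scoring set, trades a small loss in ROC for a more compact list of features.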
Figure 6. Significant differences were detected between types of viral integration sites. (a) Certain features in 7/8 categories were significantly different near HPV-18 integrations compared to HPV-16 (HPV-16 n = 382, HPV-18 n = 133); and (b) integrations in cervical tissue compared to those in head and neck cancers (HNC) (cervical n = 431, HNC n = 59). Regardless of window size or whether or not the number of genes was controlled for, gene expression, repeats and certain transcription factors differed significantly between HPV types (a) and tissues (b). (c) Significant differences between cervical cancer (n = 419) and W12 cell line (n = 28) integration sites. (d) Significant differences between HBV integration sites in HCC (n = 628) and tumor-adjacent tissues (n = 618). Significance was determined using a two-sided Mann–Whitney U-test with Bonferroni correction, α < 0.05. Comparisons using the gene constraint set are indicated with GC.