María Lourdes Garza-Rodríguez1, Mariel Araceli Oyervides-Muñoz2,3, Antonio Alí Pérez-Maya3, Celia Nohemí Sánchez-Domínguez3, Anais Berlanga-Garza4, Mauro Antonio-Macedo4, Lezmes Dionicio Valdés-Chapa4, Diego Vidal-Torres4, Oscar Vidal-Gutiérrez1, Diana Cristina Pérez-Ibave1, Víctor Treviño5.
Abstract
Human papillomavirus (HPV) DNA integration is a crucial event in cervical carcinogenesis. However, few studies have focused on HPV integration (HPVint) in early-stage cervical lesions. Using HPV capture followed by sequencing, we investigated HPVint in pre-tumor cervical lesions. Employing a novel pipeline, we analyzed reads containing direct evidence of the integration breakpoint. We observed multiple HPV infections in most of the samples (92%), with a median integration rate of 0.06% relative to HPV mapped reads corresponding to two or more sequence breakages. Unlike in cancer studies, most integration events were unique (supported by one read), consistent with the lack of clonal selection. Consistent with other studies, we found that breakpoints could occur in practically any part of the viral genome. We noted that L1 had a higher frequency of rupture integration (25%). Based on host genome integration frequencies, we found integration sites previously reported in cancer for genes like FHIT, CSMD1, and LRP1B, and putatively many new ones, such as those exemplified in CSMD3, ROBO2, and SETD3. Similar host integration regions and genes were observed across diverse HPV types, including many shared genes and even equivalent integration positions in different samples and HPV types. Interestingly, we noted an enrichment of integrations in most centromeres, suggesting a possible mechanism where HPV exploits this structural machinery to facilitate integration. Supported by previous findings, overall, our analysis provides novel information and insights about HPVint.
Keywords: HPV integration; cervical cancer; cervical lesions; hot spots
Year: 2021 PMID: 33810183 PMCID: PMC8005155 DOI: 10.3390/ijms22063242
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Scheme of the strategy used to identify reads showing evidence of HPVint.
Detected human papillomavirus (HPV) types per sample.
| Sample | Smear | Reads | Mapped * | %Map ¶ | HPV Types + | Percent + |
|---|---|---|---|---|---|---|
| M-7440 | CC | 4,562,484 | 6474 | 0.1 | t45 | 100 |
| M-7443 | LSIL | 716,396 | 381,893 | 53.3 | 18, 74 | 99, 1 |
| M-7445 | LSIL | 15,287,436 | 8,496,237 | 55.6 | 68, NA440, NA448, t45 | 42, 36, 22, <1 |
| M-7447 | LSIL | 13,480,394 | 8,317,461 | 61.7 | 52, t52, 74, 87, 44, t45, t66 | 39, 38, 22, 1, <1, <1, <1 |
| M-7448 | ASCUS | 417,768 | 29,188 | 7.0 | 6, t45 | 98, 2 |
| M-7449 | Normal | 147,160 | 411 | 0.3 | t45 | 100 |
| M-7450 | ASCUS | 1,143,332 | 366,366 | 32.0 | 11, 52, t52, t56, 56, NA450 | 79, 8, 7, 2, 2, 1 |
| M-7452 | ASCUS | 17,740,624 | 11,991,175 | 67.6 | 59, 89, 53, t45 | 98, 1, 1, <1 |
| M-7454 | LSIL | 5,251,310 | 3,219,573 | 61.3 | 30, 51, 42, 11, t45 | 53, 38, 7, 2, <1 |
| M-7455 | NA | 6,350,088 | 2,897,763 | 45.6 | 16, 35, NA446, t45 | 85, 8, 7, <1 |
| M-7456 | HSIL | 3,879,422 | 8527 | 0.2 | t45, 16 | 86, 14 |
| M-7457 | LSIL | 367,094 | 13,371 | 3.6 | 6, t45, 51, 90 | 47, 44, 6, 3 |
| M-7458 | LSIL | 272,030 | 18,674 | 6.9 | 6, 86, t45 | 92, 5, 3 |
| M-7459 | HSIL | 28,572,654 | 21,901,105 | 76.7 | 31, t45 | 100, <1 |
| M-7460 | LSIL | 494,080 | 56,676 | 11.5 | 51, 70, t45, 34 | 74, 22, 2, 2 |
| M-7461 | LSIL | 365,174 | 2367 | 0.6 | 44, t45, 54, 53 | 49, 27, 13, 11 |
| M-7462 | LSIL | 884,470 | 238,060 | 26.9 | 51, 54 | 96, 4 |
| M-7463 | LSIL | 702,672 | 121,121 | 17.2 | 16, NA446, t45, 40 | 90, 8, 1, 1 |
| M-7464 | LSIL | 481,510 | 40,073 | 8.3 | 11, 31, 34, t45 | 61, 30, 5, 3 |
| M-7465 | NA | 9,826,324 | 6,183,616 | 62.9 | t66, NA451, 66, t45 | 49, 31, 19, <1 |
| M-7467 | Normal | 3,173,490 | 1,445,628 | 45.6 | 11, t45 | 100, <1 |
| M-7470 | LSIL | 13,217,166 | 7,266,920 | 55.0 | 53, 33, t33, 52, t52, NA436, t45 | 39, 16, 13, 11, 11, 10, <1 |
| M-7471 | Normal | 9,791,996 | 5,128,463 | 52.4 | 39, 82, t39, NA447, 73, 42, 66, NA449, 52, t52, t45 | 33, 27, 16, 10, 4, 3, 2, 2, 1, 1, <1 |
| M-7472 | LSIL | 6,241,940 | 3,953,761 | 63.3 | 51, t45 | 100, <1 |
* Reads matching 85% or more. ¶ %Map: percentage of total reads that were mapped. + HPV types sorted by corresponding percentage of reads. Percent shows the fraction of reads identified in each HPV type.
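The %Map column can be reproduced from the Reads and Mapped columns. The following is a minimal sketch, assuming %Map is simply mapped reads divided by total reads and rounded to one decimal; the function name `percent_mapped` and the choice of rows are illustrative and not part of the original pipeline.

```python
def percent_mapped(total_reads: int, mapped_reads: int) -> float:
    """Return mapped reads as a percentage of total reads (one decimal)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return round(100.0 * mapped_reads / total_reads, 1)


# (total reads, mapped reads) copied from three rows of the table above.
rows = {
    "M-7440": (4_562_484, 6_474),
    "M-7443": (716_396, 381_893),
    "M-7448": (417_768, 29_188),
}

pct_map = {sample: percent_mapped(t, m) for sample, (t, m) in rows.items()}

for sample, pct in pct_map.items():
    print(f"{sample}: {pct}% of reads mapped")
```

For these three rows the computed values agree with the tabulated %Map entries (0.1, 53.3, and 7.0).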
Detected HPV types in two or more samples.
| Type | Samples | Reads | Within Sample % ¶ |
|---|---|---|---|
| t45 | M-7440, M-7445, M-7447, M-7448, M-7449, | 177,864 | 100, <1, <1, 2, 100, |
| 51 | M-7454, M-7457, M-7460, M-7462, M-7472 | 5,433,163 | 38, 44, 74, 96, 100 |
| 52 | M-7447, M-7450, M-7470, M-7471 | 4,197,311 | 39, 8, 11, 1 |
| t52 | M-7447, M-7450, M-7470, M-7471 | 4,018,079 | 38, 7, 11, 1 |
| 11 | M-7450, M-7454, M-7464, M-7467 | 1,815,076 | 79, 2, 61, 100 |
| 53 | M-7452, M-7461, M-7470 | 2,901,407 | 1, 11, 39 |
| 16 | M-7455, M-7456, M-7463 | 2,561,367 | 85, 14, 90 |
| 6 | M-7448, M-7457, M-7458 | 45,980 | 98, 3, 92 |
| 31 | M-7459, M-7464 | 21,889,052 | 100, 30 |
| t66 | M-7447, M-7465 | 3,058,716 | <1, 49 |
| 74 | M-7443, M-7447 | 1,795,576 | 1, 22 |
| 66 | M-7465, M-7471 | 1,298,463 | 19, 2 |
| 42 | M-7454, M-7471 | 377,645 | 7, 3 |
| NA446 | M-7455, M-7463 | 219,032 | 7, 8 |
| 44 | M-7447, M-7461 | 29,574 | <1, 49 |
| 54 | M-7461, M-7462 | 9174 | 13, 4 |
| 34 | M-7460, M-7464 | 2977 | 2, 5 |
¶ Percent shows the fraction of reads identified in the corresponding samples.
Figure 2. Comparison of mapped reads in different samples of 4 selected HPV types. Each panel shows the number of mapped reads (vertical axis) of 5, 5, 4, or 3 samples detected in the corresponding HPV types (NCBI accession number is shown). The vertical axis marked with “*” in HPV16 is shown in logarithmic scale to clarify marked nucleotide variant sites. The peak number of reads per sample is shown at the left. The sample is shown at the right. Colored vertical lines indicate sequence differences. Triangle marks show distinctive or absent variant sites.
Figure 3. Examples of HPVint. (A) Two integrations at different known cancer genes (RAD51B and MACROD2). (B) Two detected integrations at the same position of the RAB32 gene, from reads of different lengths. Note the region rich in GT. Arrows denote marked reads within detected tandem repeat regions. Nucleotides in lowercase denote mismatches relative to the target sequence; if underlined, they denote insertions relative to the target. The vertical axis at the left refers to integrations, whereas the right axis refers to overall HPV mapping.
Detected integrations per sample and HPV type.
| Sample | % Integration | Integrations | After | % | Types ¶ | Percents ¶ |
|---|---|---|---|---|---|---|
| M-7440 | 1.71% | 111 | 39 | 0.60% | t45 | 100 |
| M-7443 | 0.04% | 164 | 161 | 0.04% | 18 | 100 |
| M-7445 | 0.02% | 1508 | 1491 | 0.02% | 68, NA440, t45, NA448 | 96, 3.5, <1, <1 |
| M-7447 | 0.05% | 4554 | 4507 | 0.05% | 52, 74, t52, 87, 44, t45, t66 | 77, 19, 3, <1, <1, <1, <1 |
| M-7448 | 0.10% | 28 | 26 | 0.09% | 6 | 100 |
| M-7449 | 0.73% | 3 | 2 | 0.49% | t45 | 100 |
| M-7450 | 0.04% | 154 | 148 | 0.04% | 11, 52, 56, t52, NA450 | 70, 21, 7, <1, <1 |
| M-7452 | 0.07% | 8249 | 7803 | 0.07% | 59, 53, 89, t45 | 99, <1, <1, <1 |
| M-7454 | 0.07% | 2337 | 2292 | 0.07% | 30, 51, 42, 11, t45 | 54, 39, 6, 1, <1 |
| M-7455 | 0.06% | 1790 | 1769 | 0.06% | 16, 35, NA446, t45 | 91, 7, 2, <1 |
| M-7456 | 1.52% | 130 | 44 | 0.52% | t45 | 100 |
| M-7457 | 0.04% | 5 | 3 | 0.02% | 90, 51 | 67, 33 |
| M-7458 | 0.06% | 11 | 7 | 0.04% | 6, t45, 86 | 71, 14, 14 |
| M-7459 | 0.08% | 18,080 | 17,476 | 0.08% | 31, t45 | 100, <1 |
| M-7460 | 0.07% | 38 | 34 | 0.06% | 51, 70 | 74, 26 |
| M-7461 | 0.17% | 4 | 0 | 0.00% | - | - |
| M-7462 | 0.06% | 137 | 129 | 0.05% | 51, 54 | 98, 2 |
| M-7463 | 0.10% | 118 | 115 | 0.09% | 16, 40, NA446 | 92, 6, 2 |
| M-7464 | 0.05% | 20 | 18 | 0.04% | 11, 31 | 56, 34 |
| M-7465 | 0.04% | 2573 | 2539 | 0.04% | 66, NA451, t66, t45 | 53, 36, 11, <1 |
| M-7467 | 0.05% | 737 | 691 | 0.05% | 11, t45 | 99, <1 |
| M-7470 | 0.08% | 5952 | 5805 | 0.08% | 33, 53, 52, NA436, t52, t45, t33 | 39, 39, 20, 1, <1, <1, <1 |
| M-7471 | 0.03% | 1754 | 1695 | 0.03% | 39, 82, 73, 66, 42, 52, NA447, t39, t52, t45, NA449 | 67, 11, 10, 3, 3, 3, 1, <1, <1, <1, <1 |
| M-7472 | 0.12% | 4647 | 4418 | 0.11% | 51, t45 | 100, <1 |
¶ HPV types sorted by corresponding percentage of reads. Percent shows the fraction of reads identified in each HPV type.
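The percentage columns in this table appear to be consistent with dividing the integration read counts by the HPV mapped reads of the same sample (the Mapped column of the per-sample HPV types table). The sketch below checks that reading on a few rows; it is an assumption inferred from the tabulated numbers, and the function name `integration_rate` is illustrative.

```python
def integration_rate(integration_reads: int, mapped_reads: int) -> float:
    """Integration reads as a percentage of HPV mapped reads (two decimals)."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return round(100.0 * integration_reads / mapped_reads, 2)


# sample -> (mapped reads, "Integrations" count, "After" count),
# copied from the two per-sample tables above.
rows = {
    "M-7440": (6_474, 111, 39),
    "M-7449": (411, 3, 2),
    "M-7456": (8_527, 130, 44),
    "M-7461": (2_367, 4, 0),
}

rates = {
    sample: (integration_rate(n, mapped), integration_rate(after, mapped))
    for sample, (mapped, n, after) in rows.items()
}

for sample, (before_pct, after_pct) in rates.items():
    print(f"{sample}: {before_pct:.2f}% -> {after_pct:.2f}%")
```

Under this assumption, the four rows reproduce the tabulated pairs (1.71% and 0.60%, 0.73% and 0.49%, 1.52% and 0.52%, 0.17% and 0.00%).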
Figure 4. Comparison of mapped and putatively integrated reads. The vertical axis shows the number of reads. The horizontal axis represents the HPV genome coordinates from 0 to approximately 8000 bp. * The scale for M-7452 in integrated reads is reduced to half to clarify the correlation (3 peaks are cut).
Figure 5. Multiple reads or integrations. (A) Reads in genes showing more than 1 read at the same integration point in 2 or more samples (targets in the human region and HPV are the same). (B) Reads or integrations in coding genes found in 4 or more samples, where the HPV target type or precise human integration point may differ. The gene region includes 5 kb 5′ and 3′ of the canonical transcript. Genes marked with a star “*” do not show records of co-occurrence with HPV in abstracts from PubMed. (C) Reads or integrations in non-coding regions close to genes in 3 or more samples, where the HPV target type or precise human integration point may differ. Regions marked with “*” and “c” are close to centromeres. Pap smear results per sample are shown.
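The gene-region rule in panel (B), the canonical transcript extended by 5 kb on each side, can be illustrated with a small interval check. This is a sketch only: the `Gene` class, the gene name `GENE_X`, and all coordinates are hypothetical placeholders, not values from the study.

```python
from dataclasses import dataclass

FLANK_BP = 5_000  # 5 kb flank on each side, as stated in the caption


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # transcript start (1-based, inclusive)
    end: int    # transcript end (1-based, inclusive)


def in_gene_region(gene: Gene, chrom: str, pos: int, flank: int = FLANK_BP) -> bool:
    """True if a breakpoint lies within the transcript extended by `flank` bp."""
    return chrom == gene.chrom and gene.start - flank <= pos <= gene.end + flank


# Hypothetical gene and breakpoints, for illustration only.
gene_x = Gene(name="GENE_X", chrom="chr1", start=100_000, end=150_000)

inside_flank = in_gene_region(gene_x, "chr1", 96_000)    # 4 kb upstream
outside_flank = in_gene_region(gene_x, "chr1", 94_000)   # 6 kb upstream
other_chrom = in_gene_region(gene_x, "chr2", 120_000)    # wrong chromosome

print(inside_flank, outside_flank, other_chrom)
```

A flank of this kind assigns breakpoints just outside the transcript to the nearest gene, which is one way to count integrations per gene across samples even when the exact positions differ.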
Figure 6. Examples of multiple integrations in known and novel genes. In each gene, a colored vertical segment represents a read. Different segment colors represent different samples. A specific region in the SETD3 gene is detailed. Samples and corresponding integrations for HPV types are Scheme 5.
Figure 7. Overall map of 50,732 pooled read-integrations in the human genome. The top panel Scheme 20. bins). The vertical axis is cut to 50 to highlight differences. The numbers mark positions with higher than 50 counts. The bottom panel shows the integrations (dots) in all chromosomes distributed by blast e-value to show the lack of mapping bias. Centromere coordinates (hg19) are marked with a black spot above. Note higher counts close to centromeres.