Lei Jia1,2, Mengying Liu3, Caiqin Yang1,2, Hanping Li1,2, Yongjian Liu1,2, Jingwan Han1,2, Xiuli Zhai1,2, Xiaolin Wang1,2, Tianyi Li1,2, Jingyun Li1,2, Bohan Zhang1,2, Changyuan Yu4, Lin Li5,6.
Abstract
BACKGROUND: Human endogenous retroviruses (HERVs) result from ancestral infections caused by exogenous retroviruses that became incorporated into the germline DNA and evolutionarily fixed in the human genome. HERVs can be transmitted vertically in a Mendelian fashion and be stably maintained in the human genome, of which they are estimated to comprise approximately 8%. HERV-K (HML1-10) transcription has been confirmed to be associated with a variety of diseases, such as breast cancer, lung cancer, prostate cancer, melanoma, rheumatoid arthritis, and amyotrophic lateral sclerosis. However, the poor characterization of HML-9 prevents a detailed understanding of the regulation of the expression of this family in humans and its impact on the host genome. In light of this, a precise and updated HERV-K HML-9 genomic map is urgently needed to better evaluate the role of these elements in human health.
Keywords: BLAT; GRCh38/hg38; Gene regulation; HML-9; Human endogenous retrovirus
Year: 2022 PMID: 35676699 PMCID: PMC9178832 DOI: 10.1186/s12977-022-00596-2
Source DB: PubMed Journal: Retrovirology ISSN: 1742-4690 Impact factor: 3.768
HML-9 provirus distribution
| Number | Locus | Chromosome | Strand | Position start | Position end | Length (bp) | Match + mismatch (bp)/full length (bp) (%) | Range (%) | Qgap (bp)/match + mismatch + Qgap (bp) (%) | Insertion or deletion | Intergenic/intron/exon | Gene including the region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 16p12.3 | chr16 | − | 19393581 | 19402152 | 8572 | 96.00 | (90–100) | 1.01 | NA | Exon_intron | AC130456.2 |
| 2 | 2p12 | chr2 | + | 82022660 | 82031279 | 8620 | 95.91 | (90–100) | 1.13 | NA | Intergenic | NA |
| 3 | 15q21.1 | chr15 | − | 45234477 | 45243073 | 8597 | 95.34 | (90–100) | 1.85 | NA | Exon_intron | AC051619.4 |
| 4 | 8p11.1 | chr8 | − | 43694016 | 43702583 | 8568 | 95.10 | (90–100) | 2.14 | NA | Intergenic | NA |
| 5 | 13q31.1 | chr13 | + | 84869526 | 84877320 | 7795 | 86.84 | (80–90) | 6.67 | NA | Exon_intron | AL445588.1 |
| 6 | 4q33 | chr4 | − | 170126345 | 170133883 | 7539 | 70.03 | (70–80) | 0.79 | Insertion | Intergenic | NA |
| 7 | 6p12.3 | chr6 | + | 48873675 | 48879725 | 6051 | 64.84 | (60–70) | 34.48 | Deletion | Intergenic | NA |
| 8 | Yp11.2 | chrY | − | 9273707 | 9279611 | 5905 | 59.83 | (50–60) | 39.23 | Deletion | Intergenic | NA |
| 9 | 8q24.3 | chr8 | + | 145019974 | 145032719 | 12746 | 57.06 | (50–60) | 0.79 | Insertion | Intergenic | NA |
| 10 | Yq11.223 | chrY | + | 21580120 | 21585551 | 5432 | 57.04 | (50–60) | 38.30 | Deletion | Exon_intron | TTTY13 |
| 11 | 19q13.2 | chr19 | + | 40954172 | 40959178 | 5007 | 56.66 | (50–60) | 41.93 | Deletion | Intergenic | NA |
| 12 | Yp11.2 | chrY | − | 8121821 | 8126768 | 4948 | 54.57 | (50–60) | 44.92 | Deletion | Intergenic | NA |
| 13 | Yp11.2 | chrY | + | 8996062 | 9000755 | 4694 | 50.80 | (50–60) | 41.95 | Deletion | Intergenic | NA |
| 14 | Yq11.222 | chrY | − | 18622534 | 18626952 | 4419 | 47.33 | (40–50) | 52.23 | Deletion | Intergenic | NA |
| 15 | Yq11.223 | chrY | − | 21845475 | 21850069 | 4595 | 43.18 | (40–50) | 49.78 | Deletion/insertion | Exonic_intergenic | AC024236.1 |
| 16 | 21q21.1 | chr21 | − | 18563368 | 18566735 | 3368 | 34.26 | (30–40) | 8.19 | NA | Exon_intron | MIR548XHG |
| 17 | 5q33.3 | chr5 | − | 156660448 | 156663815 | 3368 | 34.14 | (30–40) | 8.50 | NA | Intron | SGCD |
| 18 | 1q22 | chr1 | − | 155629408 | 155632775 | 3368 | 33.75 | (30–40) | 9.56 | NA | Intron | AL353807.5 |
| 19 | 7q36.1 | chr7 | − | 150561277 | 150563994 | 2718 | 27.92 | (20–30) | 10.27 | Deletion | Intergenic | NA |
| 20 | 8q21.13 | chr8 | + | 78652302 | 78654820 | 2519 | 26.60 | (20–30) | 0.30 | NA | Intron | AC068700.2 |
| 21 | 10q24.2 | chr10 | − | 99822511 | 99825532 | 3022 | 25.36 | (20–30) | 24.65 | Deletion/insertion | Intron | ABCC2 |
| 22 | 12q13.11 | chr12 | + | 48509228 | 48511681 | 2454 | 18.44 | (10–20) | 33.18 | Deletion/insertion | Intergenic | NA |
| 23 | Yq11.222 | chrY | − | 17669948 | 17671523 | 1576 | 17.11 | (10–20) | 12.69 | Deletion | Intergenic | NA |
Fig. 1 Chromosomal distribution of HML-9 loci. A All HML-9 elements (red arrows) are displayed on the human karyotype (www.ensembl.org). The number of HML-9 proviral elements (B) and solo LTRs (C) integrated into each human chromosome was determined and compared to the expected number of insertion events. The expected number of sequences in each chromosome is marked in blue, and the actual number of sequences detected is marked in orange.
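The observed-versus-expected comparison in Fig. 1B can be sketched roughly as follows. The observed counts are tallied from the provirus table above. The caption does not say how the expected numbers were derived, so the sketch assumes they scale with chromosome length. `expected_counts` and its `chrom_lengths` argument are hypothetical names, and real chromosome lengths would have to be supplied by the reader.

```python
from collections import Counter

# Chromosome of each of the 23 proviral loci, in the order of the table above.
PROVIRUS_CHROMS = [
    "chr16", "chr2", "chr15", "chr8", "chr13", "chr4", "chr6", "chrY",
    "chr8", "chrY", "chr19", "chrY", "chrY", "chrY", "chrY", "chr21",
    "chr5", "chr1", "chr7", "chr8", "chr10", "chr12", "chrY",
]

# Observed number of proviral elements per chromosome.
observed = Counter(PROVIRUS_CHROMS)


def expected_counts(total, chrom_lengths):
    """Expected insertions per chromosome.

    ASSUMPTION: insertions are distributed in proportion to chromosome
    length; this is not stated in the source. chrom_lengths maps a
    chromosome name to its length in bp and is supplied by the caller.
    """
    genome = sum(chrom_lengths.values())
    return {c: total * n / genome for c, n in chrom_lengths.items()}
```

With real hg38 chromosome lengths passed as `chrom_lengths`, comparing `observed` to `expected_counts(23, chrom_lengths)` would show which chromosomes carry more or fewer proviruses than a length-proportional model predicts. In the table, chrY alone accounts for 7 of the 23 loci.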
HML-9 solo LTR tracks distribution
| Number | Locus | Chromosome | Strand | Position start | Position end | Length (bp) | Percentage of LTR14C in length (%) | Match + mismatch/full length (%) | Range (%) | Qgap (bp)/match + mismatch + Qgap (bp) (%) | Insertion or deletion | Intergenic/intron/exon | Gene including the region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 14q21.1 | chr14 | + | 38011040 | 38012012 | 973 | 101.36 | 6.91 | (0–10) | 35.61 | Deletion, insertion | Intron | TTC6 |
| 2 | Xq21.32 | chrX | − | 93273183 | 93274197 | 1015 | 100.85 | 6.88 | (0–10) | 2.47 | Insertion | Intergenic | NA |
| 3 | 2q31.1 | chr2 | + | 180236847 | 180237437 | 591 | 100.34 | 6.84 | (0–10) | 0.34 | NA | Intergenic | NA |
| 4 | 18p11.31 | chr18 | – | 4527618 | 4528209 | 592 | 100.17 | 6.83 | (0–10) | 0.00 | NA | Intergenic | NA |
| 5 | 2q11.2 | chr2 | + | 97964920 | 97965508 | 589 | 100.00 | 6.82 | (0–10) | 0.00 | NA | Intron | TMEM131 |
| 6 | 15q14 | chr15 | + | 39011033 | 39011621 | 589 | 100.00 | 6.82 | (0–10) | 0.00 | NA | Intron | LINC02694 |
| 7 | 2p12 | chr2 | – | 81304430 | 81305068 | 639 | 99.83 | 6.81 | (0–10) | 0.00 | NA | Intergenic | NA |
| 8 | 2q32.3 | chr2 | – | 194256159 | 194256746 | 588 | 99.83 | 6.81 | (0–10) | 0.00 | NA | Intergenic | NA |
| 9 | 3q26.1 | chr3 | – | 163283189 | 163283777 | 589 | 99.83 | 6.81 | (0–10) | 0.00 | NA | Intron | LINC01192 |
| 10 | 4q26 | chr4 | + | 116980222 | 116980809 | 588 | 99.83 | 6.81 | (0–10) | 0.34 | NA | Intergenic | NA |
| 11 | 4p15.31 | chr4 | + | 19556097 | 19556684 | 588 | 99.83 | 6.81 | (0–10) | 0.17 | NA | Intron | AC024230.1 |
| 12 | 7p21.2 | chr7 | + | 14509240 | 14509827 | 588 | 99.83 | 6.81 | (0–10) | 0.00 | NA | Intron | DGKB |
| 13 | 8q11.21 | chr8 | – | 51178592 | 51179179 | 588 | 99.83 | 6.81 | (0–10) | 0.00 | NA | Intergenic | NA |
| 14 | 11q12.3 | chr11 | – | 62185237 | 62185824 | 588 | 99.83 | 6.81 | (0–10) | 0.00 | NA | Intergenic | NA |
| 15 | 2q21.3 | chr2 | + | 135521883 | 135522470 | 588 | 99.66 | 6.80 | (0–10) | 0.17 | NA | Intron | ZRANB3 |
| 16 | 3p12.2 | chr3 | – | 81329902 | 81330488 | 587 | 99.66 | 6.80 | (0–10) | 0.17 | NA | Intergenic | NA |
| 17 | 3p12.1 | chr3 | – | 83618409 | 83618995 | 587 | 99.66 | 6.80 | (0–10) | 0.17 | NA | Intergenic | NA |
| 18 | 5q21.3 | chr5 | – | 105998962 | 105999549 | 588 | 99.66 | 6.80 | (0–10) | 0.34 | NA | Intergenic | NA |
| 19 | 10p12.31 | chr10 | – | 18856645 | 18857233 | 589 | 99.66 | 6.80 | (0–10) | 0.17 | NA | Intergenic | NA |
| 20 | Yq11.23 | chrY | – | 25974734 | 25975320 | 587 | 99.66 | 6.80 | (0–10) | 0.17 | NA | Intergenic | NA |
| 21 | 6q14.1 | chr6 | – | 82297755 | 82298498 | 744 | 99.49 | 6.78 | (0–10) | 0.34 | Insertion | Intergenic | NA |
| 22 | Xp22.2 | chrX | + | 11033746 | 11034330 | 585 | 99.32 | 6.77 | (0–10) | 0.51 | NA | Intron | AC073529.1 |
| 23 | 2p12 | chr2 | + | 77807602 | 77808185 | 584 | 99.15 | 6.76 | (0–10) | 0.68 | NA | Intron | AC012494.1 |
| 24 | 1q23.3 | chr1 | + | 162419359 | 162419942 | 584 | 98.98 | 6.75 | (0–10) | 0.17 | NA | Intergenic | NA |
| 25 | Xq27.2 | chrX | – | 142767872 | 142768454 | 583 | 98.98 | 6.75 | (0–10) | 0.00 | NA | Intergenic | NA |
| 26 | 2q31.1 | chr2 | + | 171365032 | 171365617 | 586 | 98.64 | 6.73 | (0–10) | 1.19 | NA | Intron | METTL8 |
| 27 | 5q13.3 | chr5 | + | 75859521 | 75860102 | 582 | 98.64 | 6.73 | (0–10) | 0.17 | NA | Intergenic | NA |
| 28 | Xq27.3 | chrX | – | 144791258 | 144791846 | 589 | 98.47 | 6.71 | (0–10) | 1.37 | NA | Intergenic | NA |
| 29 | 12q12 | chr12 | + | 38144469 | 38145052 | 584 | 98.30 | 6.70 | (0–10) | 1.03 | NA | Intergenic | NA |
| 30 | 15q21.3 | chr15 | – | 54594796 | 54595373 | 578 | 98.13 | 6.69 | (0–10) | 2.04 | NA | Intron | UNC13C |
| 31 | 21q11.2 | chr21 | – | 14080466 | 14081052 | 587 | 97.79 | 6.67 | (0–10) | 2.05 | NA | Intron | AP001347.1 |
| 32 | 4q28.2 | chr4 | + | 129080872 | 129081454 | 583 | 97.61 | 6.66 | (0–10) | 2.22 | NA | Intron | SCLT1 |
| 33 | 3q25.2 | chr3 | – | 154944330 | 154944911 | 582 | 97.44 | 6.64 | (0–10) | 2.56 | NA | Intergenic | NA |
| 34 | 11q24.2 | chr11 | + | 124270705 | 124271275 | 571 | 96.93 | 6.61 | (0–10) | 0.00 | NA | Intergenic | NA |
| 35 | 2q14.3 | chr2 | – | 125024208 | 125024792 | 585 | 96.76 | 6.60 | (0–10) | 3.07 | NA | Intergenic | NA |
| 36 | 6q27 | chr6 | + | 169084226 | 169084808 | 583 | 96.76 | 6.60 | (0–10) | 3.07 | NA | Intergenic | NA |
| 37 | 13q13.3 | chr13 | – | 38319721 | 38320300 | 580 | 96.76 | 6.60 | (0–10) | 3.24 | NA | Intron | LINC00571 |
| 38 | 7q35 | chr7 | + | 143472173 | 143472744 | 572 | 96.08 | 6.55 | (0–10) | 3.75 | NA | Intron | EPHA1-AS1 |
| 39 | 14q21.3 | chr14 | + | 48011215 | 48011780 | 566 | 95.91 | 6.54 | (0–10) | 0.53 | NA | Intergenic | NA |
| 40 | 3p21.31 | chr3 | + | 44534488 | 44535059 | 572 | 95.74 | 6.53 | (0–10) | 3.77 | NA | Intergenic | NA |
| 41 | 2q22.1 | chr2 | – | 138860917 | 138861512 | 596 | 95.06 | 6.48 | (0–10) | 3.79 | NA | Intergenic | NA |
| 42 | 12p13.32 | chr12 | – | 4720007 | 4720593 | 587 | 94.89 | 6.47 | (0–10) | 4.95 | NA | Intron | AC005833.1 |
| 43 | 3p14.2 | chr3 | – | 59469489 | 59470030 | 542 | 91.82 | 6.26 | (0–10) | 3.23 | NA | Intron | AC126121.3 |
| 44 | 1q24.2 | chr1 | + | 168457190 | 168457732 | 543 | 90.80 | 6.19 | (0–10) | 0.37 | NA | Intron | AL023755.1 |
| 45 | 20p13 | chr20 | – | 2809052 | 2809886 | 835 | 88.42 | 6.03 | (0–10) | 0.38 | Insertion | Intergenic | NA |
| 46 | 18q21.33 | chr18 | + | 63648105 | 63648555 | 451 | 76.49 | 5.22 | (0–10) | 0.22 | NA | Intron | SERPINB11 |
| 47 | 17q22 | chr17 | + | 52961655 | 52962071 | 417 | 70.87 | 4.83 | (0–10) | 0.24 | NA | Intergenic | NA |
Fig. 2 HML-9 provirus structural characterization. Each HML-9 provirus element was analyzed and compared to the Dfam reference sequence. The LTRs and the gag, pro, pol, and env genes were annotated. Black lines represent deleted regions.
The integrity of 6 separate regions relative to the corresponding sections of reference
| Number | Locus | Provirus regions | 5′LTR (%) | gag (%) | pro (%) | pol (%) | env (%) | 3′LTR (%) |
|---|---|---|---|---|---|---|---|---|
| 1 | 16p12.3 | chr16 19393581 19402152 | 100.00 | 99.83 | 99.89 | 99.39 | 99.17 | 99.66 |
| 2 | 2p12 | chr2 82022660 82031279 | 98.98 | 99.72 | 99.89 | 99.43 | 99.90 | 99.15 |
| 3 | 15q21.1 | chr15 45234477 45243073 | 99.83 | 99.27 | 100.00 | 99.66 | 99.56 | 99.83 |
| 4 | 8p11.1 | chr8 43694016 43702583 | 99.83 | 99.44 | 98.31 | 99.39 | 99.66 | 99.83 |
| 5 | 13q31.1 | chr13 84869526 84877320 | 35.78 | 99.55 | 52.20 | 99.77 | 99.80 | 99.32 |
| 6 | 4q33 | chr4 170126345 170133883 | 99.66 | 99.89 | 99.77 | 99.70 | 13.93 | 0.00 |
| 7 | 6p12.3 | chr6 48873675 48879725 | 99.15 | 92.79 | 65.84 | 6.13 | 99.85 | 99.66 |
| 8 | Yp11.2 | chrY 9273707 9279611 | 88.42 | 90.73 | 63.81 | 6.36 | 98.83 | 99.49 |
| 9 | 8q24.3 | chr8 145019974 145032719 | 0.00 | 0.00 | 0.79 | 99.77 | 99.90 | 95.23 |
| 10 | Yq11.223 | chrY 21580120 21585551 | 88.25 | 91.74 | 64.83 | 6.25 | 99.36 | 15.84 |
| 11 | 19q13.2 | chr19 40954172 40959178 | 98.98 | 98.94 | 64.26 | 6.17 | 67.40 | 77.51 |
| 12 | Yp11.2 | chrY 8121821 8126768 | 99.66 | 76.49 | 14.09 | 6.40 | 99.17 | 98.81 |
| 13 | Yp11.2 | chrY 8996062 9000755 | 0.00 | 80.23 | 64.37 | 6.47 | 99.80 | 95.55 |
| 14 | Yq11.222 | chrY 18622534 18626952 | 96.76 | 75.21 | 0.00 | 0.00 | 84.75 | 98.47 |
| 15 | Yq11.223 | chrY 21845475 21850069 | 0.00 | 70.85 | 64.60 | 6.40 | 99.32 | 99.15 |
| 16 | 21q21.1 | chr21 18563368 18566735 | 0.00 | 0.00 | 95.26 | 96.12 | 0.00 | 0.00 |
| 17 | 5q33.3 | chr5 156660448 156663815 | 0.00 | 0.00 | 95.26 | 96.12 | 0.00 | 0.00 |
| 18 | 1q22 | chr1 155629408 155632775 | 0.00 | 0.00 | 95.26 | 96.12 | 0.00 | 0.00 |
| 19 | 7q36.1 | chr7 150561277 150563994 | 0.00 | 0.00 | 0.00 | 16.91 | 99.02 | 54.51 |
| 20 | 8q21.13 | chr8 78652302 78654820 | 0.00 | 0.00 | 0.00 | 0.00 | 87.34 | 100.00 |
| 21 | 10q24.2 | chr10 99822511 99825532 | 0.00 | 0.00 | 52.29 | 95.43 | 0.00 | 0.00 |
| 22 | 12q13.11 | chr12 48509228 48511681 | 0.00 | 0.00 | 88.72 | 63.52 | 0.00 | 0.00 |
| 23 | Yq11.222 | chrY 17669948 17671523 | 52.81 | 62.09 | 0.00 | 0.00 | 0.00 | 0.00 |
Fig. 3 Phylogenetic analysis of the HML-9 near-full-length proviruses, solo LTRs, and 4 subregions by the maximum likelihood method. Phylogenetic analyses of 5 HML-9 proviral elements (A), 44 solo LTRs (B), 10 gag elements (C), 8 pro elements (D), 11 pol elements (E), and 13 env elements (F), together with references. The two intragroup clusters of the pro and pol genes (types a and b) were annotated and depicted with brown and orange background colors, respectively. The resulting phylogeny was tested by the bootstrap method with 500 replicates. The branch length indicates the number of substitutions per site.
Estimated time of HML-9 elements integration
| Locus | Provirus regions | Divergence from consensus sequence: gag | Divergence from consensus sequence: pro | Divergence from consensus sequence: pol | Divergence from consensus sequence: env | Mean divergence | T = D/0.2 | Age/million years (gene vs consensus) | Divergence between 2 LTRs | T = D/0.2/2 | Age/million years (LTR vs LTR) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 16p12.3 | chr16 19393581 19402152 | 0.059 | 0.158 | 0.206 | 0.089 | 0.128 | 0.64 | 64.00 | 0.082 | 0.20500 | 20.50000 |
| 2p12 | chr2 82022660 82031279 | 0.061 | 0.182 | 0.204 | 0.101 | 0.137 | 0.685 | 68.50 | 0.070 | 0.17500 | 17.50000 |
| 15q21.1 | chr15 45234477 45243073 | 0.051 | 0.126 | 0.206 | 0.099 | 0.121 | 0.6025 | 60.25 | 0.080 | 0.20000 | 20.00000 |
| 8p11.1 | chr8 43694016 43702583 | 0.091 | 0.177 | 0.231 | 0.121 | 0.155 | 0.775 | 77.50 | 0.107 | 0.26750 | 26.75000 |
| 13q31.1 | chr13 84869526 84877320 | 0.054 | NA | 0.208 | 0.103 | 0.122 | 0.608333333 | 60.83 | NA | NA | NA |
| 4q33 | chr4 170126345 170133883 | 0.058 | 0.172 | 0.214 | NA | 0.148 | 0.74 | 74.00 | NA | NA | NA |
| 6p12.3 | chr6 48873675 48879725 | 0.063 | NA | NA | 0.106 | 0.085 | 0.4225 | 42.25 | 0.110 | 0.27500 | 27.50000 |
| Yp11.2 | chrY 9273707 9279611 | 0.114 | NA | NA | 0.139 | 0.127 | 0.6325 | 63.25 | 0.141 | 0.35250 | 35.25000 |
| 8q24.3 | chr8 145019974 145032719 | NA | NA | 0.513 | 0.093 | 0.303 | 1.515 | 151.50 | NA | NA | NA |
| Yq11.223 | chrY 21580120 21585551 | 0.105 | NA | NA | 0.140 | 0.123 | 0.6125 | 61.25 | NA | NA | NA |
| 19q13.2 | chr19 40954172 40959178 | 0.075 | NA | NA | | 0.075 | 0.375 | 37.50 | 0.097 | 0.24250 | 24.25000 |
| Yp11.2 | chrY 8121821 8126768 | NA | NA | NA | 0.125 | 0.125 | 0.625 | 62.50 | 0.157 | 0.39250 | 39.25000 |
| Yp11.2 | chrY 8996062 9000755 | NA | NA | NA | 0.133 | 0.133 | 0.665 | 66.50 | NA | NA | NA |
| Yq11.222 | chrY 18622534 18626952 | NA | NA | NA | NA | NA | NA | NA | 0.194 | 0.48500 | 48.50000 |
| Yq11.223 | chrY 21845475 21850069 | NA | NA | NA | 0.143 | 0.143 | 0.715 | 71.50 | NA | NA | NA |
| 21q21.1 | chr21 18563368 18566735 | NA | 0.266 | 0.190 | NA | 0.228 | 1.14 | 114.00 | NA | NA | NA |
| 5q33.3 | chr5 156660448 156663815 | NA | 0.215 | 0.160 | NA | 0.188 | 0.9375 | 93.75 | NA | NA | NA |
| 1q22 | chr1 155629408 155632775 | NA | 0.212 | 0.156 | NA | 0.184 | 0.92 | 92.00 | NA | NA | NA |
| 7q36.1 | chr7 150561277 150563994 | NA | NA | NA | 0.197 | 0.197 | 0.985 | 98.50 | NA | NA | NA |
| 10q24.2 | chr10 99822511 99825532 | NA | NA | 0.169 | NA | 0.169 | 0.845 | 84.50 | NA | NA | NA |
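The two age estimates in the table can be reproduced with a few lines of arithmetic. The sketch below reads the divisor 0.2 in the column headers as a substitution rate of 0.2% per million years. That reading is an interpretation of the headers, not something stated in this excerpt. The function names are hypothetical, and divergences are given as fractions, as in the table.

```python
def age_from_consensus(divergences, rate_percent_per_myr=0.2):
    """Age in million years from gene-vs-consensus divergences.

    divergences: per-gene divergences as fractions (None for NA).
    The unrounded mean over the available genes is converted to a
    percentage and divided by the assumed rate (T = D/0.2).
    """
    values = [d for d in divergences if d is not None]
    if not values:
        return None
    mean_d = sum(values) / len(values)
    return mean_d * 100 / rate_percent_per_myr


def age_from_ltrs(ltr_divergence, rate_percent_per_myr=0.2):
    """Age in million years from the divergence between the two LTRs.

    The extra factor of 2 (T = D/0.2/2) reflects that both LTRs
    accumulate substitutions independently after integration.
    """
    if ltr_divergence is None:
        return None
    return ltr_divergence * 100 / rate_percent_per_myr / 2


# Locus 16p12.3: gag, pro, pol, env divergences and LTR-vs-LTR divergence.
gene_age = age_from_consensus([0.059, 0.158, 0.206, 0.089])
ltr_age = age_from_ltrs(0.082)
```

For locus 16p12.3 this gives approximately 64 and 20.5 million years, matching the corresponding table entries up to floating-point rounding.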
Fig. 4 The genes associated with solo LTRs and GO Slim summaries. A The number of associated genes per solo LTR. B Binned by orientation and distance to TSS. C Binned by absolute distance to TSS. Biological process (D), cellular component (E), and molecular function (F) summaries are represented by red, blue, and green bars, respectively. The height of the bar represents the number of IDs in the gene list and in the category.
Fig. 5 Enrichment result categories binned by biological process, cellular component, and molecular function. A, B Bar chart and customizable volcano plot of the biological process enrichment results. C, D Bar chart and customizable volcano plot of the cellular component enrichment results. E, F Bar chart and customizable volcano plot of the molecular function enrichment results.
Fig. 6 In silico examination of the conserved transcription factor binding sites and logos representing the PBSs of HML-9. A The forward arrow indicates the sense strand, and the reverse arrow indicates the antisense strand. Different transcription factors are marked with different colors. B PBS nucleotide sequence.