Nicole Grandi1, Marta Cadeddu1, Maria Paola Pisano1, Francesca Esposito1, Jonas Blomberg2, Enzo Tramontano1,3.
Abstract
BACKGROUND: About half of the human genome consists of transposable elements, including human endogenous retroviruses (HERV). HERV sequences represent 8% of our genetic material; they derive from exogenous infections that occurred millions of years ago in germ line cells and have been inherited by the offspring in a Mendelian fashion. HERV-K elements (classified as HML1-10) are among the most studied HERV groups, especially because of their possible correlation with human diseases. In particular, the HML10 group was reported to be upregulated in persistently HIV-1-infected cells as well as in tumor cells and samples, and was proposed to have a role in the control of host gene expression. An individual HERV-K(HML10) member within the major histocompatibility complex C4 gene has even been studied for its possible contribution to type 1 diabetes susceptibility. Following a first characterization of the HML10 group at the genomic level, performed with the innovative software RetroTector, we have characterized in detail the 8 previously identified HML10 sequences present in the human genome, and an additional HML10 partial provirus in chromosome 1p22.2 that is reported here for the first time.
Keywords: Autoimmune diseases; Cancer; HML10; Herv; Herv-k(C4); Human endogenous retroviruses; RetroTector
Year: 2017 PMID: 29118853 PMCID: PMC5667498 DOI: 10.1186/s13100-017-0099-7
Source DB: PubMed Journal: Mob DNA
HML10 proviral sequences localized in the human genome GRCh37/hg19 assembly
| Locus | Coordinates a | Length | First reference | RVNR b | Genomic context | Secondary integrations |
|---|---|---|---|---|---|---|
| 1p36.13 | 1:20,253,380–20,259,203 (−) | 5824 | Vargiu 2016 | 5836 | intergenic | – |
| 1p22.2 | 1:89,551,973–89,554,309 | 2337 | this study | – | intergenic | – |
| 1q22 | 1:155,661,620–155,669,312 (−) | 7693 | Vargiu 2016 | 6073 | DAP3 (+) | AluSp 155,663,467–155,663,784 (+) |
| 6p22.1 | 6:27,155,300–27,164,058 (+) | 8759 | Vargiu 2016 | 2101 | intergenic | AluY 27,158,573–27,158,903 (+) |
| 6p21.33 | a) 6:31,952,469–31,958,829 (−) | 6361 | Tassabehji 1994 | 2116 | C4A (+) | – |
| 6p21.33 | b) 6:31,985,207–31,991,567 (−) | 6361 | Tassabehji 1994 | 2115 | C4B (+) | – |
| 6q22.31 | 6:122,825,990–122,833,238 (−) | 7249 | Vargiu 2016 | 2320 | PKIB (+) | AluY 122,827,840–122,828,145 (−) |
| 19p13.2 | 19:7,860,947–7,865,932 (−) | 4986 | Vargiu 2016 | 4599 | intergenic | AluY 7,861,800–7,862,107 (−) |
| 19q13.41 | 19:52,964,148–52,969,750 (−) | 5458 | Vargiu 2016 | 4762 | ZNF578 (+) | – |
| Yq11.221 | Y:15,105,784–15,113,006 (−) | 7223 | Vargiu 2016 | 5104 | L1M3f (−) | LTR2B 15,106,449–15,106,924 (−) |
a Chromosome: start–end (strand). Positions refer to the human genome sequence, assembly GRCh37/hg19
b Identifiers of the individual sequences in the first reference study (Vargiu et al. 2016, [3])
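The coordinate notation of the table can be checked mechanically. The following is a minimal sketch, assuming that coordinates are 1-based and inclusive (so that length = end − start + 1, which matches the Length column for the 1p36.13 row); the function name `parse_locus` and the returned dictionary layout are illustrative and not part of the original study.

```python
import re

# Pattern for "chromosome:start–end (strand)". The strand is optional because
# one row of the table (1p22.2) lists none. \u2013 is the en dash used between
# the positions, \u2212 the minus sign used for the reverse strand.
COORD_RE = re.compile(
    r"^(\w+):([\d,]+)[\u2013\-]([\d,]+)(?:\s*\(([+\u2212\-])\))?$"
)

def parse_locus(text):
    """Split a coordinate string into chromosome, start, end, strand, length."""
    match = COORD_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised coordinate string: {text!r}")
    chrom, start, end, strand = match.groups()
    start = int(start.replace(",", ""))
    end = int(end.replace(",", ""))
    # Assumption: inclusive 1-based interval, hence the "+ 1".
    return {"chrom": chrom, "start": start, "end": end,
            "strand": strand, "length": end - start + 1}

locus = parse_locus("1:20,253,380–20,259,203 (−)")
print(locus["length"])  # 5824
```

Under this assumption the computed length agrees with the listed one for every row except 19q13.41, whose coordinates span 5603 positions against a listed length of 5458.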
Fig. 1 Chromosomal distribution of HML10 proviruses and solitary LTRs. The number of HML10 elements integrated in each human chromosome is depicted and compared with the number of random insertion events expected on the basis of chromosomal length. To obtain a more reliable estimate, we considered the number of proviruses identified by Vargiu et al. 2016 [3] as well as the solitary LTR relics reported by Broecker et al. 2016 [27], which also represent previous integration events. The two sequences in locus 6p21.33, being a duplication of the same proviral integration, were counted as a single provirus. * statistically significant based on chi-square test (p < 0.0001)
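The comparison described in this caption can be illustrated with a short sketch. It assumes that the expected number of insertions on a chromosome is the total number of elements multiplied by that chromosome's share of the total length, and it computes the per-chromosome chi-square contributions (observed − expected)² / expected. The caption does not state how the test was set up in detail, and the chromosome names, lengths and counts below are hypothetical placeholders, not data from the study.

```python
def expected_counts(observed, lengths):
    """Expected insertions per chromosome if integration were length-proportional."""
    total_elements = sum(observed.values())
    total_length = sum(lengths.values())
    return {c: total_elements * lengths[c] / total_length for c in observed}

def chi_square_terms(observed, lengths):
    """Per-chromosome contributions (O - E)^2 / E to the chi-square statistic."""
    expected = expected_counts(observed, lengths)
    return {c: (observed[c] - expected[c]) ** 2 / expected[c] for c in observed}

# Hypothetical toy data (NOT from the paper): lengths in arbitrary units.
lengths = {"chrA": 200, "chrB": 100, "chrC": 100}
observed = {"chrA": 10, "chrB": 5, "chrC": 25}

terms = chi_square_terms(observed, lengths)
statistic = sum(terms.values())
print(terms)      # large terms flag chromosomes that deviate from expectation
print(statistic)  # compare against a chi-square distribution for a p-value
```

Turning the statistic into a p-value requires the chi-square distribution with the appropriate degrees of freedom, which is deliberately left out here to keep the sketch free of external dependencies.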
Fig. 2 Phylogenetic analysis of the full-length retrieved sequences and other endogenous and exogenous Betaretroviruses. The main HML10 phylogenetic group is indicated. The two intragroup clusters (I and II) are also annotated and depicted with blue and green lines, respectively. Evolutionary relationships were inferred by using the Neighbor Joining method and the Kimura-2-parameter model. The resulting phylogeny was tested by using the Bootstrap method with 1000 replicates. Length of branches indicates the number of substitutions per site
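For readers unfamiliar with the Kimura-2-parameter model named in this caption, the pairwise distance it yields is d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of transitional and transversional differences. The sketch below implements this standard formula for two pre-aligned sequences; skipping gaps and ambiguous positions pairwise is an assumption of this example and may differ from the settings used for the trees shown here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Assumption: positions with gaps or ambiguity codes are ignored.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

# One transition in ten sites: slightly more than the raw 10% difference,
# because the model corrects for multiple substitutions at the same site.
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

The formula is undefined for very divergent pairs (when the argument of the logarithm is not positive), in which case `math.log` raises a `ValueError`.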
Fig. 3 HML10 proviruses structural characterization. Each HML10 provirus nucleotide sequence has been compared to the reference sequence HERV-K(C4) (RepBase). Nucleotide insertions and deletions, LTR regulatory elements and the predicted functional domains of the retroviral genes are annotated. Type II proviruses are reported in red and show a more divergent nucleotide sequence, especially in the pol RNase H and IN portions and the env 5′ region (red stripes). Due to the high number of nucleotide changes, the comparison of these portions to the reference is depicted separately. RT: Reverse Transcriptase; RDDP: RNA dependent DNA polymerase; T: thumb; RH: Ribonuclease H; IN: Integrase; Zb: Zinc binding; Db: DNA binding; GP: glycoprotein; HR: Heptad Repeats. Type I proviruses present, in the corresponding portion, an A/T-rich stretch previously reported for HERV-K(C4) between the pol and env genic regions
Fig. 4 Phylogenetic analysis of the HML10 subtype II Rec putative proteins. The HML10 subtype II proviruses nucleotide sequences corresponding to a predicted Rec domain were translated and the obtained putative proteins (puteins) were analyzed in a NJ tree including previously reported HERV-K HML2 Rec proteins (black triangles) and the analogous HIV-1 Rev (white triangle), HTLV-1 Rex (black square) and STLV Rex (white square) proteins. Evolutionary relationships were inferred by using the Neighbor Joining method and the p-distance model. The resulting phylogeny was tested by using the Bootstrap method with 1000 replicates. Length of branches indicates the number of substitutions per site
Fig. 5 Structural comparison between HERV-K HML2 Rec proteins and the putative HML10 Rec amino acid sequences. The HML10 subtype II proviruses nucleotide sequences corresponding to a predicted Rec domain were translated and the obtained putative proteins (sequences 10–14) were compared to the HERV-K HML2 Rec proteins reported in UniProt (sequences 1–9). Coloured residues represent amino acid substitutions with respect to the Q69383 HML2 Rec protein reference sequence. The presence of stop codons is indicated with a star in a black square, and the occurrence of frameshifts with a red square. The putative proteins theoretically originating from the inferred ORFs are indicated with a light green arrow. The localization of the HML2 Rec Nuclear Localization Signal (NLS) and Nuclear Export Signal (NES), as well as of the corresponding putative signals in the HML10 Rec puteins, is also indicated
HML10 sequences estimated time of integration
| Locus | LTR vs LTR | LTR vs consensus | gag a | pol b | env | O.C.A. c |
|---|---|---|---|---|---|---|
| 1p36.13 | 14.1 | 21.0 | 22.5 | no | 31.9 | rhesus |
| 1p22.2 | no 5′ and 3′ LTRs | no 5′ and 3′ LTRs | no | no | 45.0 | rhesus |
| 1q22 | 14.7 | 44.1 | 35.7 | 28.9 | 32.7 | rhesus |
| 6p22.1 | 12.7 | 36.5 | 43.0 | 18.9 | 32.8 | rhesus |
| 6p21.33a | 22.9 | 18.0 | 25.2 | 21.3 | 21.3 | rhesus d |
| 6p21.33b | 22.9 | 18.0 | 25.2 | 21.3 | 21.3 | orangutan d |
| 6q22.31 | 17.2 | 38.8 | 38.9 | 44.8 | 35.1 | rhesus |
| 19p13.2 | no 5′ and 3′ LTRs | no 5′ and 3′ LTRs | e | 20.8 | no | rhesus |
| 19q13.41 | no 3′ LTR | 46.0 | 37.4 | 27.2 | 45.9 | rhesus |
| Yq11.221 | 20.8 | 45.2 | 41.5 | 30.4 | 44.7 | rhesus |
a partial sequence: nucleotides 1277–2571 in LTR14-HERVKC4-LTR14
b partial sequence: nucleotides 4103–5810 in LTR14-HERVKC4-LTR14
c Oldest Common Ancestor
d Provirus loss in various intermediate species: chimpanzee, gorilla, orangutan and gibbon (6p21.33a); chimpanzee, gorilla, gibbon and rhesus (6p21.33b)
e sequence showing a highly divergent gag sequence, giving an estimated T of 165.7 that was not taken into account for the final T calculation
Fig. 6 Overview of HML10 group colonization of primate lineages. Boxplot representations of the period of entry of the HML10 group in primate lineages. The estimated age (in million years) was calculated considering the divergence values between i) the 5′ and 3′ LTRs of the same provirus; ii) each LTR and a generated consensus; iii) gag, pol and env genes and a generated consensus. The approximate periods of evolutionary separation of the different primate species are also indicated and have been retrieved from Steiper et al. 2006 [70] and Perelman et al. 2011 [71]. Boxes represent the main period of HML10 group diffusion in primates based on the different approaches of calculation, spanning the 25th to 75th percentiles and showing the mean value as a blue dash. Whiskers indicate the minimum and maximum estimated age
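The three approaches listed in this caption all convert a sequence divergence into an age. A common way to do this, sketched below, divides the percent divergence by a neutral substitution rate; for the LTR vs LTR comparison the result is halved, on the reasoning that both LTRs of a provirus were identical at integration and have each accumulated substitutions independently since then. Neither the formula nor the rate is stated in this excerpt, so both are assumptions of the sketch, and the value of 0.2% per million years used in the example is only a placeholder.

```python
def integration_time(divergence_pct, rate_pct_per_myr, ltr_vs_ltr=False):
    """Estimate time since integration (million years) from percent divergence.

    divergence_pct   -- pairwise divergence, in percent
    rate_pct_per_myr -- assumed substitution rate, percent per million years
    ltr_vs_ltr       -- True when comparing the 5' and 3' LTR of one provirus;
                        the divergence is then shared by two diverging copies
    """
    if rate_pct_per_myr <= 0:
        raise ValueError("substitution rate must be positive")
    age = divergence_pct / rate_pct_per_myr
    return age / 2 if ltr_vs_ltr else age

ASSUMED_RATE = 0.2  # percent per million years; placeholder, not from this excerpt

# Hypothetical divergences, chosen only to show the effect of the halving.
print(integration_time(4.0, ASSUMED_RATE))                   # sequence vs consensus
print(integration_time(4.0, ASSUMED_RATE, ltr_vs_ltr=True))  # 5' LTR vs 3' LTR
```

Because the same divergence yields half the age in the LTR vs LTR mode, estimates from the two approaches can differ systematically when the LTRs have not evolved independently.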
HML10 sequences orthologous loci in non-human primates genome
| Human locus | Chimpanzee | Gorilla | Orangutan | Gibbon | Rhesus | Marmoset |
|---|---|---|---|---|---|---|
| 1p36.13 (−) | 1:19,897,252–19,903,183 (−) | 1:20,573,241–20,579,060 (−) | 1:210,407,411–210,413,307 (+) | 24:19,115,921–19,117,286 (−) | 1:22,729,037–22,740,752 (−) | x |
| 1p22.2 (−) | 1:89,883,243–89,885,583 (−) | x | 1:139,752,930–139,755,294 (+) | 12:87,503,425–87,505,758 | 1:92,543,319–92,545,983 (−) | x |
| 1q22 (−) | 1:133,941,236–133,948,931 (−) | 1:134,686,645–134,687,185 (−) | 1:95,817,622–95,818,162 (+) | assembly gap | 1:134,772,475–134,779,343 (−) | x |
| 6p22.1 (+) | 6:27,446,871–27,456,058 (+) | 6:28,001,913–28,010,233 (+) | 6:28,071,758–28,078,582 (+) | 1a:72,438,487–72,447,474 (+) | 4:27,112,448–27,121,339 (+) | x |
| 6p21.33a (−) | x | x | x | x | 4:32,223,558–32,230,572 (−) | x |
| 6p21.33b (−) | x | x | 6:32,500,019–32,506,424 (−) | x | x | x |
| 6q22.31 (−) | 6:123,707,066–123,714,005 (−) | 6:122,872,935–122,879,489 (−) | 6:125,032,218–125,039,364 (−) | 3:109,711,272–109,718,216 (−) | 4:143,675,558–143,676,403 (−) | x |
| 19p13.2 (−) | 19:7,923,717–7,929,241 (−) | 19:8,020,313–8,024,861 (−) | 19:7,962,003–7,966,295 (−) | 10:66,445,268–66,447,647 (+) | 19:8,140,869–8,144,331 (+) | x |
| 19q13.41 (−) | 19:57,389,749–57,395,370 (−) | 19:49,869,509–49,875,109 (−) | 19:53,964,824–53,970,559 (−) | 10:72,725,038–72,730,734 (−) | 19:58,261,760–58,267,798 (−) | x |
| Yq11.221 (−) | Y:20,496,417–20,503,728 (−) | – | – | – | – | – |
For each human HML10 locus (for precise start and end positions, see Table 1), chromosome coordinates and strand of the orthologous loci are given for the non-human Catarrhini primate reference genome sequences considered. Apparent absence of an HML10 sequence in the orthologous genome position is indicated by “x”. For the HML10 locus on human chromosome Y, comparative information is available for the chimpanzee genome sequence only (see main text)
Fig. 7 HML10 proviruses PBS analyses. Nucleotide alignment of the PBS sequences identified in the HML10 proviruses. In the upper part, a logo represents the general HML10 PBS consensus sequence: for each nucleotide, the letter height is proportional to the degree of conservation among HML10 members. As indicated, all the HML10 PBS sequences are predicted to recognize a Lysine (K) tRNA
Fig. 8 Phylogenetic analysis of the HML10 sequences gag, pol and env genes with other endogenous and exogenous Betaretroviruses. The main HML10 phylogenetic group is indicated. The two intragroup clusters (I and II), when present, are also annotated and depicted with blue and green lines, respectively. In the absence of a clear cluster division, the assignment of each element to one of the two subgroups is indicated based on the phylogenetic analysis of the full-length proviruses (Fig. 2). Evolutionary relationships were inferred by using the Neighbor Joining method and the Kimura-2-parameter model. The resulting phylogeny was tested by using the Bootstrap method with 1000 replicates. Length of branches indicates the number of substitutions per site