Nicole Grandi1, Maria Paola Pisano1, Eleonora Pessiu1, Sante Scognamiglio1, Enzo Tramontano1,2.
Abstract
Endogenous retroviruses (ERVs) are ancient relics of infections that affected the primate germ line and constitute about 8% of our genome. Growing evidence indicates that ERVs played a major role in vertebrate evolution, being occasionally domesticated by the host physiology. In addition, human ERV (HERV) expression is intensively investigated for a possible pathological role, even though no clear associations have been reported yet. On the one hand, the study of HERV expression in high-throughput data is a powerful and promising tool to assess their actual dysregulation in diseased conditions; on the other hand, limited knowledge of the genomic diversity of the various HERV groups and of their individual members has so far prevented the association between specific HERV loci and a given molecular mechanism of pathogenesis. The present study focuses on the HERV-K(HML7) group, which, unlike the other HERV-K members, remains poorly characterized. Starting from an initial identification performed with the software RetroTector, we collected 23 HML7 proviral insertions and about 160 HML7 solitary LTRs, which were analyzed in terms of genomic distribution, revealing a significant enrichment in chromosome X and frequent localization within human gene introns as well as in pericentromeric and centromeric regions. Phylogenetic analyses showed that HML7 members form a monophyletic group which, based on age estimation and comparative localization in non-human primates, had its major diffusion between 20 and 30 million years ago. Structural characterization revealed that, besides 3 complete HML7 proviruses, the other group members share a highly defective structure that nevertheless still presents recognizable functional domains, making the group worth further investigation in the human population to assess the presence of residual coding potential.
Keywords: HERV; HERV-K; HML7; endogenous retroviruses; retrotransposons
Year: 2021 PMID: 34069102 PMCID: PMC8156875 DOI: 10.3390/biology10050439
Source DB: PubMed Journal: Biology (Basel) ISSN: 2079-7737
HML7 proviral sequences identified in the human genome assembly GRCh38/hg38.
| Locus 1 | Strand | Coordinates | Age (Million Years) | O.C.A. 2 | Reference |
|---|---|---|---|---|---|
| 1q43 | + | 242457056–242460436 | 22.9 | orangutan | Vargiu et al. |
| 2q11.2 P | + | 101237831–101245098 | 32.3 | rhesus | Vargiu et al. |
| 2q31.1 | − | 170615659–170619450 | 24.3 | gibbon | this study |
| 3q11.2 P | + | 94647178–94651193 | 46.9 | gibbon | Vargiu et al. |
| 3q23 | + | 142206726–142207843 | 29.9 | gibbon | this study |
| 3q26.1 | + | 165546697–165548895 | 37.0 | gibbon | this study |
| 4q25 | − | 109855914–109859920 | 27.9 | gibbon | Vargiu et al. |
| 4q32.1 | − | 160255828–160257162 | 22.0 | gibbon | this study |
| 5p13.2 | − | 34460957–34468537 | 29.8 | rhesus * | Vargiu et al. |
| 5q22.3 | − | 114142954–114150230 | 29.3 | gibbon | Vargiu et al. |
| 6p12.3 | − | 49501277–49508842 | 29.3 | rhesus * | Vargiu et al. |
| 6q22.31 | + | 121042084–121049347 | 26.6 | gibbon | Vargiu et al. |
| 7q21.12 | − | 87732402–87733417 | 40.0 | gibbon | this study |
| 7q36.2 | − | 153843398–153844485 | 31.0 | gibbon | this study |
| 11p12 | − | 43161738–43168627 | 18.1 | orangutan | Vargiu et al. |
| 11q14.3 | − | 92943568–92950131 | 26.4 | gibbon | Vargiu et al. |
| 12q12 P | + | 38122838–38130125 | 36.4 | gorilla | Vargiu et al. |
| 15q24.3 | + | 76639167–76641589 | 24.3 | gibbon | this study |
| 19q13.2 | − | 42712688–42713786 | 30.5 | gibbon | this study |
| Xq11.1 C | − | 62707600–62717099 | - | gorilla ° | Vargiu et al. |
| Xq22.3 | − | 105666188–105669044 | 22.0 | gibbon | this study |
| Yq11.221 | − | 15973344–15982688 | 38.2 | gorilla * | Vargiu et al. |
| Yp11.2 | − | 7952560–7961873 | 38.2 | chimp | Vargiu et al. |
1 HML7 elements integrated in pericentromeric and centromeric regions are indicated with a P and a C, respectively. 2 Oldest Common Ancestor: HML7 loci converted into a solitary LTR during primate speciation (*) or lacking an orthologue in intermediate primate species (°) are also indicated.
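As a quick consistency check on the table above, provirus lengths can be derived directly from the listed GRCh38/hg38 coordinates. The sketch below is illustrative only: the helper name `provirus_length` and the choice of loci are made up for the example, and it assumes the coordinates are 1-based and inclusive, which the source does not state.

```python
# Coordinates copied from the table above (a subset of loci, for illustration).
COORDS = {
    "3q23": "142206726–142207843",
    "4q32.1": "160255828–160257162",
    "Xq11.1": "62707600–62717099",
    "Yq11.221": "15973344–15982688",
    "Yp11.2": "7952560–7961873",
}


def provirus_length(span: str) -> int:
    """Length in bp of a 'start–end' span (ASSUMED 1-based, inclusive)."""
    # Accept both the en dash used in the table and a plain hyphen.
    start, end = (int(x) for x in span.replace("–", "-").split("-"))
    return end - start + 1


lengths = {locus: provirus_length(span) for locus, span in COORDS.items()}
# Loci sorted from longest to shortest; keep the three longest.
longest = sorted(lengths, key=lengths.get, reverse=True)[:3]

print(lengths)
print(longest)
```

Under that coordinate assumption, Xq11.1, Yq11.221 and Yp11.2 come out as the longest spans in this subset, each above 9 kb.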
Figure 1. Chromosomal distribution of HML7 loci. In the upper part of the figure, HML7 proviruses (red arrows) and solitary LTRs (blue lines) have been visualized on the human karyotype (source: www.ensembl.org (accessed on 30 March 2021)). In the lower part of the figure, the observed chromosomal distribution of HML7 elements was statistically compared to the expected one, showing a significant decrease in chromosome 15 integrations and an enrichment in chromosome X integrations.
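The caption does not say which statistical test was used for the observed-versus-expected comparison. One simple way to frame such a comparison, sketched here purely as an assumption, is a one-sided binomial test in which the expected share of integrations on a chromosome equals its share of genome length. The function name and all input numbers below are hypothetical and are not the study's data.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of seeing k or more hits."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# HYPOTHETICAL inputs for illustration only.
n_elements = 183        # 23 proviruses + "about 160" solitary LTRs, so approximate
observed_on_chr = 15    # hypothetical number of elements on one chromosome
length_fraction = 0.05  # hypothetical share of genome length for that chromosome

expected = n_elements * length_fraction
p_value = binomial_upper_tail(observed_on_chr, n_elements, length_fraction)
print(f"expected = {expected:.2f}, observed = {observed_on_chr}, "
      f"P(X >= observed) = {p_value:.4f}")
```

A small upper-tail probability would point to enrichment; a depletion such as the one reported for chromosome 15 would use the lower tail instead.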
HML7 proviral sequences colocalized with cellular genes.
| HML7 Locus | Colocalized Gene Name | Portion | Description | Function | Associated Diseases |
|---|---|---|---|---|---|
| 1q43 | PLD5 | intronic, | Phospholipase D family member 5 | Hydrolyzes phosphatidylcholine | Type 7 nephrotic syndrome; hemopneumothorax |
| 2q31.1 | MYO3B | intronic, | Myosin IIIB | Probable actin-based ATPase with protein kinase activity. Required for normal cochlear development and hearing | Autosomal recessive deafness 30; entropion |
| 3q23 | GK5 | intronic, | Glycerol kinase 5 | Glycerol degradation, triacylglycerol biosynthesis | Type 1 Diabetes Mellitus 3 and 7 |
| 3q26.1 | LINC01322 | intronic, sense | long intergenic non-coding RNA 1322 | - | - |
| 4q25 | LRIT3 | intronic, | Leucine rich repeat Ig-Like transmembrane domains 3 | May regulate fibroblast growth factor receptors and affect their post-translational modification | Congenital stationary night blindness |
| 5q22.3 | KCNN2 | intronic, | Potassium calcium-activated channel subfamily N member 2 | Forms a voltage-independent potassium channel activated by intracellular calcium following membrane hyperpolarization | Lingual-facial-buccal dyskinesia and aceruloplasminemia |
| 6p12.3 | GLYATL3 | intronic, | Glycine-N-acyltransferase like 3 | Catalyzes the conjugation of long-chain fatty acyl-CoA thioester and glycine, an intermediate in primary fatty acid biosynthesis | - |
| 7q21.12 | RUNDC3B | intronic, | RUN domain-containing protein 3B | Encodes a predicted RAP2-interacting protein. May play a role in RAS-like GTPase signaling pathways | - |
| 7q36.2 | DPP6 | intronic, | dipeptidyl peptidase like 6, transcript variant 6 | Member of S9B family of serine proteases (without detectable activity). | Autosomal dominant mental retardation; paroxysmal familial ventricular fibrillation |
| 15q24.3 | SCAPER | intronic, | S-phase cyclin A associated protein in the endoplasmic reticulum | Cyclin A/Cdk2 regulatory protein that transiently maintains cyclin A in the cytoplasm | Intellectual developmental disorder and retinitis pigmentosa; brachydactyly |
| Xq22.3 | IL1RAPL2 | intronic, | Interleukin 1 receptor accessory protein like 2 | Orphan receptor in the IL1R superfamily | Cinca syndrome; Muckle-Wells syndrome |
The table shows, in order: the locus of each HML7 and its strand, the name of the colocalized gene and its strand, the intronic/exonic and sense/antisense localization of the HML7 element, and the description of the gene product and its function. The last column reports the pathologies associated so far with each gene (source: OMIM database).
Figure 2. Structural characterization of HML7 proviral loci. The identified HML7 proviruses have been aligned with the Dfam proviral reference, and all insertions and deletions ≥1 nucleotide have been annotated.
Coding potential and functional domain predicted for the HML7 proviral loci.
| HML7 | gag: Shift | gag: Stop | MA | CA | NC | NC | pro: Shift | pro: Stop | PR | dUTPase | pol: Shift | pol: Stop | RT | RH | IN | IN core | env: Shift | env: Stop | SU gp | TM hr | Translation: gag/pro Shift | Translation: pro/pol | GC% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **1q43** | 6 | 7 | x | x | 1 | 1 | x | x | 39.6 | ||||||||||||||
| **2q11.2** | x | 3 | 1 | x | x | x | x | 5 | 5 | x | x | 40.6 | |||||||||||
| 2q31.1 | x | x | x | x | 38.8 | ||||||||||||||||||
| **3q11.2** | x | 6 | 2 | x | x | 41.1 | |||||||||||||||||
| 3q23 | x | x | 41.9 | ||||||||||||||||||||
| 3q26.1 | 39.0 | ||||||||||||||||||||||
| **4q25** | 6 | 8 | 1 | 5 | x | x | 39.2 | ||||||||||||||||
| 4q32.1 | x | x | x | 40.6 | |||||||||||||||||||
| **5p13.2** | x | 4 | 5 | x | x | x | 1 | 1 | x | x | 40.3 | ||||||||||||
| **5q22.3** | x | 8 | 4 | x | x | x | x | 9 | 1 | x | 40.2 | ||||||||||||
| **6p12.3** | x | 0 | 6 | x | x | x | 8 | 1 | 40.0 | ||||||||||||||
| **6q22.31** | x | 4 | 8 | x | x | x | x | 4 | 4 | x | x | 40.1 | |||||||||||
| 7q21.12 | 43.0 | ||||||||||||||||||||||
| 7q36.2 | 43.4 | ||||||||||||||||||||||
| 11p12 | x | x | x | x | x | 40.1 | |||||||||||||||||
| **11q14.3** | x | 1 | 1 | x | x | x | 1 | 5 | x | 40.3 | |||||||||||||
| **12q12** | x | 3 | 7 | x | x | x | x | 2 | 2 | x | x | 39.3 | |||||||||||
| 15q24.3 | x | x | 38.8 | ||||||||||||||||||||
| 19q13.2 | x | 40.3 | |||||||||||||||||||||
| **Xq11.1** | 1 | 11 | x | x | x | xx | 2 | 4 | x | x | 8 | 4 | x | x | x | x | 4 | 5 | x | x | 0 | −1 | 40.0 |
| Xq22.3 | 39.7 | ||||||||||||||||||||||
| **Yq11.221** | 3 | 7 | x | x | x | xx | 0 | 3 | x | x | 3 | 7 | x | x | x | x | 6 | 5 | x | x | 0 | −1 | 39.6 |
| **Yp11.2** | 0 | 3 | x | x | x | xx | 2 | 2 | x | x | 11 | 8 | x | x | x | 7 | 7 | x | x | 0 | 0 | 40.0 | |
The table reports the presence of the different retroviral open reading frames and the occurrence of internal stop codons and frameshifts in the 14 most intact HML7 loci, as identified by RetroTector software (locus name in bold). For each HML7 provirus, the presence of functional domains and taxonomical signatures was also predicted and is indicated with the symbol “x.” Abbreviations not explained in the main text: ZnF = Zinc finger motif, ZnB = Zinc-binding domain, gp = glycoprotein, hr = heptad repeats, GC% = percentage of GC nucleotides.
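Two of the quantities in the table, GC% and the number of internal stop codons, are simple to compute from a nucleotide sequence. The sketch below shows the arithmetic on a toy sequence; it is not RetroTector's procedure, the function names are hypothetical, and it assumes the reading frame is already known.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_percent(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / total


def count_internal_stops(seq: str, frame: int = 0) -> int:
    """Count stop codons in the given frame, ignoring the last complete codon."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return sum(codon in STOP_CODONS for codon in codons[:-1])


toy = "ATGTAAGGCTGA"  # toy sequence, not an HML7 sequence
print(round(gc_percent(toy), 1))
print(count_internal_stops(toy))
```

Ignoring the final codon keeps a legitimate terminal stop from being counted as a premature one.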
Figure 3. Phylogenetic tree of HML7 proviral loci. The identified HML7 proviruses were analyzed with the maximum likelihood method, together with the Dfam reference proviral sequences of all HERV-K groups (HML1 to HML10). Phylogenies were statistically tested through the bootstrap method with 100 replicates. The monophyletic group formed by the HML7 proviruses and including the HML7 group Dfam reference (black dot) is highlighted with blue branches.
Figure 4. Time of integration of HML7 proviral loci in primate genomes. Temporal overview of the colonization of primate species by HML7 elements, based on the combination of time of integration estimation and comparative genomics analysis of each human locus in non-human primates. Each node indicates a speciation event, and the corresponding time is indicated in the line below. HML7 loci whose insertion occurred in centromeric or pericentromeric regions are marked with a * (Xq11.1) or a + (3q11.2, Yp11.2, 2q11.2 and 12q12), respectively, and their localization in non-human primates could possibly be affected by the lower comparability of constitutive heterochromatin. 1 Based on multiple approaches of divergence calculation; see materials and methods for further details.
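The caption refers to divergence-based age estimates without giving the formula in this section. A minimal sketch of one approach commonly used for HERVs follows: percentage divergence is divided by a neutral substitution rate, and halved when the two LTRs of the same provirus are compared, since each accumulates mutations independently. The rate of 0.2% per nucleotide per million years, the halving rule, and the function name are assumptions made for illustration and are not taken from this text.

```python
# ASSUMPTION: neutral substitution rate, in % divergence per nucleotide per million years.
ASSUMED_RATE_PERCENT_PER_MY = 0.2


def age_from_divergence(divergence_percent: float,
                        ltr_vs_ltr: bool = False,
                        rate: float = ASSUMED_RATE_PERCENT_PER_MY) -> float:
    """Estimated time since integration, in million years.

    divergence_percent: % divergence from a consensus, or between the 5' and
    3' LTRs of one provirus when ltr_vs_ltr is True.
    """
    if divergence_percent < 0 or rate <= 0:
        raise ValueError("divergence must be >= 0 and rate must be > 0")
    age = divergence_percent / rate
    # Both LTRs diverge from the same starting sequence, so halve the estimate.
    return age / 2 if ltr_vs_ltr else age


# Hypothetical divergence values, not measurements from the study.
print(age_from_divergence(5.0))                    # element vs. consensus
print(age_from_divergence(10.0, ltr_vs_ltr=True))  # 5' LTR vs. 3' LTR
```

With these assumed inputs both calls give an estimate of about 25 million years, which falls inside the 20 to 30 million year window the abstract reports for the main HML7 diffusion.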