Palle Villesen, Lars Aagaard, Carsten Wiuf, Finn Skou Pedersen.
Abstract
BACKGROUND: Human endogenous retroviruses (HERVs) comprise a large class of repetitive retroelements. Most HERVs are ancient and invaded our genome at least 25 million years ago, except for the evolutionarily young HERV-K group. The vast majority of the encoded genes are degenerate due to mutational decay, and only a few non-HERV-K loci are known to retain intact reading frames. Additional intact HERV genes may exist, since retroviral reading frames have not been systematically annotated on a genome-wide scale.
Year: 2004 PMID: 15476554 PMCID: PMC524368 DOI: 10.1186/1742-4690-1-32
Source DB: PubMed Journal: Retrovirology ISSN: 1742-4690 Impact factor: 4.602
Figure 1. A: Genomic organization of simple retroviruses when present as a provirus (DNA) integrated in the host genome. The regulatory long terminal repeats (LTRs) flank the three major internal genes gag, pol and env. A fourth gene, pro, is present between gag and pol in some retroviruses, while in others it is part of either gag or pol. B: Individual BLAST hits (white and yellow boxes) on either strand of the human genome were clustered into HERV regions (blue boxes) or discarded by using a score function. Finally, only HERV regions with at least one retroviral ORF were kept (see Materials and Methods). In the example illustrated, HERV ID 5715 was presumably inserted into an existing HERV locus with the opposite orientation. HERV ID 5715 is located in the first intron of the CD48 gene (antisense direction) and is also known as HERV-K18 or IDDMK1,222. C: HERV ID 5715 with graphical vORF annotation. Putative LTR structures are indicated, and all ORFs (stop-codon to stop-codon fragments above 62 aa) are mapped and annotated by homology criteria.
Genomic distribution of HERV regions
| Chr. | Length (Mb) | Windows analyzed (a) | Observed HERVs | Expected HERVs | χ2 test (b) | χ2 test within chr. (c) |
| 1 | 246 | 228 | 654 | 614.7 | 0.0987 | |
| 2 | 243 | 242 | 534 | 652.4 | | |
| 3 | 199 | 197 | 581 | 531.1 | 0.0250 | |
| 4 | 192 | 190 | 641 | 512.2 | | |
| 5 | 181 | 180 | 446 | 485.3 | 0.0656 | |
| 6 | 171 | 169 | 496 | 455.6 | 0.0513 | |
| 7 | 159 | 157 | 342 | 423.3 | | |
| 8 | 146 | 145 | 396 | 390.9 | 0.7923 | |
| 9 | 136 | 120 | 258 | 323.5 | | |
| 10 | 135 | 135 | 304 | 364.0 | | |
| 11 | 134 | 133 | 379 | 358.6 | 0.2695 | |
| 12 | 132 | 132 | 393 | 355.9 | 0.0440 | |
| 13 | 113 | 98 | 239 | 264.2 | 0.1146 | |
| 14 | 105 | 88 | 205 | 237.3 | 0.0335 | |
| 15 | 100 | 83 | 135 | 223.8 | | |
| 16 | 90 | 82 | 101 | 221.1 | | |
| 17 | 82 | 80 | 98 | 215.7 | | |
| 18 | 76 | 77 | 167 | 207.6 | 0.0043 | |
| 19 | 64 | 57 | 259 | 153.7 | | |
| 20 | 64 | 62 | 76 | 167.2 | | |
| 21 | 47 | 36 | 85 | 97.1 | 0.2181 | 0.0588 |
| 22 | 49 | 36 | 55 | 97.1 | | |
| X | 154 | 152 | 629 | 409.8 | | |
| Y | 50 | 25 | 359 | 67.4 | | |
| TOTAL | 3068 | 2905 | 7832 | 7832 (d) | | |
a Only windows overlapping with NCBI GoldenPath (release 34)
b Single chromosomes tested against group of other chromosomes. P-values below the significance level 0.00208 (0.05/24, Bonferroni corrected) are underlined.
c The genomic positions of HERVs were χ2 tested against a random distribution using 10000 simulations for each chromosome.
d Four additional HERV regions are located in the DR51 haplotype of the HLA region on chromosome 6 and not counted here.
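The expected counts and single-chromosome P-values in the table can be approximated with a short calculation. The sketch below rests on two assumptions that are not spelled out here: that expected counts are proportional to the number of windows analyzed, and that footnote b corresponds to a one-degree-of-freedom χ2 goodness-of-fit test of one chromosome against all other chromosomes pooled. The function names are illustrative, and only three chromosomes are included as examples.

```python
import math

# Totals taken from the TOTAL row of the table above.
TOTAL_WINDOWS = 2905
TOTAL_HERVS = 7832

# (windows analyzed, observed HERV regions) for a few example chromosomes.
CHROMOSOMES = {"1": (228, 654), "8": (145, 396), "11": (133, 379)}

# Bonferroni-corrected significance level from footnote b (24 chromosomes).
ALPHA = 0.05 / 24


def expected_hervs(windows):
    """Expected count, ASSUMING it scales with the number of windows analyzed."""
    return TOTAL_HERVS * windows / TOTAL_WINDOWS


def chi2_one_vs_rest(windows, observed):
    """Chi-square statistic and P-value for one chromosome vs. the pooled rest.

    ASSUMPTION: a two-category goodness-of-fit test with 1 degree of freedom.
    """
    exp_chr = expected_hervs(windows)
    exp_rest = TOTAL_HERVS - exp_chr
    obs_rest = TOTAL_HERVS - observed
    stat = (observed - exp_chr) ** 2 / exp_chr + (obs_rest - exp_rest) ** 2 / exp_rest
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    for name, (windows, observed) in CHROMOSOMES.items():
        stat, p = chi2_one_vs_rest(windows, observed)
        flag = "significant" if p < ALPHA else "not significant"
        print(f"chr {name}: expected={expected_hervs(windows):.1f} "
              f"chi2={stat:.3f} p={p:.4f} ({flag})")
```

Under these assumptions the computed values land close to the tabulated entries for chromosomes 1, 8 and 11, which suggests the reading of footnote b is reasonable, although it does not confirm the exact procedure used.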
Figure 2. Number of HERV regions located inside genes, and their orientation relative to the gene. The expected number assumes a random genomic distribution.
Distribution of vORF lengths (stop codon to stop codon)
| vORF size (aa/codons) | Gag | Pro | Pol | Env | HERV regions |
| 63 – 100 | 4820 | 1322 | 10390 | 2354 | 6795 |
| 100 – 200 | 4015 | 1002 | 9110 | 2278 | 5803 |
| 200 – 300 | 643 | 165 | 1426 | 361 | 1894 |
| 300 – 400 | 160 | 54 | 286 | 81 | 527 |
| 400 – 500 | 33 | 3 | 70 | 24 | 123 |
| 500 – 600 | 1 | 20 | 12 | 33 | |
| 600 – 700 | 4 | 10 | 9 | 22 | |
| 700 – 800 | 10 | 4 | 7 | 15 | |
| 800 – 900 | 1 | 1 | 2 | ||
| 900 – 1000 | 1 | 5 | 6 | ||
| > 1000 | 1 | 3 | 4 |
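The stop-codon to stop-codon definition used for these lengths can be illustrated with a small six-frame scan. This is a minimal sketch, not the annotation pipeline itself: it assumes the standard stop codons (TAA, TAG, TGA), reports coordinates relative to the strand being scanned, and also reports stop-free stretches that run into the sequence ends. The function names and the returned tuple layout are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}   # ASSUMPTION: standard genetic code
MIN_CODONS = 63                       # fragments "above 62 aa"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def stop_to_stop_fragments(seq, min_codons=MIN_CODONS):
    """Yield (strand, frame, start, end, n_codons) for stop-free stretches.

    start/end are 0-based, end-exclusive offsets on the scanned strand
    (for "-" they refer to the reverse-complemented sequence).
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start = frame
            for pos in range(frame, len(s) - 2, 3):
                if s[pos:pos + 3] in STOP_CODONS:
                    n_codons = (pos - run_start) // 3
                    if n_codons >= min_codons:
                        yield strand, frame, run_start, pos, n_codons
                    run_start = pos + 3
            # Trailing stretch that reaches the end of the sequence
            # (ASSUMPTION: reported even though no closing stop was seen).
            end = frame + ((len(s) - frame) // 3) * 3
            n_codons = (end - run_start) // 3
            if n_codons >= min_codons:
                yield strand, frame, run_start, end, n_codons


if __name__ == "__main__":
    # Toy sequence: a stop, 70 alanine codons, then another stop.
    toy = "TAA" + "GCT" * 70 + "TAG"
    for fragment in stop_to_stop_fragments(toy):
        print(fragment)
```

Binning the `n_codons` values of such fragments by size class would give a length distribution of the kind tabulated above, once each fragment has been assigned to Gag, Pro, Pol or Env by homology, a step this sketch does not attempt.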
Figure 3. Genomic distribution of all Gag (red) and Env (blue) ORFs above 500 aa and Pol (green) ORFs above 700 aa. Right-pointing triangles denote intact ORFs, while left-pointing triangles denote ORFs that are almost intact apart from a single stop codon or frame-shift mutation.
Previously and newly identified long Env ORFs in the human genome
| Gene (a) | Bibliographic name | Chromosomal position of locus (NCBI release 34) | Length (c) | ORF ID | Comment | EST matches (d) |
| HERV-H-like Env | | Chr. X 70307525–70316940 (+1) | 474 | 4769 | N-term unknown; minor C-term deletion | |
| EnvF(c)1 | | Chr. X 95868842–95875915 (+1) | 583 | 8944 | Intact (a) | |
| HERV-W Env | | Chr. X 105067535–105070015 (-1) | 475 | 24413 | Minor N-term deletion | 3 |
| HERV-K Env (type 1) | | Chr. 1 75266332–75270814 (+1) | 586 | 42910 | In frame pol-env fusion | 3 |
| HERV-K Env (type 1) | K18-SAg IDDMK1,222 | Chr. 1 157878336–157885675 (+1) | 560 | 46511 | In frame pol-env fusion | |
| EnvH3 | EnvH/p59 | Chr. 2 155926784–155933168 (+1) | 554 | 70149 | Intact (a) | |
| HERV-K Env (type 1) | | Chr. 2 130813720–130815944 (-1) | 687 | 80419 | In frame pol-env fusion | |
| EnvH1 | EnvH/p62 H19 | Chr. 2 166767087–166774769 (-1) | 583 | 82113 | Intact (a) | |
| EnvR(b) | | Chr. 3 16781208–16788508 (+1) | 513 | 86185 | Intact (a) | |
| HERV-K Env (type 1) | | Chr. 3 114064939–114072223 (-1) | 597 | 103885 | In frame pol-env fusion; C-term deletion | |
| EnvH2 | EnvH/p60 | Chr. 3 167860265–167867997 (-1) | 562 | 107739 | Intact (a) | |
| HERV-K-like Env | | Chr. 5 34507318–34513254 (-1) | 475 | 153615 | N- and C-term deletion | |
| EnvFRD | Syncytin 2 | Chr. 6 11211667–11219905 (-1) | 537 | 171089 | Intact (a) | 16 |
| EnvK4 | HERV-K109 | Chr. 6 78422690–78431275 (-1) | 697 | 174741 | Intact (a) | |
| EnvK2 (b) | HML-2.HOM HERV-K108 | Chr. 7 4367317–4383401 (-1) | 698 | 188263 188274 | Intact (a) | 4 |
| EnvR | Erv3 | Chr. 7 63862984–63871411 (-1) | 605 | 191393 | Intact (a) | 17 |
| EnvW | Syncytin (1) | Chr. 7 91710047–91718755 (-1) | 537 | 192333 | Intact (a) | 100 |
| EnvF(c)2 | | Chr. 7 152498159–152502575 (-1) | 545 | 195475 | Intact (a) | 1 |
| EnvK6 | HERV-K115 | Chr. 8 7342682–7353583 (-1) | 698 | 204173 | Intact (a) | |
| HERV-K Env | | Chr. 11 101104479–101112064 (+1) | 661 | 240932 | Minor C-term deletion | 6 |
| HERV-K-like Env | | Chr. 12 104204746–104209814 (+1) | 658 | 255589 | Minor C-term deletion | |
| EnvK1 | | Chr. 12 57008431–57016689 (-1) | 697 | 260042 | Intact (a) | |
| ZFERV-like Env | | Chr. 14 91072914–91085655 (-1) | 664 | 285129 | | |
| HERV-K Env (type 1) | | Chr. 16 35312483–35314318 (+1) | 550 | 293143 | In frame pol-env fusion | |
| EnvT | | Chr. 19 20334642–20343232 (+1) | 664 | 310016 | Intact (a) | |
| HERV-W/FRD-like Env | | Chr. 19 58210000–58211244 (+1) | 477 | 312172 | N-term unknown; minor C-term deletion | 3 |
| HERV-W/FRD-like Env | | Chr. 19 58244133–58246051 (+1) | 535 | 312208 | N-term unknown | 3 |
| EnvK3 | HERV-K (C19) | Chr. 19 32821287–32829201 (-1) | 698 | 314652 | Intact (a) | |
a Nomenclature for verified and complete env genes as in de Parseval et al. [41]. Note that EnvK5 (HERV-K113) on Chr. 19 [14] is not present in NCBI release 34 of the human genome.
b EnvK2 is organized as a tandem repeat.
c ORF length from start to stop codon.
d Number of ESTs that map to the same genomic region (see text).