Aashish R Jha, Douglas F Nixon, Michael G Rosenberg, Jeffrey N Martin, Steven G Deeks, Richard R Hudson, Keith E Garrison, Satish K Pillai.
Abstract
HERV-K113 and HERV-K115 have been considered to be among the youngest HERVs because they are the only known full-length proviruses that are insertionally polymorphic and maintain the open reading frames of their coding genes. However, recent data suggest that HERV-K113 is at least 800,000 years old, and HERV-K115 even older. A systematic study of HERV-K HML2 members to identify HERVs that may have infected the human genome in the more recent evolutionary past is lacking. Therefore, we sought to determine how recently HERVs were exogenous and infectious by examining sequence variation in the long terminal repeat (LTR) regions of all full-length HERV-K loci. We used the traditional method of inter-LTR comparison to analyze all full-length HERV-Ks and determined that two insertions, HERV-K106 and HERV-K116, have no differences between their 5′ and 3′ LTR sequences, suggesting that these insertions were endogenized in the recent evolutionary past. Of these two insertions, HERV-K106 had the more intact viral sequence structure. Coalescent analysis of HERV-K106 3′ LTR sequences representing 51 ethnically diverse individuals suggests that HERV-K106 integrated into the human germ line approximately 150,000 years ago, after the emergence of anatomically modern humans.
Year: 2011 PMID: 21633511 PMCID: PMC3102101 DOI: 10.1371/journal.pone.0020234
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Phylogenetic tree of full-length HERV-K (HML-2) LTR sequences.
The clustering of the 5′ and 3′ LTR sequences of each HERV-K insertion suggests that gene conversion is rare in the HERV-K family. Each HERV is indicated by its name (e.g., K106 for HERV-K106). HERV taxa that may have undergone gene conversion are indicated in red. The HERV-K115 5′ LTR clusters with HERV-K109, suggesting that a gene conversion occurred between these two HERV-K members. HERV-K10, HERV-K HML2HOM, and HERV-K110 are sometimes referred to as HERV-K107, HERV-K108, and HERV-K18, respectively. HERV-K110 was used to root the phylogeny because it is present in both humans and gorillas [16].
Table 1. Human-specific complete HERV-K (HML-2) proviruses within the human genome.
| S.No. | HERV | Accession no. | Location | Inter-LTR differences | Average LTR size (bp) | Age (Myr) |
|---|---|---|---|---|---|---|
| 1 | K106 | AF164620 | 3q13.2 | 0 | 960 | <0.8* |
| 2 | K116# | N/A | 1p31.1 | 0 | 968 | <0.8* |
| 3 | K101 | AF164609 | 22q11.2 | 2 | 968 | 1.6 |
| 4 | K107 | AF164613 | 5q33.3 | 2 | 968 | 1.6 |
| 5 | K117# | N/A | 3q27.2 | 3 | 968 | 2.4 |
| 6 | K109 | AF164615 | 6q14.1 | 3 | 960 | 2.4 |
| 7 | K113 | AY037928 | 19p13.11 | 3 | 968 | 2.4 |
| 8 | K102 | AF164610 | 1q21 | 4 | 968 | 3.2 |
| 9 | K118# | N/A | 11q22.1 | 4 | 968 | 3.2 |
| 10 | K119# | N/A | 12q14.1 | 5 | 974 | 3.9 |
| 11 | K108 | AF164614 | 7p22.1 | 6 | 968 | 4.8 |
| 12 | K103 | AF164611 | 10p12.1 | 7 | 968 | 5.6 |
| 13 | K104 | AF164612 | 5p14.3 | 12 | 964 | 12.0 |
| 14 | K115 | AY037929 | 8p23.1 | 13 | 964 | N/A |
Table 1 Notes
The identity and location of human-specific complete HERV-K (HML-2) proviruses within the human genome were obtained from previous reports [15], [16]. HERV-K116, K117, K118, and K119 (marked # in the table) were previously referred to by their genomic locations.
HERV-K117 and HERV-K109 have as many inter-LTR differences as the insertionally polymorphic HERV-K113 (three), and four HERV-K members have fewer differences between their LTRs than HERV-K113. Of these four, HERV-K106 and HERV-K116 have identical LTRs; however, HERV-K116 has a 2846 bp deletion in the pol region.
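The inter-LTR differences column rests on counting substitutions between a provirus's 5′ and 3′ LTRs, which are identical at the time of integration. Below is a minimal Python sketch of that count, assuming the two LTRs are already aligned to equal length; the function name, the toy sequences, and the choice to skip gap ('-') and 'N' columns are illustrative assumptions, not the authors' documented procedure.

```python
def count_inter_ltr_differences(ltr5: str, ltr3: str) -> int:
    """Count substitutions between two pre-aligned LTR sequences.

    Assumption: both sequences come from the same alignment (equal length).
    Columns where either sequence has a gap ('-') or an 'N' are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    skip = {"-", "N"}
    return sum(
        1
        for a, b in zip(ltr5.upper(), ltr3.upper())
        if a not in skip and b not in skip and a != b
    )

# Hypothetical toy alignment, not real HERV-K sequence:
# one G/C mismatch, one gap column (skipped), one T/A mismatch
print(count_inter_ltr_differences("ACGTTAGC-AGT", "ACCTTAGCTAGA"))  # 2
print(count_inter_ltr_differences("ACGTTAGC", "ACGTTAGC"))          # 0
```

Identical LTRs, as in HERV-K106 and HERV-K116, return 0 under this count.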
All age estimates are based on inter-LTR comparisons and are given in million years (Myr). *Because the LTRs of HERV-K106 and HERV-K116 are identical, their age estimates are upper bounds calculated as if their LTRs differed by 1 SNP. The age of K115 is listed as ‘N/A’ because it cannot be determined by the inter-LTR comparison method, likely because its 5′ LTR appears to have undergone gene conversion (Figure 1). The age of K115 was previously estimated to be at least 1.1 Myr using a coalescent approach [25].
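Converting a difference count into an age follows the usual inter-LTR logic: both LTRs accumulate substitutions independently after integration, so T = K / (2r), where K is the proportion of differing sites and r is a per-lineage substitution rate. The rate is not stated in this section. The sketch below assumes a combined rate 2r = 1.3 × 10⁻⁹ substitutions per site per year, a value back-calculated from Table 1 because it reproduces the tabulated ages of K101 through K103 to one decimal place; it gives about 9.6 Myr for K104's 12 differences, so the authors' exact calculation may differ.

```python
# Assumption: combined rate (2r) back-calculated from Table 1; not given in the text.
ASSUMED_COMBINED_RATE = 1.3e-9  # substitutions per site per year

def inter_ltr_age_myr(differences, ltr_length_bp, combined_rate=ASSUMED_COMBINED_RATE):
    """Estimate integration age in Myr as T = K / (2r), with K = differences / length."""
    k = differences / ltr_length_bp
    return k / combined_rate / 1e6

# (name, inter-LTR differences, average LTR size in bp) from Table 1
rows = [("K101", 2, 968), ("K113", 3, 968), ("K119", 5, 974), ("K103", 7, 968)]
for name, d, length in rows:
    print(name, round(inter_ltr_age_myr(d, length), 1))

# Identical LTRs: treating them as differing at one site gives an upper bound
print("K106 upper bound:", round(inter_ltr_age_myr(1, 960), 1))
```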
Figure 2. Genome organization and haplotypes of HERV-K106.
Figure 2A: Genomic organization of HERV-K106, showing the two LTRs and the gag, pol, and env genes. The HERV-K106 genome was annotated with the aid of the HERV-K consensus sequence (HERV-KCON) [23], two HERV-K HML2 members with partially functional gag (HERV-K101 and HERV-K109) [43], and the insertionally polymorphic HERV-K113, which has intact ORFs [17]. Known functional elements within the HERV-K106 LTR are shown in colored boxes. All SNP positions in the coding genes are counted from the start of the gag ORF. The five nonsynonymous SNPs shown in the gag region are specific to HERV-K106. They do not include the I516M mutation, which single-handedly abolishes HERV-K113 gag function [43], so the functionality of K106 gag warrants future investigation. Although K106 lacks this critical mutation, it harbors a single-base deletion at position 1977 that causes a frameshift near the end of the gag gene; whether this frameshift renders gag dysfunctional is unknown. The HERV-K106 polymerase gene (pol), which includes reverse transcriptase, is distinct from those of the other HERV-K family members used in our sequence comparison. In addition to the SNP shown, it harbors a 5 bp deletion (positions 4849–4853) that results in a frameshift. The large 292 bp deletion beginning at position 5392 of pol and extending into the env gene is the signature of type I HERV-K [16]. In addition to this 292 bp deletion, the HERV-K106 env gene has a premature stop codon. Stop codons in the HERV-K106 genome are marked with a symbol in the figure.
Figure 2B: SNPs and surrounding bases in the 3′ LTR of HERV-K106, showing four haplotypes. Each haplotype is listed on the left; each SNP is shown in red, and the position containing it is highlighted in yellow.
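The frameshift statements in the caption follow from codon arithmetic: an indel inside an ORF shifts the downstream reading frame unless its length is a multiple of three. Below is a minimal Python sketch of that rule applied to the gag and pol deletions described in the caption. It looks only at indel length and ignores where the indel falls or whether it creates a stop codon; the function name and the 3 bp case are hypothetical, included for contrast.

```python
def is_frameshift(indel_length_bp):
    """True if an indel of this length inside an ORF shifts the reading frame."""
    if indel_length_bp <= 0:
        raise ValueError("indel length must be positive")
    # One codon is 3 nt, so only multiples of 3 preserve the frame
    return indel_length_bp % 3 != 0

deletions = {
    "gag, single base at 1977": 1,
    "pol, 4849-4853 inclusive": 4853 - 4849 + 1,  # 5 bp
    "hypothetical 3 bp deletion": 3,
}
for label, length in deletions.items():
    print(label, "->", "frameshift" if is_frameshift(length) else "in-frame")
```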
Table 2. Base frequencies and haplotypes of the HERV-K106 3′ LTR, with haplotype frequencies in various ethnic groups within the United States.
Base frequencies in the HERV-K106 3′ LTR:
| Position | Base | n | f |
|---|---|---|---|
| 133 | C | 83 | 0.88 |
| 133 | T | 11 | 0.13 |
| 403 | G | 90 | 0.96 |
| 403 | A | 4 | 0.04 |
| 835 | C | 91 | 0.97 |
| 835 | T | 3 | 0.03 |
Haplotypes of the HERV-K106 3′ LTR by ethnicity (Af. Am = African American; Eur. Am = European American):
| Base 133 | Base 403 | Base 835 | Total n | Total f | Af. Am n | Af. Am f | Eur. Am n | Eur. Am f | Others n | Others f |
|---|---|---|---|---|---|---|---|---|---|---|
| C | G | C | 80 | 0.85 | 20 | 0.77 | 40 | 0.91 | 20 | 0.83 |
| T | G | C | 7 | 0.07 | 2 | 0.08 | 2 | 0.05 | 3 | 0.13 |
| T | A | C | 4 | 0.04 | 2 | 0.08 | 2 | 0.05 | 0 | 0.00 |
| C | G | T | 3 | 0.03 | 2 | 0.08 | 0 | 0.00 | 1 | 0.04 |
Table 2 Notes
Each of the three SNP sites in the HERV-K106 3′ LTR has two alternative bases, one of which predominates (133: C>T, 403: G>A, and 835: C>T). From these SNPs, four HERV-K106 haplotypes could be constructed.
The most common haplotype was C-G-C (f = 0.85), followed by T-G-C (f = 0.07), T-A-C (f = 0.04), and C-G-T (f = 0.03).
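Because each haplotype fixes one base at each of the three SNP positions, the per-position base counts in Table 2 can be recovered by summing the haplotype counts. Below is a Python sketch under that reading of the table (the helper name is hypothetical). It reproduces the base counts listed for positions 133, 403, and 835.

```python
from collections import Counter

POSITIONS = (133, 403, 835)
# Total haplotype counts from Table 2; bases listed in position order 133-403-835
HAPLOTYPE_COUNTS = {"CGC": 80, "TGC": 7, "TAC": 4, "CGT": 3}

def base_counts_by_position(haplotype_counts):
    """Sum haplotype counts into per-position base counts."""
    per_position = {pos: Counter() for pos in POSITIONS}
    for haplotype, n in haplotype_counts.items():
        for pos, base in zip(POSITIONS, haplotype):
            per_position[pos][base] += n
    return per_position

for pos, counts in base_counts_by_position(HAPLOTYPE_COUNTS).items():
    print(pos, dict(counts))
# 133 {'C': 83, 'T': 11}
# 403 {'G': 90, 'A': 4}
# 835 {'C': 91, 'T': 3}
```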
In our study, 77% of African Americans and 91% of European Americans carried the CGC haplotype of the HERV-K106 LTR. CGC was also the most prevalent haplotype in a heterogeneous sample labeled ‘others’, which consisted of a few (n < 5 from each group) individuals of Hispanic, Asian, and East Indian origin. Haplotypes TGC and TAC were each present in 8% of African Americans and 5% of European Americans, whereas haplotype CGT was present in 8% of African Americans but absent in European Americans. Although fewer African Americans than European Americans were sampled, African Americans showed greater haplotype diversity, which is consistent with the “out of Africa” hypothesis that HERV-K106 originated in Africa and spread out of Africa with human migration.
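The text does not say how haplotype diversity was measured. As one illustration, the sketch below swaps in Nei's unbiased haplotype diversity, H = n/(n − 1) · (1 − Σ pᵢ²), a standard statistic, and applies it to the African American and European American counts in Table 2; the function name is hypothetical. It gives roughly 0.41 for African Americans and 0.17 for European Americans, in line with the greater diversity described above.

```python
def haplotype_diversity(counts):
    """Nei's unbiased haplotype diversity: H = n/(n-1) * (1 - sum(p_i**2))."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    sum_p_squared = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1 - sum_p_squared)

# Counts of CGC, TGC, TAC, CGT from Table 2
groups = {
    "African American": [20, 2, 2, 2],
    "European American": [40, 2, 2, 0],
}
for group, counts in groups.items():
    print(group, round(haplotype_diversity(counts), 3))
```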
Figure 3. Comparison of haplotype frequencies between HERV-K106 and HERV-K113.
Haplotypes and haplotype frequencies of HERV-K113 and HERV-K106 from individuals in the same sample set are shown. Haplotypes and haplotype frequencies of K113 were obtained from our previous study [25]. Higher minor haplotype frequencies (MHFs) were observed for K113 (18%, 9%, 9%) than for K106 (8%, 4%, 3%), even though far fewer K113 LTRs than K106 LTRs were sequenced. The higher K113 MHFs in a subset of our samples suggest that K106 integrated into the human genome much later than K113, so that K106 minor haplotypes have not had enough time to reach higher frequencies in global populations. This provides additional evidence for the recent integration of HERV-K106.