Aashish R Jha, Satish K Pillai, Vanessa A York, Elizabeth R Sharp, Emily C Storm, Douglas J Wachter, Jeffrey N Martin, Steven G Deeks, Michael G Rosenberg, Douglas F Nixon, Keith E Garrison.
Abstract
The human genome contains human endogenous retroviruses (HERV), of which HERV-K113 and HERV-K115 are the only known full-length proviruses that are insertionally polymorphic. Although a handful of previously published papers have documented their prevalence in the global population, to date there has been no report on their prevalence in the United States population. Here, we studied the geographic distribution of K113 and K115 among 156 HIV-1+ subjects from the United States, including African Americans, Hispanics, and Caucasians. In the individuals studied, we found higher insertion frequencies of K113 (21%) and K115 (35%) in African Americans compared with Caucasians (K113 9% and K115 6%) within the United States. We also report the presence of three single nucleotide polymorphism sites in the K113 5' long terminal repeat (LTR) and four in the K115 5' LTR that together constituted four haplotypes for K113 and five haplotypes for K115. HERV insertion times can be estimated from the sequence differences between the 5' and 3' LTR of each insertion, but this dating method cannot be used with HERV-K115. We developed a method to estimate insertion times by applying coalescent inference to 5' LTR sequences within our study population and validated this approach using an independent estimate derived from the genetic distance between K113 5' and 3' LTR sequences. Using our method, we estimated the insertion dates of K113 and K115 to be a minimum of 800,000 and 1.1 million years ago, respectively. Both of these insertion dates predate the emergence of anatomically modern Homo sapiens.
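The LTR-based dating idea mentioned in the abstract can be sketched as follows. The two LTRs of a provirus are identical at integration, so the divergence d between them has accumulated along two lineages, giving T ≈ d / (2μ). This is only a minimal illustration of that general relationship, not the coalescent method developed in the paper; the function names, the toy sequences, and the substitution rate are all assumptions made for the example.

```python
def p_distance(ltr5: str, ltr3: str) -> float:
    """Proportion of differing sites between two aligned LTRs, ignoring gap columns."""
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(ltr5.upper(), ltr3.upper()) if "-" not in (a, b)]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def insertion_age_years(divergence: float, rate_per_site_per_year: float) -> float:
    """T = d / (2 * mu): both LTRs accumulate substitutions independently."""
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return divergence / (2.0 * rate_per_site_per_year)


# Hypothetical toy data: two 1,000-bp LTRs that differ at exactly 2 sites.
toy_5p = "ACGT" * 250
toy_3p = "T" + toy_5p[1:500] + "C" + toy_5p[501:]

# Placeholder substitution rate (per site per year); NOT a value from the study.
ASSUMED_RATE = 1.0e-9

d = p_distance(toy_5p, toy_3p)
age = insertion_age_years(d, ASSUMED_RATE)
print(f"divergence = {d:.4f}, estimated age = {age:,.0f} years")
```

The estimate scales inversely with the assumed rate, so any real application would need a rate justified for the locus in question.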
Year: 2009 PMID: 19666991 PMCID: PMC2760466 DOI: 10.1093/molbev/msp180
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. Genome structures of K113 and K115 are shown. (A) The proviruses are flanked by a short duplicated host DNA sequence, also known as the preintegration site. LTRs flank the viral genomes at both the 5′ and 3′ ends. The positions of the flanking primers (specific to human DNA sequences flanking the proviral insertions) and of the HERV-K–specific primer (annealing to HERV-K insertions between the 5′ LTR and the gag gene) are also included. (B) LTRs of K113 and K115, indicating the loci of various regulatory regions. An additional 17 HERV-K sequences (supplementary table 1, Supplementary Material online) were used to validate the coordinates of the known regulatory regions within the 5′ LTR.
Summary of Previously Reported Frequencies of K113 and K115 in Global Populations
| Geographic Region | K113 Study Size (n) | K113 Insertion Frequency (%) | K115 Study Size (n) | K115 Insertion Frequency (%) | Reference |
| Africa | | | | | |
| Unspecified | 25 | 20 | 25 | 20 | |
| Malawi | 60 | 27 | 54 | 30 | |
| Cote d'Ivoire | 64 | 19 | 60 | 43 | |
| Kenya | 50 | 20 | 50 | 28 | |
| Kenya | 46 | 36 | 50 | 22 | |
| Rwanda | 49 | 22 | 49 | 22 | |
| Cameroon | 16 | 9 | 16 | 21 | |
| Mean | | 22 | | 27 | |
| Middle East | | | | | |
| Yemen | 50 | 8 | 56 | 7 | |
| Egypt | 43 | 5 | 45 | 13 | |
| Oman | 43 | 3 | 57 | 14 | |
| Mean | | 5 | | 11 | |
| Europe | | | | | |
| Unspecified | 22 | 0 | 22 | 0 | |
| United Kingdom | 96 | 4 | 96 | 1 | |
| Galicia | 48 | 2 | 50 | 7 | |
| Basque | 50 | 2 | 49 | 5 | |
| Mean | | 2 | | 3 | |
| Asia | | | | | |
| Unspecified | 28 | 10 | 28 | 0 | |
| China | 44 | 12 | 42 | 3 | |
| Taiwan | 47 | 16 | 49 | 4 | |
| Japan | — | — | 359 | 9 | |
| Mean | | 13 | | 4 | |
| Oceania | | | | | |
| Papua New Guinea | 26 | 0.09 | 26 | 0.2 | |
| Papua New Guinea | 54 | 0 | 52 | 0 | |
| Mean | | 0.04 | | 0.10 | |
Unspecified: the specific countries were not reported.
Mean frequencies of K113 and K115 in each geographical area calculated from specified previous reports. Based on the previously published data, the average insertion frequency of K113 is 22% in Africans, 5% in Middle Easterners, 2% in Europeans, 13% in Asians, and <0.1% in Papua New Guineans. Similarly, average insertion frequency of K115 is 27% in Africans, 11% in Middle Easterners, 3% in Europeans, 4% in Asians, and 0.1% in Papua New Guineans.
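The regional means quoted above can be reproduced as simple unweighted averages of the per-study frequencies. The sketch below transcribes the table values and is only an illustration of that arithmetic; the dictionary layout and the function name `regional_means` are assumptions made for the example, and a sample-size-weighted mean would give somewhat different numbers.

```python
from statistics import mean

# Reported insertion frequencies (%) per study, transcribed from the table above.
# None marks a study that did not report a value (Japan, K113).
reported = {
    "Africa": {"K113": [20, 27, 19, 20, 36, 22, 9], "K115": [20, 30, 43, 28, 22, 22, 21]},
    "Middle East": {"K113": [8, 5, 3], "K115": [7, 13, 14]},
    "Europe": {"K113": [0, 4, 2, 2], "K115": [0, 1, 7, 5]},
    "Asia": {"K113": [10, 12, 16, None], "K115": [0, 3, 4, 9]},
    "Oceania": {"K113": [0.09, 0], "K115": [0.2, 0]},
}


def regional_means(data):
    """Unweighted mean of the reported frequencies per region, skipping missing values."""
    return {
        region: {
            locus: mean([v for v in values if v is not None])
            for locus, values in loci.items()
        }
        for region, loci in data.items()
    }


means = regional_means(reported)
for region, loci in means.items():
    print(f"{region}: K113 {loci['K113']:.2f}%, K115 {loci['K115']:.2f}%")
```

Each study counts equally here regardless of its size, which is why a small study such as Cameroon (n = 16) influences the African mean as much as a larger one.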
Insertion Frequencies of K113 and K115 in Three Major Ethnicities in the United States
| Ethnicity | M/F/na | n | K113 +Insertion | K113 % | K113 Homozygous (n) | K115 +Insertion | K115 % |
| Jacobi Cohort, New York | | | | | | | |
| African Americans | 34/22/0 | 56 | 11 | 20 | 3 | 22 | 39 |
| Hispanics | 17/18/1 | 36 | 6 | 17 | 1 | 14 | 39 |
| Multiracial | — | 4 | 0 | 0 | 0 | 0 | 0 |
| Total | | 96 | 17 | 18 | 4 | 36 | 38 |
| Scope Cohort, San Francisco | | | | | | | |
| African Americans | 16/3/0 | 19 | 5 | 26 | 1 | 4 | 21 |
| Caucasians | 25/7/0 | 32 | 3 | 9 | 1 | 2 | 6 |
| Multiracial | — | 9 | 2 | 22 | 0 | 2 | 22 |
| Total | | 60 | 10 | 17 | 2 | 8 | 13 |
| Combined (New York and San Francisco) | | | | | | | |
| African Americans | 50/25/0 | 75 | 16 | 21 | 4 | 26 | 35 |
| Hispanics | 17/18/1 | 36 | 6 | 17 | 1 | 14 | 39 |
| Caucasians | 25/7/0 | 32 | 3 | 9 | 1 | 2 | 6 |
| Multiracial | — | 13 | 2 | 15 | 0 | 2 | 15 |
| Total | | 156 | 27 | 17 | 6 | 44 | 28 |
NOTE.—M/F/na, male/female/not available. When the two parents of a participant reported different ethnicities for themselves, the participant was considered multiracial. Frequencies of K113 and K115 in three major ethnicities from two geographical regions are shown. Both insertions were more common in African Americans in both geographical regions. The frequency of K115 insertion was higher in African Americans from New York than in those from San Francisco. There were a total of five individuals, mostly African Americans, homozygous for K113 insertions. None were homozygous for K115 insertions.
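The percentage columns in the table are the number of insertion carriers divided by the number of subjects in each group. A minimal sketch of that arithmetic for the combined cohort follows; the tuple layout and the function name `insertion_frequency` are assumptions made for the example, while the counts are transcribed from the table.

```python
def insertion_frequency(carriers: int, n: int) -> float:
    """Percentage of subjects carrying at least one copy of the insertion."""
    if n <= 0:
        raise ValueError("group size must be positive")
    return 100.0 * carriers / n


# Combined cohort: group -> (subjects, K113 carriers, K115 carriers),
# transcribed from the "Combined" rows of the table above.
combined = {
    "African Americans": (75, 16, 26),
    "Hispanics": (36, 6, 14),
    "Caucasians": (32, 3, 2),
    "Multiracial": (13, 2, 2),
}

for group, (n, k113, k115) in combined.items():
    print(
        f"{group}: K113 {insertion_frequency(k113, n):.0f}%, "
        f"K115 {insertion_frequency(k115, n):.0f}%"
    )

# Column totals should match the "Total" row (156 subjects, 27 and 44 carriers).
total_n = sum(n for n, _, _ in combined.values())
total_k113 = sum(k for _, k, _ in combined.values())
total_k115 = sum(k for _, _, k in combined.values())
```

These are carrier frequencies, not allele frequencies; converting to allele frequencies would also require the homozygote counts.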
SNP in Various Positions of 5′ LTR of K113 and K115
| Position | Base | African Americans n | African Americans f | Hispanics n | Hispanics f | Caucasians n | Caucasians f |
| A. Frequencies of bases at various SNP positions in 5′ LTR of K113 | | | | | | | |
| 174 | A | 8 | 0.89 | 3 | 0.60 | 2 | 1 |
| | G | 1 | 0.11 | 2 | 0.40 | 0 | 0 |
| 581 | T | 7 | 0.78 | 3 | 0.60 | 2 | 1 |
| | C | 1 | 0.11 | 2 | 0.40 | 0 | 0 |
| | G | 1 | 0.11 | 0 | 0 | 0 | 0 |
| 629 | C | 7 | 0.78 | 3 | 0.60 | 2 | 1 |
| | T | 2 | 0.22 | 2 | 0.40 | 0 | 0 |
| Total | | 9 | | 5 | | 2 | |
| B. Frequencies of bases at various SNP positions in 5′ LTR of K115 | | | | | | | |
| 268 | C | 9 | 0.56 | 8 | 0.89 | 1 | 1 |
| | T | 7 | 0.44 | 1 | 0.11 | 0 | 0 |
| 385 | A | 9 | 0.56 | 8 | 0.89 | 1 | 1 |
| | C | 7 | 0.44 | 1 | 0.11 | 0 | 0 |
| 410 | G | 14 | 0.88 | 8 | 0.89 | 1 | 1 |
| | A | 2 | 0.13 | 0 | 0 | 0 | 0 |
| 687 | A | 8 | 0.50 | 8 | 0.89 | 1 | 1 |
| | C | 8 | 0.50 | 1 | 0.11 | 0 | 0 |
| Total | | 16 | | 9 | | 1 | |
NOTE.—(A) Three SNP sites (174, 581, and 629) were observed in the 5′ LTR of K113. None of the Caucasians had a variant base at any of these three sites, indicating that all of them carried the same allele of K113. Both African Americans and Hispanics had at least two bases at each site, and African Americans had an additional base at position 581. Base frequencies at each site were biased toward one common base. (B) Four SNP sites (268, 385, 410, and 687) were identified in the 5′ LTR of K115. All the Caucasians had the same allele of K115. African Americans had two bases at each of the SNP sites with similar base frequencies, except at position 410, where the base frequency was biased. Hispanics showed diversity at the other three sites but did not have an SNP at position 410. All SNPs were numbered according to their position in GenBank sequences AY037928 for K113 and AY037929 for K115.
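The per-site base counts and frequencies in a table like this come from tallying each column of an alignment of the 5′ LTR sequences. The sketch below shows one way to do that tally, assuming the sequences are already aligned to equal length; the function name `polymorphic_sites` and the toy sequences are hypothetical and are not data from the study.

```python
from collections import Counter


def polymorphic_sites(aligned_seqs, offset=1):
    """Return {position: {base: frequency}} for alignment columns with more than one base.

    `offset` is the coordinate of the first column (1-based by default).
    """
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("expected a non-empty list of equal-length aligned sequences")
    sites = {}
    for i, column in enumerate(zip(*aligned_seqs)):
        counts = Counter(column)
        if len(counts) > 1:  # more than one base observed: a polymorphic site
            total = sum(counts.values())
            sites[i + offset] = {base: c / total for base, c in counts.items()}
    return sites


# Hypothetical toy alignment of four short sequences (not real LTR data).
toy_alignment = ["ACGTA", "ACGTA", "ATGTA", "ACGTC"]
snps = polymorphic_sites(toy_alignment)
for position, freqs in snps.items():
    print(position, freqs)
```

In practice the `offset` would be chosen so that positions match the reference coordinates used for numbering, such as those of the GenBank entries cited in the note.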
Haplotypes and Haplotype Frequencies of K113 and K115 Based on Variations in the 5′ LTR
A. Haplotypes of K113 in various ethnic groups in the United States
| | | Polymorphic Loci on LTR | | | |
| Ethnicity | n | 174 | 581 | 629 | f |
| African Americans | 4 | A | T | C | 0.57 |
| African Americans | 1 | A/G | C/T | C/T | 0.14 |
| African Americans | 1 | A | T | T | 0.14 |
| African Americans | 1 | A | G | C | 0.14 |
| Hispanics | 1 | A | T | C | 0.25 |
| Hispanics | 1 | A/G | C/T | C/T | 0.25 |
| Hispanics | 1 | A | T | T | 0.25 |
| Hispanics | 1 | G | C | C | 0.25 |
| Caucasians | 2 | A | T | C | 1 |
B. Haplotypes of K115 in various ethnic groups in the United States
| | | Polymorphic Loci on LTR | | | | |
| Ethnicity | n | 268 | 385 | 410 | 687 | f |
| African Americans | 6 | C | A | G | A | 0.38 |
| African Americans | 6 | T | C | G | C | 0.38 |
| African Americans | 2 | C | A | A | A | 0.13 |
| African Americans | 1 | C | C | G | G | 0.06 |
| African Americans | 1 | T | A | G | C | 0.06 |
| Hispanics | | C | A | G | A | 0.89 |
| Hispanics | | T | C | G | C | 0.11 |
| Caucasians | 1 | C | A | G | A | 1 |
NOTE.—Four haplotypes were seen for K113 of which two were common in African Americans and Hispanics. Both African Americans and Hispanics also had one unique private allele for K113. Caucasians had only one haplotype. Five different haplotypes were also observed in K115. The most diverse ethnic group with all five haplotypes was African American. African Americans also had two private alleles. Hispanics had two different haplotypes of K115, whereas the Caucasians had only a single haplotype.
Fig. ML phylogeny of HERV-K 5′ LTR sequences including K113 and K115 haplotypes. Taxon names of all reference sequences include GenBank accession numbers. K113 and K115 sequences are highlighted in red and blue, respectively, and the number of times each haplotype was observed in this study is listed in taxon labels. Scale bar represents 1% genetic distance.