Joanna L Mountain, Uma Ramakrishnan.
Abstract
Summaries of human genomic variation shed light on human evolution and provide a framework for biomedical research. Variation is often summarised in terms of one or a few statistics (eg F(ST) and gene diversity). Now that multilocus genotypes for hundreds of autosomal loci are available for thousands of individuals, new approaches are applicable. Recently, trees of individuals and other clustering approaches have demonstrated the power of an individual-focused analysis. We propose analysing the distributions of genetic distances between individuals. Each distribution, or common ancestry profile (CAP), is unique to an individual, and does not require a priori assignment of individuals to populations. Here, we consider a range of models of population history and, using coalescent simulation, reveal the potential insights gained from a set of CAPs. Information lies in the shapes of individual profiles (sometimes captured by variance of individual CAPs) and the variation across profiles. Analysis of short tandem repeat genotype data for over 1,000 individuals from 52 populations is consistent with dramatic differences in population histories across human groups.
Year: 2005 PMID: 15814064 PMCID: PMC3525116 DOI: 10.1186/1479-7364-2-1-4
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Figure 2. Summary of common ancestry profiles for 52 human populations. Mean genetic distance among individuals within each of 52 human populations of the CEPH-HGDP panel, with range indicated by the 5th to 95th percentiles. Genetic distance of individuals x and y reflects the probability that two short tandem repeat alleles drawn, one from x and one from y at a particular locus, differ in state. The horizontal dotted line indicates average genetic distance (0.74) for all within-population comparisons.
Figure 1. Common ancestry profiles for two individuals, based on genotype data for 377 short tandem repeat (STR) loci [4]. Distribution of genetic distance estimates for all possible pairs drawn from 1,013 individuals of the CEPH-HGDP STR dataset (overall) and for all pairs including individual Pima 1043 or French 516. (a) Pima 1043 vs all other individuals; (b) French 516 vs all other individuals; (c) Pima 1043 vs three sets of individuals: other Pima, other non-Pima Americans and all non-Americans of CEPH-HGDP set; (d) French 516 vs three sets of individuals: other French, non-French Europeans and non-Europeans. Genetic distance for a pair of individuals is defined as the probability with which two alleles, one drawn randomly from each of the two individuals, differ in state, averaged across loci. Forty-three individuals (13 duplicates and 30 close relatives) were excluded from the original Rosenberg dataset [4].
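The distance definition in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the tuple-per-locus genotype layout, the treatment of missing genotypes (skipped) and the toy allele values are all assumptions made for the example.

```python
from itertools import product

def locus_distance(geno_x, geno_y):
    """Probability that two alleles, one drawn at random from each
    genotype at a single locus, differ in state."""
    pairs = list(product(geno_x, geno_y))
    return sum(a != b for a, b in pairs) / len(pairs)

def genetic_distance(ind_x, ind_y):
    """Per-locus distance averaged across loci typed in both individuals.
    Skipping loci with a missing genotype (None) is an assumption."""
    per_locus = [
        locus_distance(gx, gy)
        for gx, gy in zip(ind_x, ind_y)
        if gx is not None and gy is not None
    ]
    return sum(per_locus) / len(per_locus)

def common_ancestry_profile(focal, others):
    """Distances from one focal individual to each individual in `others`;
    the distribution of these values is that individual's CAP."""
    return [genetic_distance(focal, other) for other in others]

# Toy genotypes: three loci, alleles coded as repeat counts (made-up values).
a = [(10, 12), (7, 7), (15, 16)]
b = [(10, 10), (7, 8), (14, 16)]
c = [(11, 13), (9, 9), (15, 15)]

cap_a = common_ancestry_profile(a, [b, c])
```

Restricting `others` to members of the same population, of a different population, or of the whole sample gives the 'within', 'between' and 'overall' profiles described in the later captions.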
List of pairs of CEPH-HGDP samples [30] determined via common ancestry profile analysis of short tandem repeat data [4] to be duplicates
| Duplicate pair | 1st sample ID | 2nd sample ID | Population ID(s) | Population name(s) |
|---|---|---|---|---|
| 1 | 1022 | 813 | 601 | Han |
| 2 | 1235 | 1233 | 608 | Hezhen |
| 3 | 1025 | 762 | 684 | Japanese |
| 4 | 220 | 111 | 58/54 | Pathan, Hazara |
| 5 | 1154 | 149 | 27 | Italian-Bergamo |
| 6 | 589 | 583 | 37 | Druze |
| 7 | 652 | 650 | 36 | Bedouin |
| 8 | 659 | 658 | 71 | Melanesian |
| 9 | 826 | 657 | 71 | Melanesian |
| 10 | 979 | 660 | 71 | Melanesian |
| 11 | 981 | 472 | 488 | Biaka |
| 12 | 1087 | 452 | 488 | Biaka |
| 13 | 1092 | 457 | 488 | Biaka |
Population identification codes (IDs) drawn from Noah Rosenberg http://www.cmb.usc.edu/people/noahr//diversitycodes.txt.
List of individuals removed from analysis because of known close relationship (within two degrees) to another individual included in CEPH-HGDP short tandem repeat dataset [4,30]
| Sample ID | Population ID | Population name | Country |
|---|---|---|---|
| 995 | 82 | Karitiana | Brazil |
| 998 | 82 | Karitiana | Brazil |
| 999 | 82 | Karitiana | Brazil |
| 1004 | 82 | Karitiana | Brazil |
| 1006 | 82 | Karitiana | Brazil |
| 1008 | 82 | Karitiana | Brazil |
| 1011 | 82 | Karitiana | Brazil |
| 1012 | 82 | Karitiana | Brazil |
| 1014 | 82 | Karitiana | Brazil |
| 1016 | 82 | Karitiana | Brazil |
| 1017 | 82 | Karitiana | Brazil |
| 1018 | 82 | Karitiana | Brazil |
| 830 | 83 | Surui | Brazil |
| 833 | 83 | Surui | Brazil |
| 839 | 83 | Surui | Brazil |
| 840 | 83 | Surui | Brazil |
| 841 | 83 | Surui | Brazil |
| 842 | 83 | Surui | Brazil |
| 850 | 83 | Surui | Brazil |
| 858 | 83 | Surui | Brazil |
| 878 | 86 | Maya | Mexico |
| 1039 | 87 | Pima | Mexico |
| 1040 | 87 | Pima | Mexico |
| 1042 | 87 | Pima | Mexico |
| 1045 | 87 | Pima | Mexico |
| 1046 | 87 | Pima | Mexico |
| 1049 | 87 | Pima | Mexico |
| 1050 | 87 | Pima | Mexico |
| 1055 | 87 | Pima | Mexico |
| 1061 | 87 | Pima | Mexico |
Additional pairs of individuals indicated as possible close relatives by common ancestry profile analysis were not removed. Population identification codes (IDs) drawn from Noah Rosenberg http://www.cmb.usc.edu/people/noahr//diversitycodes.txt.
Figure 3. Ten examples each of simulated common ancestry profiles (CAPs) comparing an individual to: (a) all other individuals in two populations ('overall'); (b) all other individuals in the same population ('within'); and (c) all others in a different population ('between'). CAPs derived from coalescent simulations of two populations of effective size 1,000 that diverged 2,000 generations ago, generating 500 short tandem repeat loci (mutation rate: 0.0005/locus/generation; range constraint: 15, stepwise mutation model).
Summaries of individual common ancestry profiles (CAPs) derived from data simulated via two-population models
| Pairs (a) | Time (b) | n (c) | Average (d) | Standard deviation (e) | Ht (f) | Raggedness (g) |
|---|---|---|---|---|---|---|
| Overall | 5,000 | 100 | 0.508 (0.007) | 0.045 (0.004) | 0.699 | 0.043 |
| | 2,000 | 100 | 0.501 (0.007) | 0.036 (0.003) | 0.672 | 0.046 |
| | 1,000 | 100 | 0.497 (0.007) | 0.024 (0.003) | 0.664 | 0.039 |
| | 2,000 | 25 | 0.489 (0.008) | 0.043 (0.005) | 0.640 | 0.084 |
| Within population | 5,000 | 100 | 0.470 (0.007) | 0.011 (0.001) | 0.568 | 0.125 |
| | 2,000 | 100 | 0.469 (0.006) | 0.010 (0.001) | 0.568 | 0.133 |
| | 1,000 | 100 | 0.479 (0.006) | 0.009 (0.001) | 0.572 | 0.125 |
| | 2,000 | 25 | 0.465 (0.007) | 0.010 (0.001) | 0.552 | 0.208 |
| Between population | 5,000 | 100 | 0.553 (0.009) | 0.017 (0.001) | | 0.050 |
| | 2,000 | 100 | 0.537 (0.009) | 0.015 (0.001) | | 0.056 |
| | 1,000 | 100 | 0.522 (0.008) | 0.013 (0.001) | | 0.074 |
| | 2,000 | 25 | 0.538 (0.010) | 0.014 (0.002) | | 0.134 |
| Cryptic | 5,000 | 100 | 0.510 (0.008) | 0.044 (0.004) | 0.693 | 0.056 |
| | 2,000 | 100 | 0.504 (0.008) | 0.031 (0.003) | 0.671 | 0.057 |
| | 1,000 | 100 | 0.500 (0.007) | 0.024 (0.003) | 0.648 | 0.055 |
| | 2,000 | 25 | 0.498 (0.009) | 0.041 (0.005) | 0.653 | 0.104 |
Effective population size: 1,000 individuals per population. Statistics calculated across all individuals of simulated sample. Standard deviations included in parentheses.
a Overall -- all individuals of two-population sample compared; Within -- individuals of same population compared; Between -- individuals of different populations compared; Cryptic -- random subset of 100 individuals compared.
b Number of generations since two populations diverged.
c Number of individuals sampled per population.
d Average genetic distance for a set of pairs of individuals; standard deviation reflects variation in that average across individuals.
e Standard deviation of individual CAPs, averaged across individuals.
f Heterozygosity (Ht) = $1 - \sum_i p_i^2$, where $p_i$ is the population frequency of allele i. Averaged across loci.
g Raggedness calculated according to Harpending [37].
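The summary statistics in footnotes d to g can be illustrated with the following Python sketch. Several choices here are assumptions rather than details given in the source: the function names, the use of the sample standard deviation, the use of sample allele frequencies with no small-sample correction, and in particular the bin width used to turn continuous genetic distances into a frequency distribution before applying a Harpending-style raggedness sum (squared differences between adjacent bin frequencies).

```python
from collections import Counter
from statistics import mean, stdev

def cap_summary(cap):
    """Mean and standard deviation of one individual's CAP.
    Using the sample standard deviation is an assumption."""
    return mean(cap), stdev(cap)

def heterozygosity(alleles):
    """1 - sum of squared allele frequencies at one locus.
    Frequencies are taken from the sample, with no bias correction."""
    n = len(alleles)
    return 1.0 - sum((c / n) ** 2 for c in Counter(alleles).values())

def mean_heterozygosity(loci):
    """Heterozygosity averaged across loci."""
    return sum(heterozygosity(a) for a in loci) / len(loci)

def raggedness(distances, bin_width=0.01):
    """Sum of squared differences between adjacent bin frequencies.
    Distances lie in [0, 1]; the bin width is a hypothetical choice."""
    n_bins = int(round(1.0 / bin_width)) + 1
    freqs = [0.0] * n_bins
    for d in distances:
        freqs[min(int(d / bin_width), n_bins - 1)] += 1.0 / len(distances)
    freqs.append(0.0)  # frequency beyond the last bin is zero
    return sum((freqs[i] - freqs[i - 1]) ** 2 for i in range(1, len(freqs)))
```

With this definition a smooth, unimodal profile yields a small value, while a profile whose weight sits in a few isolated bins yields a large one. The result depends on the bin width, so values are comparable only when the same binning is used throughout.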
Impact of gene flow on individual common ancestry profiles (CAPs) derived from coalescent simulations
| Migration model (a) | Average (b) | Standard deviation (c) | Ht (d) | Raggedness (e) | Peak 1 (f) | Peak 2 (f) | Peak 3 (f) |
|---|---|---|---|---|---|---|---|
| Nem = 0 | 0.504 (0.008) | 0.031 (0.003) | 0.671 | 0.057 | 0.086 | 0.039 | 0.096 |
| Nem = 0.5 | 0.514 (0.010) | 0.027 (0.007) | 0.663 | 0.059 | 0.092 | 0.081 | 0.114 |
| Nem = 2.0 | 0.521 (0.013) | 0.018 (0.008) | 0.642 | 0.083 | 0.082 | 0.142 | 0.107 |
| CIRM | 0.509 (0.008) | 0.034 (0.004) | 0.667 | 0.059 | 0.116 | 0.038 | 0.110 |
Time of divergence of two populations -- 2,000 generations; effective population size (Ne) -- 1,000 individuals; sample size -- 100 individuals per population. Standard deviations included in parentheses.
a Rate of migration from population 2 into population 1. CIRM: complete isolation (1,900 generations) followed by recent migration (Nem = 2.0).
b Average genetic distance for a set of pairs of individuals.
c Standard deviation of individual CAPs, averaged across individuals.
d Heterozygosity (Ht) = $1 - \sum_i p_i^2$, where $p_i$ is the population frequency of allele i, averaged across all alleles at all loci.
e Raggedness calculated according to Harpending [37].
f Average weight of distribution (across individuals) in each of three sets of bins corresponding to peak at lower genetic distance (1), peak at higher genetic distance (3) and midpoint between these two peaks (2). See text for further details.
Figure 4. Ten examples of common ancestry profiles (CAPs) generated under each of four models of population history. Each 'cryptic' comparison set is based on 100 samples randomly selected from 200 possible samples in both populations, as might be realistic in the case of cryptic population structure. CAPs derived from coalescent simulations of two populations of effective size 1,000 that diverged 2,000 generations ago given: (a) complete isolation; (b) continuous gene flow at the rate of 0.5 migrants per generation; (c) continuous gene flow at the rate of 2.0 migrants per generation; and (d) gene flow over the past 100 generations at the rate of 2.0 migrants per generation, following 1,900 generations of isolation. Gene flow is asymmetrical. CAPs derived from simulated data for 500 short tandem repeat loci (mutation rate: 0.0005/locus/generation, range constraint: 15, stepwise mutation model).
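The mutation parameters quoted in the simulation captions (stepwise mutation model, rate 0.0005 per locus per generation, range constraint 15) can be illustrated with a small forward-time sketch of a single allele lineage. This is not a coalescent simulation and is not the authors' code. It assumes that the range constraint means 15 permitted allele states and that a step leaving the range is reflected back; the source specifies neither, and all function names are hypothetical.

```python
import random

def mutate_stepwise(allele, mu=0.0005, n_states=15, rng=random):
    """One generation of a stepwise mutation model on states 0..n_states-1.
    With probability mu the allele gains or loses one repeat unit."""
    if rng.random() >= mu:
        return allele  # no mutation this generation
    step = rng.choice((-1, 1))
    proposed = allele + step
    # Assumed boundary handling: a step outside the range is reflected.
    if proposed < 0 or proposed >= n_states:
        proposed = allele - step
    return proposed

def evolve_lineage(start, generations, mu=0.0005, n_states=15, seed=0):
    """Follow one allele forward in time for a number of generations."""
    rng = random.Random(seed)
    allele = start
    for _ in range(generations):
        allele = mutate_stepwise(allele, mu, n_states, rng)
    return allele

# Example: one lineage followed over 2,000 generations from a mid-range state.
final_state = evolve_lineage(start=7, generations=2000)
```

At this mutation rate the expected number of mutations on a lineage over 2,000 generations is 2,000 × 0.0005 = 1, so two lineages separated by that divergence time often, but not always, differ in state.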
Maximum genetic distance between any pair of individuals drawn from each pair of geographical regions
| | Africa | Mid East | Eur | C/S Asia | E Asia | Oc | Amer |
|---|---|---|---|---|---|---|---|
| Africa | 0.846 | 0.853 | 0.853 | 0.853 | 0.853 | 0.847 | 0.861 |
| Middle East | | 0.823 | 0.820 | 0.828 | 0.829 | 0.823 | 0.824 |
| Europe | | | 0.803 | 0.812 | 0.822 | 0.823 | 0.823 |
| Central/South Asia | | | | 0.811 | 0.818 | 0.821 | 0.822 |
| East Asia | | | | | 0.786 | 0.817 | 0.805 |
| Oceania | | | | | | 0.760 | 0.803 |
| Americas | | | | | | | 0.749 |
Figure 5. Common ancestry profiles (CAPs) for four individuals in the context of four pairs of populations, including geographically proximate populations, (a) Surui/Karitiana and (b) Burusho/Kalash, and geographically distant populations, (c) Pima/Mbuti and (d) Papuan/Biaka. Each figure illustrates a 'within', 'between' and 'overall' CAP for a focal individual. For example, the Surui/Karitiana comparison illustrates: (1) a Surui individual versus other Surui; (2) a Surui individual versus all Karitiana individuals; and (3) a Surui individual versus all Karitiana and all other Surui individuals.
Summaries of common ancestry profiles (CAPs) for four population pairs
| Comparison | Average | Standard deviation | Ht | Raggedness |
|---|---|---|---|---|
| Surui, Karitiana n1 = 14, n2 = 12 | ||||
| Surui | 0.430 (0.011) | 0.017 (0.003) | 0.492 | 0.106 |
| Karitiana | 0.475 (0.013) | 0.016 (0.004) | 0.553 | 0.148 |
| Surui vs Karitiana | 0.492 (0.096) | 0.015 (0.004) | 0.625 | |
| Surui and Karitiana | 0.453 (0.015) | 0.033 (0.007) | 0.586 | 0.080 |
| Burusho, Kalash n1 = 25, n2 = 25 | ||||
| Burusho | 0.577 (0.008) | 0.014 (0.002) | 0.703 | 0.263 |
| Kalash | 0.599 (0.008) | 0.013 (0.002) | 0.732 | 0.277 |
| Burusho vs Kalash | 0.598 (0.008) | 0.013 (0.002) | 0.128 | |
| Burusho and Kalash | 0.582 (0.009) | 0.019 (0.002) | 0.736 | 0.089 |
| Pima, Mbuti n1 = 16, n2 = 15 | ||||
| Pima | 0.513 (0.008) | 0.014 (0.004) | 0.603 | 0.097 |
| Mbuti | 0.611 (0.004) | 0.012 (0.002) | 0.739 | 0.163 |
| Pima vs Mbuti | 0.608 (0.004) | 0.015 (0.003) | 0.071 | |
| Pima and Mbuti | 0.565 (0.025) | 0.040 (0.015) | 0.740 | 0.042 |
| Papuan, Biaka n1 = 33, n2 = 17 | ||||
| Papuan | 0.547 (0.008) | 0.014 (0.002) | 0.673 | 0.117 |
| Biaka | 0.614 (0.008) | 0.016 (0.003) | 0.759 | 0.086 |
| Biaka vs Papuan | 0.605 (0.009) | 0.015 (0.003) | 0.086 | |
| Biaka and Papuan | 0.591 (0.014) | 0.025 (0.007) | 0.768 | 0.048 |
Surui/Karitiana (Rondonia, Brazil) and Burusho/Kalash (Pakistan) are pairs of geographically proximate populations. Pima (North America)/Mbuti (Central Africa) and Papuan (Oceania)/Biaka (Central Africa) are pairs of geographically distant populations. Standard deviations are included in parentheses. For the 'between' and 'overall' comparisons, focal individuals are always drawn from the first population (ie Surui, Burusho, Pima and Papuan, respectively). Short tandem repeat data drawn from Rosenberg et al. [4].