David Reich, Kumarasamy Thangaraj, Nick Patterson, Alkes L Price, Lalji Singh.
Abstract
India has been underrepresented in genome-wide surveys of human variation. We analyse 25 diverse groups in India to provide strong evidence for two ancient populations, genetically divergent, that are ancestral to most Indians today. One, the 'Ancestral North Indians' (ANI), is genetically close to Middle Easterners, Central Asians, and Europeans, whereas the other, the 'Ancestral South Indians' (ASI), is as distinct from ANI and East Asians as they are from each other. By introducing methods that can estimate ancestry without accurate ancestral populations, we show that ANI ancestry ranges from 39-71% in most Indian groups, and is higher in traditionally upper caste and Indo-European speakers. Groups with only ASI ancestry may no longer exist in mainland India. However, the indigenous Andaman Islanders are unique in being ASI-related groups without ANI ancestry. Allele frequency differences between groups in India are larger than in Europe, reflecting strong founder effects whose signatures have been maintained for thousands of years owing to endogamy. We therefore predict that there will be an excess of recessive diseases in India, which should be possible to screen and map genetically.
Year: 2009 PMID: 19779445 PMCID: PMC2842210 DOI: 10.1038/nature08365
Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962
25 groups sampled from 13 states of India
| Group | Samples | Language family | Traditional caste or social designation | State/Territory | Nearest large town, city or island | Latitude / Longitude | Census size | Min FST to others (raw) | Min FST to others (inbreeding corrected) |
|---|---|---|---|---|---|---|---|---|---|
| Kashmiri Pandit | 5 | Indo-European | Upper caste | Kashmir | Dras | 34°22’ N / 75°50’ E | 7,000 | 0.0005 | 0.0023 |
| Vaish | 4 | Indo-European | Upper caste | Uttar Pradesh | Jaunpur | 25°46’ N / 82°44’ E | 25,000,000 | 0.0005 | 0.0020 |
| Srivastava | 2 | Indo-European | Upper caste | Uttar Pradesh | Mirzapur | 25°10’ N / 82°37’ E | 10,000,000 | 0.0029 | 0.0023 |
| Sahariya | 4 | Indo-European | Lower caste | Uttar Pradesh | Allahabad | 25°28’ N / 81°54’ E | 41,000 | 0.0089 | 0.0087 |
| Lodi | 5 | Indo-European | Lower caste | Uttar Pradesh | Jhansi | 26°45’ N / 83°24’ E | 57,000 | 0.0029 | 0.0028 |
| Satnami | 4 | Indo-European | Lower caste | Chhattisgarh | Raipur | 20°29’ N / 85°58’ E | 4,200,000 | 0.0038 | 0.0039 |
| Bhil | 7 | Indo-European | Tribal | Gujarat | Ahmedabad | 23°02’ N / 72°40’ E | 7,400,000 | 0.0022 | 0.0027 |
| Tharu | 9 | Indo-European | Tribal | Uttarakhand | Nainital | 29°23’ N / 79°30’ E | 96,000 | 0.0009 | 0.0017 |
| Meghawal | 5 | Indo-European | Lower caste | Rajasthan | Jodhpur | 26°18’ N / 73°04’ E | 890,000 | 0.0034 | 0.0048 |
| Vysya | 5 | Dravidian | Middle caste | Andhra Pradesh | Anantapur | 14°41’ N / 77°39’ E | 3,200,000 | 0.0108 | 0.0087 |
| Naidu | 4 | Dravidian | Upper caste | Andhra Pradesh | Chittoor | 13°13’ N / 79°06’ E | 19,000,000 | 0.0052 | 0.0022 |
| Velama | 4 | Dravidian | Upper caste | Andhra Pradesh | Mahboob Nagar | 16°31’ N / 75°51’ E | 13,000,000 | 0.0078 | 0.0038 |
| Madiga | 4 | Dravidian | Lower caste | Andhra Pradesh | Warangal | 17°58’ N / 79°35’ E | 1,600,000 | 0.0038 | 0.0028 |
| Mala | 3 | Dravidian | Lower caste | Andhra Pradesh | Hyderabad | 17°22’ N / 78°29’ E | 2,900,000 | 0.0038 | 0.0030 |
| Kamsali | 4 | Dravidian | Lower caste | Andhra Pradesh | Kurnool | 15°49’ N / 78°02’ E | 5,100,000 | 0.0055 | 0.0022 |
| Chenchu | 6 | Dravidian | Tribal | Andhra Pradesh | Anantapur | 17°22’ N / 78°28’ E | 28,000 | 0.0524 | 0.0536 |
| Kurumba | 9 | Dravidian | Tribal | Kerala | Palakkad | 10°54’ N / 76°27’ E | 1,300 | 0.0021 | 0.0017 |
| Hallaki | 7 | Dravidian | Tribal | Karnataka | Uttara Kannada | 13°55’ N / 74°09’ E | 75,000 | 0.0072 | 0.0045 |
| Santhal | 7 | Austro-Asiatic | Tribal | Jharkhand | Santhal Pargana | 24°30’ N / 87°30’ E | 2,100,000 | 0.0045 | 0.0057 |
| Kharia | 6 | Austro-Asiatic | Tribal | Madhya Pradesh | Raigarh | 23°08’ N / 73°07’ E | 6,900 | 0.0045 | 0.0057 |
| Nyshi | 4 | Tibeto-Burman | Tribal | Arunachal Pradesh | Papum Pare | 26°55’ N / 92°40’ E | 56,000 | 0.0215 | 0.0198 |
| Ao Naga | 4 | Tibeto-Burman | Tribal | Nagaland | Kohima | 25°40’ N / 94°08’ E | 105,000 | 0.0215 | 0.0198 |
| Siddi | 4 | Dravidian | Tribal | Karnataka | Dharwad | 15°27’ N / 75°05’ E | 25,000 | 0.0746 | 0.0757 |
| Onge | 9 | Jarawa-Onge | Hunter gatherer | Andaman & Nicobar | Little Andaman | 10°30’ N / 92°30’ E | 97 | 0.0905 | 0.0934 |
| Gr. Andamanese | 7 | Andamanese | Hunter gatherer | Andaman & Nicobar | Great Andaman | 12°12’ N / 93°00’ E | 42 | 0.0386 | 0.0414 |
The language of the Siddi is Dravidian, but their ancestors spoke a Bantu language.
Census estimates correspond to all of India.
Numbers are based on:
Singh KS (1994) People of India, National Series, Volume III, Scheduled Tribes. Oxford University Press, Oxford;
Singh KS (1993) People of India, National Series, Volume III, Scheduled Castes. Oxford University Press, Oxford.
For some groups (without a superscript) we obtained estimates from the Census of India 1991, Registrar General Office, Government of India.
Figure 1. Map of India with the state of origin of the 25 groups that we studied.
Detection and quantification of population mixture along the Indian Cline
| Indian Cline group | Samples | Z-score from | % ANI ancestry | ±1 stand. error | Genetic drift D from the best fitting combination of ANI and ASI | Wright’s fixation index F (estimates inbreeding) | Estimated fraction of recessive diseases due to founder events |
|---|---|---|---|---|---|---|---|
| Mala | 3 | -2.5 | 38.8% | 1.2% | 0.0023 | 0 | 100% |
| Madiga | 4 | -2.7 | 40.6% | 1.2% | 0.0018 | 0.0061 | 23% |
| Chenchu | 6 | 31.3 (not significant) | 40.7% | 1.3% | 0.0492 | 0 | 100% |
| Bhil | 7 | -10.6 | 42.9% | 1.1% | 0.0024 | 0 | 100% |
| Satnami | 3 | -5.6 | 43.0% | 1.3% | 0.0019 | 0 | 100% |
| Kurumba | 6 | -12.6 | 43.2% | 1.1% | 0.0001 | 0.0052 | 2% |
| Kamsali | 3 | -6.5 | 44.5% | 1.3% | 0.0016 | 0.0066 | 19% |
| Vysya | 5 | 5.4 (not significant) | 46.2% | 1.2% | 0.0083 | 0.0071 | 54% |
| Lodi | 5 | -8.9 | 49.9% | 1.1% | 0.0027 | 0.0056 | 32% |
| Naidu | 4 | -3.3 | 50.1% | 1.2% | 0.0022 | 0.0435 | 5% |
| Tharu | 5 | -20.6 | 51.0% | 1.2% | 0.0000 | 0 | na |
| Velama | 4 | -3.2 | 54.7% | 1.3% | 0.0044 | 0.0197 | 18% |
| Srivastava | 2 | -7.5 | 56.4% | 1.5% | 0.0023 | 0 | 100% |
| Meghawal | 5 | -13.3 | 60.3% | 1.2% | 0.0035 | 0 | 100% |
| Vaish | 4 | -22.0 | 62.6% | 1.2% | 0.0012 | 0 | 100% |
| Kashmiri Pandit | 5 | -20.6 | 70.6% | 1.2% | 0.0019 | 0 | 100% |
| Sindhi | 10 | -26.3 | 73.7% | 1.1% | 0.0008 | 0.0043 | 16% |
| Pathan | 15 | -34.3 | 76.9% | 1.1% | 0.0001 | 0.0039 | 3% |
Estimates of genetic drift (the variance in allele frequencies on any lineage) are based on a model in which each group is a simple mixture of ANI and ASI, followed by subsequent genetic drift specific to that group (corrected for inbreeding). To fit the model, we use the algorithm described in Note S4, and fit f2, f3 and f4 statistics that are calculated in a way that is unbiased by inbreeding (Appendix).
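As a rough illustration of the f2, f3 and f4 statistics named above, the sketch below uses their plain textbook forms, averaged over SNPs. It is an assumption-laden toy: the function names and allele frequencies are invented for illustration, and it does not apply the inbreeding-unbiased estimators of the Appendix or the fitting algorithm of Note S4.

```python
from statistics import mean

def f2(a, b):
    """Mean squared allele-frequency difference between groups A and B."""
    return mean((x - y) ** 2 for x, y in zip(a, b))

def f3(c, a, b):
    """f3(C; A, B): mean of (c - a)(c - b) over SNPs."""
    return mean((z - x) * (z - y) for z, x, y in zip(c, a, b))

def f4(a, b, c, d):
    """f4(A, B; C, D): mean of (a - b)(c - d) over SNPs."""
    return mean((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d))

# Hypothetical per-SNP allele frequencies (not data from the study).
ani = [0.9, 0.1, 0.8, 0.3]
asi = [0.1, 0.7, 0.2, 0.9]
# A group formed as an exact 50/50 mixture, with no drift after mixing.
mix = [0.5 * x + 0.5 * y for x, y in zip(ani, asi)]

print(f2(ani, asi))        # drift separating the two sources
print(f3(mix, ani, asi))   # negative for a mixture of the two sources
```

In this toy case f3(mix; ani, asi) equals minus one quarter of f2(ani, asi), because each deviation of the mixture from a source is half the difference between the sources. Drift after mixing would push the value back towards zero or above it.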
Wright’s fixation index F is estimated as the excess rate at which the two copies of a chromosome within an individual from a group are identical by state, compared with the rate across individuals from that group (Appendix). We set negative values to 0; standard errors are typically around 0.003. Because of the small sample sizes, these estimates are heavily influenced by the samples that happen to have been included in our analysis, and thus should be considered approximate.
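One common way to turn such identity-by-state rates into an inbreeding coefficient is sketched below. The normalisation by (1 - across-individual rate) is the textbook form F = 1 - H_obs/H_exp and is an assumption here; the Appendix may define the estimator differently. The function and parameter names are hypothetical.

```python
def fixation_index(ibs_within, ibs_across):
    """Estimate F from two identity-by-state (IBS) match rates.

    ibs_within: rate at which the two copies inside one individual match.
    ibs_across: rate at which copies from different individuals of the
                same group match.
    """
    if not 0.0 <= ibs_across < 1.0:
        raise ValueError("ibs_across must lie in [0, 1)")
    f = (ibs_within - ibs_across) / (1.0 - ibs_across)
    # Negative estimates are set to 0, as described in the text.
    return max(f, 0.0)

# Hypothetical rates, not values from the study.
print(fixation_index(0.715, 0.700))
print(fixation_index(0.690, 0.700))  # clamped to 0
```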
To estimate the proportion of recessive disease cases that are due to founder events, we consider the two alleles that a single individual carries at any locus. With probability F given by Wright’s Fixation Index, they coalesce in the last few generations due to consanguinity, and with probability (1-F)D, they coalesce more recently than the ANI-ASI mixture due to founder events specific to that group. The fraction of recessive diseases due to founder events can thus be estimated as D(1-F)/(F+D(1-F)).
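The formula D(1-F)/(F+D(1-F)) can be checked against the table with a few lines of code. This is a minimal sketch; the function name is hypothetical, and the D and F inputs are the rounded values printed in the table, so results can differ slightly from the published percentages.

```python
def founder_fraction(d, f):
    """Fraction of recessive disease attributed to founder events.

    d: group-specific genetic drift D since the ANI-ASI mixture.
    f: Wright's fixation index F (recent consanguinity).
    Returns None when both terms are zero (reported as "na" in the table).
    """
    founder = d * (1.0 - f)   # coalescence due to founder events
    total = f + founder       # consanguinity plus founder events
    if total == 0:
        return None
    return founder / total

# (D, F) pairs copied from the table rows.
rows = {
    "Madiga": (0.0018, 0.0061),
    "Vysya": (0.0083, 0.0071),
    "Naidu": (0.0022, 0.0435),
    "Mala": (0.0023, 0.0),
    "Tharu": (0.0, 0.0),
}
for group, (d, f) in rows.items():
    frac = founder_fraction(d, f)
    print(group, "na" if frac is None else f"{frac:.0%}")
```

With these inputs the Madiga and Vysya rows give about 23% and 54%, matching the table. Any group with F set to 0 and D above 0 is attributed entirely to founder events.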
Figure 3. Principal components analysis (PCA) of 22 groups from the Indian subcontinent. Analysis of these groups along with Europeans (CEU) and Chinese (CHB) reveals a gradient of relatedness to CEU that runs through the majority of Indo-European and Dravidian groups, with the Kashmiri Pandit most related to CEU. Both the Austro-Asiatic speaking groups (Kharia and Santhal) and the tribal Sahariya are off-cline, while the two Tibeto-Burman speaking groups cluster with CHB. (Data from the outlying Siddi, Onge and Great Andamanese are not shown.)
Figure 4. A model relating the history of Indian and non-Indian groups. Modeling the Pathan, Vaish, Meghawal and Bhil as mixtures of ANI and ASI, and relating them to non-Indians by the phylogenetic tree (YRI,((CEU,ANI),(ASI,Onge))), provides an excellent fit to the data. While the model is precise about tree topology and ordering of splits, it provides no information about population size changes or the timings of events. We estimate genetic drift on each lineage in the sense of variance in allele frequencies, which we rescale to be comparable to FST (standard errors are typically ±0.001 but are not shown).