John C Chambers, James Abbott, Weihua Zhang, Ernest Turro, William R Scott, Sian-Tsung Tan, Uzma Afzal, Saima Afaq, Marie Loh, Benjamin Lehne, Paul O'Reilly, Kyle J Gaulton, Richard D Pearson, Xinzhong Li, Anita Lavery, Jana Vandrovcova, Mark N Wass, Kathryn Miller, Joban Sehmi, Laticia Oozageer, Ishminder K Kooner, Abtehale Al-Hussaini, Rebecca Mills, Jagvir Grewal, Vasileios Panoulas, Alexandra M Lewin, Korrinne Northwood, Gurpreet S Wander, Frank Geoghegan, Yingrui Li, Jun Wang, Timothy J Aitman, Mark I McCarthy, James Scott, Sarah Butcher, Paul Elliott, Jaspal S Kooner.
Abstract
The genetic sequence variation of people from the Indian subcontinent, who comprise one-quarter of the world's population, is not well described. We carried out whole-genome sequencing of 168 South Asians, along with whole-exome sequencing of 147 South Asians to provide deeper characterisation of coding regions. We identify 12,962,155 autosomal sequence variants, including 2,946,861 novel SNPs and 312,738 novel indels. This catalogue of SNPs and indels amongst South Asians provides the first comprehensive map of genetic variation in this major human population, and reveals evidence for selective pressures on genes involved in skin biology, metabolism, infection and immunity. Our results will accelerate the search for the genetic variants underlying susceptibility to disorders such as type-2 diabetes and cardiovascular disease, which are highly prevalent amongst South Asians.
Year: 2014 PMID: 25115870 PMCID: PMC4130493 DOI: 10.1371/journal.pone.0102645
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Location of birth (1A) and principal components analysis (PCA, 1B) of the South Asians sequenced.
The PCA plot shows results for all South Asians in the LOLIPOP study (SA - All, red circles), for the South Asians sequenced (SA - NGS, black dots) and for HapMap2 populations.
SNPs and indels identified by low-coverage whole-genome sequencing (WGS-4x) and whole exome sequencing (WES) amongst South Asians.
| Variant class | Per sample: WGS-4x | Per sample: WES | All samples: WGS-4x | All samples: WES | All samples: WGS-4x & WES |
| --- | --- | --- | --- | --- | --- |
| SNPs | 3,120,893 | 34,698 | 11,538,889 | 189,939 | 11,624,616 |
| Novel SNPs | 39,061 | 3,272 | 2,885,370 | 73,900 | 2,946,861 |
| nsSNPs | 9,569 | 8,225 | 45,201 | 48,104 | 70,746 |
| Novel nsSNPs | 158 | 238 | 12,919 | 20,910 | 30,914 |
| Indels | 733,326 | 11,042 | 1,337,283 | 25,750 | 1,352,706 |
| Novel indels | 108,127 | 4,664 | 301,104 | 13,126 | 312,738 |
| Inframe/frameshift indels | 484 | 598 | 2,994 | 4,837 | 7,215 |
| Novel inframe/frameshift indels | 49 | 313 | 727 | 2,813 | 3,349 |
Figure 2. Correlation between imputed and observed genotypes amongst South Asians, using phased or unphased genotypes from low-coverage WGS, or using 1000 Genomes Project data.
Results are shown as mean r² with genotypes observed from microarray data (2A) or high-coverage WGS (2B, WGS-28x).
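As an illustration of the metric named in the Figure 2 caption, the following is a minimal sketch of how a mean r² between imputed and observed genotypes could be computed. It assumes, and the caption does not confirm, that r² is the squared Pearson correlation between imputed allele dosages and observed genotypes coded 0/1/2, calculated per SNP and then averaged across SNPs. The function names `pearson_r2` and `mean_r2` and the toy data are hypothetical and are not taken from the study.

```python
def pearson_r2(imputed, observed):
    """Squared Pearson correlation between two equal-length sequences.

    Returns None when either sequence has zero variance (for example a SNP
    that is monomorphic in the sample), because r^2 is undefined there.
    """
    n = len(imputed)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_i = sum(imputed) / n
    mean_o = sum(observed) / n
    var_i = sum((x - mean_i) ** 2 for x in imputed)
    var_o = sum((y - mean_o) ** 2 for y in observed)
    if var_i == 0 or var_o == 0:
        return None
    cov = sum((x - mean_i) * (y - mean_o) for x, y in zip(imputed, observed))
    return (cov * cov) / (var_i * var_o)


def mean_r2(imputed_by_snp, observed_by_snp):
    """Average the per-SNP r^2, skipping SNPs where r^2 is undefined.

    Assumption: each inner list holds one SNP's values across the same
    samples, in the same order, in both inputs.
    """
    values = []
    for imputed, observed in zip(imputed_by_snp, observed_by_snp):
        r2 = pearson_r2(imputed, observed)
        if r2 is not None:
            values.append(r2)
    if not values:
        raise ValueError("no SNP with a defined r^2")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy data (invented for illustration): 2 SNPs x 4 samples.
    imputed = [[0.1, 0.9, 1.8, 1.1], [0.0, 0.2, 1.9, 2.0]]  # imputed dosages
    observed = [[0, 1, 2, 1], [0, 0, 2, 2]]                 # observed genotypes
    print(round(mean_r2(imputed, observed), 3))
```

Skipping monomorphic SNPs is one possible design choice; an analysis could instead filter such sites beforehand or stratify the mean by allele frequency.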
Figure 3. Enrichment for coding variants amongst autosomal SNPs stratified between South Asians and the 1000 Genomes populations (3A) and for specific functional classes of SNPs amongst South Asians compared to Europeans (3B).
Enrichment is calculated relative to the null hypothesis; P values are provided in Table S6 and Table S7.
Figure 4. Enrichment for stratified genetic variants at genetic loci associated with the respective phenotype in genome-wide association studies.
Inset: the correlation between the enrichment for stratified SNPs at known genetic loci and the enrichment of stratified variants for SNPs associated with the respective phenotype in genome-wide association studies. Further details are provided in Table S10.