Ranajit Das, Priyanka Upadhyai.
Abstract
The inference of genomic ancestry using ancestry informative markers (AIMs) can be useful for a range of studies in evolutionary genetics, biomedical research, and forensic analyses. However, the determination of AIMs for highly admixed populations with complex ancestries has remained a formidable challenge. Given the immense genetic heterogeneity and unique population structure of the Indian subcontinent, here we sought to derive AIMs that would yield a cohesive and faithful understanding of South Asian genetic origins. To discern the optimal strategy for extracting AIMs for South Asians, we compared three commonly used AIMs-determining methods, namely Infocalc, FST, and Smart Principal Component Analysis with ADMIXTURE, using previously published whole genome data from the Indian subcontinent. Our findings suggest that the Infocalc approach is likely the most suitable for delineating South Asian AIMs. In particular, Infocalc-2,000 (N = 2,000) emerged as the most informative South Asian AIMs panel: it recapitulated the finer structure within South Asian genomes with a high degree of sensitivity and precision, whereas a negative control with an equivalent number of randomly selected markers failed to do so when used to interrogate the South Asian populations. We discuss the utility of all approaches under evaluation for deriving AIMs and interpreting South Asian genomic ancestries. Notably, this is the first report of an AIMs panel for South Asian ancestry inference. Overall, these findings may aid in developing cost-effective resources for large-scale demographic analyses and foster expansion of our knowledge of human origins and disease in the South Asian context.
Year: 2018 PMID: 30184103 PMCID: PMC6143162 DOI: 10.1093/gbe/evy182
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1.—Admixture analyses of data sets generated using the most informative SNPs detected by the Infocalc algorithm. Admixture plots depicting the ancestry components of South Asian genomes. (A) Admixture analysis of the CSS (N = 499,158); (B) Admixture analysis of Infocalc-10,000; (C) Admixture analysis of Infocalc-2,534; (D) Admixture analysis of Infocalc-2,000; (E) Admixture analysis of Infocalc-1,500; (F) Admixture analysis of Infocalc-1,000; and (G) Admixture analysis of Infocalc-500. Admixture proportions were generated through an unsupervised admixture analysis at K = 10 using ADMIXTURE v1.3 and plotted in R v3.2.3. Each individual is represented by a vertical line partitioned into colored segments whose lengths are proportional to the contributions of the ancestral components to the genome of the individual. Note that Nyshas are included among the ATB group.
Fig. 2.—Box and whisker plots comparing the Euclidean distances between the admixture proportions of the South Asian genomes obtained using the CSS and those obtained using candidate panels derived through alternative AIMs-determining approaches. The number of SNPs contained in each of the candidate panels illustrated is indicated in the text. Note: Random-2,534 comprised 2,534 SNPs randomly selected from the CSS, and Consensus-2,534 comprised 2,534 SNPs that were detected by at least two of the four AIMs-determining approaches under evaluation.
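The comparison summarized in Fig. 2 can be illustrated with a minimal sketch: for each individual, take the Euclidean distance between the admixture proportions estimated from the CSS and those estimated from a candidate panel. The function name, the variable names, and the toy proportions below are hypothetical and are not taken from the study. The sketch also assumes that the ancestry components (columns) of the two matrices have already been matched, since independent ADMIXTURE runs do not guarantee the same component order.

```python
import numpy as np


def per_individual_distance(q_reference, q_panel):
    """Euclidean distance between two admixture-proportion matrices, row by row.

    Both inputs are (individuals x K) arrays. Columns are assumed to be
    aligned, i.e. column j denotes the same ancestry component in both.
    """
    q_reference = np.asarray(q_reference, dtype=float)
    q_panel = np.asarray(q_panel, dtype=float)
    if q_reference.shape != q_panel.shape:
        raise ValueError("both matrices must have the same shape")
    # One distance per individual (row).
    return np.linalg.norm(q_reference - q_panel, axis=1)


# Toy data (hypothetical): two individuals, K = 3 ancestry components.
q_css = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
q_candidate = np.array([[0.6, 0.3, 0.1],
                        [0.1, 0.8, 0.1]])

distances = per_individual_distance(q_css, q_candidate)
print(distances)
```

In this toy case the first individual differs by 0.1 in two components, giving a distance of about 0.141, while the second individual has identical proportions and a distance of 0. A panel that tracks the CSS closely would yield a distribution of such distances concentrated near zero.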
Fig. 3.—PCA of South Asian genomes. PCA plots showing genetic differentiation among South Asian genomes. The candidate panels were generated using highly informative SNPs detected through the Infocalc algorithm. (A) PCA of the CSS (N = 499,158), where PC1 explained 39.7% and PC2 explained 24.2% of the variance of the data. (B) PCA of Infocalc-10,000, where PC1 explained 39.8% and PC2 explained 23.9% of the variance. (C) PCA of Infocalc-2,534, where PC1 explained 39.8% and PC2 explained 23.8% of the variance. (D) PCA of Infocalc-2,000, where PC1 explained 39.3% and PC2 explained 24.2% of the variance. (E) PCA of Infocalc-1,500, where PC1 explained 39.6% and PC2 explained 24.3% of the variance. (F) PCA of Infocalc-1,000, where PC1 explained 38.3% and PC2 explained 23.2% of the variance. (G) PCA of Infocalc-500, where PC1 explained 36.7% and PC2 explained 23.1% of the variance. Notable populations are marked with circles. In all cases illustrated here, PCA was performed in PLINK v1.9 and the top four principal components (PCs) were extracted. The top two PCs (PC1 and PC2), which explained the most variance in the data, were plotted in R v3.2.3. The X-axis designates PC1 and the Y-axis designates PC2.
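As a rough illustration of how per-component percentages such as those quoted for Fig. 3 are commonly obtained, the sketch below divides each eigenvalue by the sum of the supplied eigenvalues. The caption does not state whether the percentages were normalized over the four extracted PCs or over all components, so the choice of denominator here is an assumption. The function name and the eigenvalues are hypothetical.

```python
import numpy as np


def percent_variance(eigenvalues):
    """Share of variance per component, as a percentage of the supplied total.

    Assumption: the denominator is the sum of the eigenvalues passed in
    (for example, only the extracted top PCs), not the full spectrum.
    """
    values = np.asarray(eigenvalues, dtype=float)
    if values.ndim != 1 or values.size == 0:
        raise ValueError("expected a non-empty 1-D sequence of eigenvalues")
    if np.any(values < 0):
        raise ValueError("eigenvalues must be non-negative")
    total = values.sum()
    if total == 0:
        raise ValueError("eigenvalues must not all be zero")
    return 100.0 * values / total


# Toy eigenvalues (hypothetical) for the top four PCs.
toy_eigenvalues = [4.0, 2.5, 2.0, 1.5]
shares = percent_variance(toy_eigenvalues)
print(shares)
```

With these toy values the shares are 40%, 25%, 20%, and 15%. Normalizing over the full set of eigenvalues instead would give smaller percentages for the same components.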