Navin Rustagi, Anbo Zhou, W Scott Watkins, Erika Gedvilaite, Shuoguo Wang, Naveen Ramesh, Donna Muzny, Richard A Gibbs, Lynn B Jorde, Fuli Yu, Jinchuan Xing.
Abstract
BACKGROUND: The cost of whole genome sequencing (WGS) has decreased tremendously in recent years due to advances in next-generation sequencing technologies. Nevertheless, the cost of carrying out large-scale cohort studies using WGS is still daunting. Past simulation studies with coverage at ~2x have shown promise for using low-coverage WGS in studies focused on variant discovery, association study replications, and population genomics characterization. However, the performance of low-coverage WGS in populations with a complex history and no reference panel remains to be determined.
Keywords: Extremely low coverage; Imputation; Population structure; Single nucleotide variant; South Asian; Whole genome sequencing
Year: 2017 PMID: 28532386 PMCID: PMC5440948 DOI: 10.1186/s12864-017-3767-6
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Sequencing and variant calling statistics for SAS-AP samples
| Group | Populations | # of samples | Depth of Coverage | Total SNVs | Avg SNVs | Ti/Tv | Novel 1000G-SAS | Novel dbSNP 141 |
|---|---|---|---|---|---|---|---|---|
| Upper [a] | Brahmins | 16 | 1.81 | 4,443,583 | 2,571,972 | 2.09 | 97,022 | 5,632 |
| Middle [a] | Kapu | 37 | 1.57 | 4,457,414 | 2,582,833 | 2.09 | 97,303 | 5,701 |
| Middle [a] | Yadava | 32 | 1.65 | 4,457,237 | 2,575,646 | 2.09 | 97,300 | 5,699 |
| Lower [a] | Mala | 23 | 1.56 | 4,455,626 | 2,615,726 | 2.09 | 97,261 | 5,701 |
| Lower [a] | Madiga | 24 | 1.35 | 4,455,972 | 2,583,914 | 2.09 | 97,271 | 5,699 |
| Lower [a] | Relli | 15 | 1.69 | 4,455,972 | 2,586,868 | 2.09 | 96,908 | 5,639 |
| Tribal | Irula | 22 | 1.86 | 4,439,932 | 2,570,388 | 2.09 | 97,024 | 5,674 |
| Tribal | Khonda Dora | 16 | 1.82 | 4,403,761 | 2,574,472 | 2.09 | 96,340 | 5,523 |
| Total | | 185 | 1.64 | 4,457,475 | 2,583,004 | 2.09 | 97,309 | 5,701 |
Only SNVs with minor allele frequency (MAF) ≥ 10% are included. Total SNVs: the total number of SNVs in a population. Avg SNVs: the average number of SNVs in an individual. Ti/Tv: transition/transversion SNV ratio. Novel 1000G-SAS: the number of SNVs that are not in the 1000G-SAS dataset. Novel dbSNP 141: the number of SNVs that are not in dbSNP 141. [a] Caste populations.
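The Ti/Tv column can be illustrated with a minimal sketch. It assumes each SNV is given as a (reference, alternate) base pair; the function name `ti_tv_ratio` and the input layout are hypothetical and are not taken from the study's pipeline.

```python
# Illustrative sketch only: not the study's variant-calling code.
# A transition is A<->G (purines) or C<->T (pyrimidines);
# every other single-base change is a transversion.
BASES = set("ACGT")
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snvs):
    """Return transitions / transversions for an iterable of (ref, alt) pairs."""
    ti = 0
    tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        # Skip anything that is not a simple single-base substitution.
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ti / tv


# Two transitions (A>G, C>T) and one transversion (A>C).
example = [("A", "G"), ("C", "T"), ("A", "C")]
print(ti_tv_ratio(example))
```

A Ti/Tv value close to 2 is commonly expected for genome-wide human SNV call sets, which is why the ratio is often reported as a quality check.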
Fig. 1. PCA of SAS-AP and 1000GP3 samples. Each symbol represents one individual. PC1 and PC2 are shown on the X and Y axis, respectively. The percentage of variance explained by each PC is labeled on the axis. The map shown in the figure is adapted from https://commons.wikimedia.org/wiki/File:World_map_blank_without_borders.svg where permission is granted under a Creative Commons license
Fig. 2. PCA of South Asian samples. a All SAS-AP samples; b SAS-AP excluding Khonda Dora and Irula samples; c SAS-AP and 1000GP3-SAS samples; d SAS-AP and 1000GP3-SAS excluding Khonda Dora and Irula samples. Each symbol represents one individual. PC1 and PC2 are shown on the X and Y axis, respectively. The variance explained by each PC is labeled on the axis
Fig. 3. Admixture analysis of SAS-AP and 1000GP3 samples. a K = 5; b K = 6. Each vertical bar represents one sample. The vertical bar is composed of colored sections, where each section represents the proportion of a sample’s ancestry derived from one of K ancestral populations
Fig. 4. Imputation dosage correlation coefficient R² of ~200,000 missing sites from SAS-AP samples. Results from the SAS-AP reference panel and the 1000GP3-SAS reference panel are shown in blue and red bars, respectively. The number of target samples is given in parentheses
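As a rough illustration of the metric named in the Fig. 4 caption, the sketch below computes a dosage R² as the squared Pearson correlation between imputed allele dosages and the true genotypes at masked sites. This definition is an assumption; the caption does not state how the study computed R², and the function name `dosage_r2` and the 0/1/2 genotype coding are hypothetical.

```python
# Illustrative sketch only: assumes R^2 is the squared Pearson correlation
# between imputed dosages (floats in [0, 2]) and true genotypes (0, 1 or 2
# copies of the alternate allele) at the same sites.
def dosage_r2(imputed, truth):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(imputed)
    if n != len(truth) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(imputed) / n
    mean_y = sum(truth) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, truth))
    var_x = sum((x - mean_x) ** 2 for x in imputed)
    var_y = sum((y - mean_y) ** 2 for y in truth)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return (cov * cov) / (var_x * var_y)


# Made-up dosages for four samples at one masked site.
imputed_dosage = [0.1, 0.9, 1.8, 0.2]
true_genotype = [0, 1, 2, 0]
print(round(dosage_r2(imputed_dosage, true_genotype), 3))
```

Values near 1 mean the imputed dosages track the true genotypes closely; comparing this value across reference panels is the kind of contrast the blue and red bars summarize.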
Fig. 5. The proportion of major mtDNA haplogroups in castes and tribal populations. All haplogroups are defined using complete mtDNA sequences