Shengqing Ma, Gang Shi.
Abstract
BACKGROUND: Population stratification is a known confounder in genome-wide association studies because it can produce false positive results. The principal component analysis (PCA) method is widely applied to analyze population structure using common variants. However, its performance when rare variants are used remains unclear.
Keywords: Population stratification; Principal component analysis; Rare variant; Single nucleotide polymorphism
Year: 2020 PMID: 32183706 PMCID: PMC7077175 DOI: 10.1186/s12863-020-0833-x
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
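The abstract refers to PCA of population structure without showing the computation. The following is a minimal sketch, assuming the commonly used standardized genetic relationship matrix (GRM). In this GRM, each SNP's allele counts are centred by 2p and scaled by sqrt(2p(1 - p)), where p is the sample allele frequency, and PCA is then an eigendecomposition of that matrix. The helper names `standardized_grm` and `top_pcs` and the simulated two-population data are hypothetical illustrations. They are not the authors' code or the 1000 Genomes data.

```python
import numpy as np


def standardized_grm(genotypes):
    """N x N genetic relationship matrix from an N x M matrix of allele counts (0/1/2).

    Each SNP is centred by 2p and scaled by sqrt(2p(1 - p)), where p is the
    sample allele frequency (an assumed, commonly used standardization).
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                    # sample allele frequency per SNP
    keep = (p > 0) & (p < 1)                    # monomorphic SNPs cannot be scaled
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return z @ z.T / z.shape[1]


def top_pcs(grm, k=2):
    """Return the k largest eigenvalues of the symmetric GRM and the matching PC scores."""
    vals, vecs = np.linalg.eigh(grm)            # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]
    return vals[order], vecs[:, order]


# Hypothetical toy data: two populations whose allele frequencies differ at 2,000 SNPs.
rng = np.random.default_rng(0)
p_a = rng.uniform(0.05, 0.95, size=2000)
p_b = np.clip(p_a + rng.normal(0.0, 0.15, size=2000), 0.01, 0.99)
geno = np.vstack([
    rng.binomial(2, p_a, size=(50, 2000)),      # population A: rows 0-49
    rng.binomial(2, p_b, size=(50, 2000)),      # population B: rows 50-99
])

grm = standardized_grm(geno)
vals, pcs = top_pcs(grm, k=2)
print("PC1 mean, pop A:", pcs[:50, 0].mean())
print("PC1 mean, pop B:", pcs[50:, 0].mean())
```

Each standardized SNP column has mean zero, so the GRM sends the all-ones vector to zero. PC scores therefore sum to zero, and two well-separated groups fall on opposite sides of PC1.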
Summary of SNPs from the 1000 Genomes Project data
| Pop (N) | Bin 1 | Bin 2 | Bin 3 | Bin 4 | Bin 5 | Bin 6 |
|---|---|---|---|---|---|---|
| Class | Common SNP | Common SNP | Common SNP | Common SNP | Low-frequency SNP | Rare SNP |
| MAF | (0.4,0.5] | (0.3,0.4] | (0.2,0.3] | (0.1,0.2] | (0.01,0.1] | (0.0001,0.01] |
| EUR (503) | 995,352 | 1,048,669 | 1,190,239 | 1,581,788 | 3,717,490 | 13,531,139 |
| EAS (504) | 970,359 | 1,010,549 | 1,130,961 | 1,440,178 | 2,982,582 | 14,189,976 |
| AMR (347) | 1,004,970 | 1,068,395 | 1,234,095 | 1,613,443 | 4,827,083 | 16,092,172 |
| SAS (489) | 1,001,330 | 1,077,620 | 1,239,727 | 1,626,183 | 3,989,855 | 15,562,799 |
| AFR (661) | 981,929 | 1,097,944 | 1,436,771 | 2,403,901 | 8,852,607 | 24,044,176 |
| Total (2504) | 1,023,570 | 1,105,365 | 1,308,728 | 1,824,157 | 6,770,457 | 65,247,586 |
MAF minor allele frequency, Pop population, EUR European, EAS East Asian, AMR American, SAS South Asian, AFR African
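The six bins in this table are left-open, right-closed MAF intervals. The table reports counts separately for each population and for all 2,504 samples. The sketch below assigns SNPs to these bins. It assumes that MAF is min(p, 1 - p) for an allele frequency p computed in the sample being summarized. The function name `maf_bin` and the example frequencies are hypothetical.

```python
import numpy as np

# Bin edges from the MAF row of the table; every bin is (lower, upper].
EDGES = np.array([0.0001, 0.01, 0.1, 0.2, 0.3, 0.4, 0.5])


def maf_bin(allele_freq):
    """Return the table's bin number for each SNP: 1 = (0.4, 0.5] ... 6 = (0.0001, 0.01].

    SNPs with MAF <= 0.0001 fall outside all six bins and are labelled 0.
    """
    freq = np.asarray(allele_freq, dtype=float)
    maf = np.minimum(freq, 1.0 - freq)          # fold to the minor allele
    idx = np.digitize(maf, EDGES, right=True)   # idx = i means EDGES[i-1] < maf <= EDGES[i]
    return np.where(idx == 0, 0, 7 - idx)       # idx 6 -> Bin 1, ..., idx 1 -> Bin 6


freqs = [0.45, 0.65, 0.25, 0.15, 0.05, 0.005, 0.99995]
bins = maf_bin(freqs)
print(bins.tolist())                            # [1, 2, 3, 4, 5, 6, 0]
print(np.bincount(bins, minlength=7)[1:])       # SNP count per bin, like one row of the table
```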
Theoretical and empirical values of the variance and covariance elements of EGRMs
| Element | MAF | EUR | EAS | AMR | SAS | AFR |
|---|---|---|---|---|---|---|
| Variance | (0.4,0.5] | 1.06/0.97 | 1.11/1.02 | 1.04/0.97 | 1.04/0.96 | 1.15/1.05 |
| Variance | (0.3,0.4] | 1.06/0.97 | 1.11/1.02 | 1.04/0.97 | 1.04/0.96 | 1.14/1.05 |
| Variance | (0.2,0.3] | 1.05/0.96 | 1.09/1.00 | 1.03/0.97 | 1.03/0.96 | 1.17/1.07 |
| Variance | (0.1,0.2] | 0.99/0.91 | 1.01/0.93 | 0.98/0.93 | 0.99/0.92 | 1.32/1.22 |
| Variance | (0.01,0.1] | 0.61/0.57 | 0.50/0.47 | 0.73/0.70 | 0.56/0.53 | 2.50/2.37 |
| Variance | (0.0001,0.01] | 0.71/0.71 | 0.94/0.94 | 0.82/0.82 | 0.98/0.98 | 1.46/1.46 |
| Covariance | (0.4,0.5] | 0.13/0.11 | 0.22/0.20 | 0.08/0.07 | 0.08/0.07 | 0.30/0.28 |
| Covariance | (0.3,0.4] | 0.13/0.11 | 0.22/0.20 | 0.07/0.06 | 0.08/0.07 | 0.29/0.26 |
| Covariance | (0.2,0.3] | 0.12/0.11 | 0.21/0.19 | 0.07/0.06 | 0.08/0.07 | 0.27/0.24 |
| Covariance | (0.1,0.2] | 0.11/0.10 | 0.18/0.17 | 0.07/0.06 | 0.08/0.07 | 0.27/0.25 |
| Covariance | (0.01,0.1] | 0.06/0.05 | 0.08/0.07 | 0.04/0.03 | 0.05/0.04 | 0.25/0.23 |
| Covariance | (0.0001,0.01] | 0.004/0.002 | 0.005/0.003 | 0.004/0.002 | 0.005/0.003 | 0.011/0.008 |
In each cell, the first value is the theoretical value of the variance or covariance element and the second is the empirical value.
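The table does not say how the empirical elements were summarized. One plausible reading is assumed here but not confirmed by the text. Under that reading, the variance element is the average diagonal entry of the relationship matrix for individuals in a population. The covariance element is the average off-diagonal entry between distinct individuals of that population. The following is a minimal sketch of that empirical summary for any GRM-style matrix. It does not reproduce the theoretical values. The function `within_pop_summary` and the toy matrix are hypothetical.

```python
import numpy as np


def within_pop_summary(grm, labels):
    """Per population: (mean diagonal element, mean off-diagonal element over distinct pairs).

    Mapping 'variance' to the diagonal and 'covariance' to within-population
    off-diagonal entries is an assumption made for illustration.
    """
    grm = np.asarray(grm, dtype=float)
    labels = np.asarray(labels)
    summary = {}
    for pop in np.unique(labels):
        idx = np.flatnonzero(labels == pop)
        block = grm[np.ix_(idx, idx)]           # relationships among members of `pop`
        n = len(idx)
        diag_mean = np.trace(block) / n
        # The off-diagonal mean is undefined for a population with a single member.
        off_mean = (block.sum() - np.trace(block)) / (n * (n - 1)) if n > 1 else float("nan")
        summary[str(pop)] = (diag_mean, off_mean)
    return summary


toy = np.array([
    [1.1, 0.3, -0.1],
    [0.3, 0.9, -0.2],
    [-0.1, -0.2, 1.0],
])
print(within_pop_summary(toy, ["AFR", "AFR", "EUR"]))
```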
Fig. 1. Scatter plots and representative points using SNPs from six MAF bins, PC 1 vs. PC 2. (a) 0.4 < MAF ≤ 0.5; (b) 0.3 < MAF ≤ 0.4; (c) 0.2 < MAF ≤ 0.3; (d) 0.1 < MAF ≤ 0.2; (e) 0.01 < MAF ≤ 0.1; (f) 0.0001 < MAF ≤ 0.01. EUR: European, EAS: East Asian, AMR: American, SAS: South Asian, AFR: African. In brackets, the first values are the percentages of variance explained in the PCAs of GRMs, and the second values are those from the PCAs of EGRMs. Large black symbols mark the representative points of the five populations.
Fig. 2. Scatter plots and representative points using SNPs from six MAF bins, PC 1 vs. PC 3. (a) 0.4 < MAF ≤ 0.5; (b) 0.3 < MAF ≤ 0.4; (c) 0.2 < MAF ≤ 0.3; (d) 0.1 < MAF ≤ 0.2; (e) 0.01 < MAF ≤ 0.1; (f) 0.0001 < MAF ≤ 0.01. EUR: European, EAS: East Asian, AMR: American, SAS: South Asian, AFR: African. In brackets, the first values are the percentages of variance explained in the PCAs of GRMs, and the second values are those from the PCAs of EGRMs. Large black symbols mark the representative points of the five populations.
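The bracketed percentages in these captions can be computed from eigenvalues. For a symmetric positive semi-definite relationship matrix, the share of variance explained by PC k is its eigenvalue divided by the sum of all eigenvalues, which equals the trace. The captions do not define the representative points. For illustration only, this sketch assumes that each representative point is the mean PC score of a population. The function `pca_summary` and the toy matrix are hypothetical.

```python
import numpy as np


def pca_summary(grm, labels, k=3):
    """Percent variance explained by the top-k PCs, PC scores, and per-population centroids.

    The centroid is an assumed stand-in for the figures' 'representative points'.
    """
    grm = np.asarray(grm, dtype=float)
    vals, vecs = np.linalg.eigh(grm)
    order = np.argsort(vals)[::-1]              # largest eigenvalue first
    vals, vecs = vals[order], vecs[:, order]
    pct = 100.0 * vals[:k] / vals.sum()         # sum of eigenvalues equals the trace
    scores = vecs[:, :k]
    labels = np.asarray(labels)
    centroids = {str(pop): scores[labels == pop].mean(axis=0) for pop in np.unique(labels)}
    return pct, scores, centroids


# Hypothetical 4-individual relationship matrix: two pairs of related samples.
toy = np.array([
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.4],
    [0.0, 0.0, 0.4, 1.0],
])
pct, scores, centroids = pca_summary(toy, ["EUR", "EUR", "AFR", "AFR"], k=2)
print(pct.round(1).tolist())                    # [45.0, 35.0]
print(centroids)                                # signs of PC scores are arbitrary
```

The eigenvalues of the toy matrix are 1.8, 1.4, 0.6 and 0.2, with a trace of 4. PC 1 and PC 2 therefore explain 45% and 35%. Eigenvector signs are arbitrary, so a population's centroid can flip sides between runs or implementations without changing the plot's structure.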