Ulykbek Kairov¹, Askhat Molkenov¹, Aigul Sharip¹, Saule Rakhimova², Madina Seidualy¹, Arang Rhie³, Ulan Kozhamkulov², Maxat Zhabagin², Jong-Il Kim³, Joseph H Lee⁴, Joseph D Terwilliger⁵, Jeong-Sun Seo³, Zhaxybay Zhumadilov²,⁶, Ainur Akilzhanova².
Abstract
Kazakhstan, the ninth-largest country in the world, lies along the Great Silk Road and connects Europe with Asia. Historically, its territory has been inhabited by nomadic tribes, and modern-day Kazakhstan is a multiethnic country with a predominantly Kazakh population. We sequenced and analyzed the genomes of five ethnic Kazakhs at high coverage using the Illumina HiSeq2000 next-generation sequencing platform. Total throughput per individual ranged from 87,308,581,400 to 107,526,741,301 base pairs, and on average 99.06% of reads were properly mapped. Data quality was assessed with the heterozygous-to-homozygous (Het/Hom) and transition-to-transversion (Ti/Tv) ratios, which ranged from 1.35 to 1.49 and from 2.07 to 2.08, respectively. Genetic variants were identified and annotated. Functional analysis identified several variants associated with higher risks of metabolic and neurodegenerative diseases. The study also showed that Kazakhs have high levels of genetic admixture, comparable to those of other Central Asian populations. These whole-genome sequence data from healthy Kazakhs could contribute significantly to biomedical studies of common diseases, because such studies could provide better insight into genotype-phenotype relationships at the population level.
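The Het/Hom and Ti/Tv ratios quoted above are standard quality summaries of a variant call set: Ti/Tv divides transitions (A<->G, C<->T) by all other single-base substitutions (transversions), and Het/Hom divides heterozygous by homozygous-alternate genotype calls. A minimal Python sketch of that arithmetic follows; it assumes toy, hypothetical inputs (biallelic single-base SNVs given as (ref, alt) pairs and VCF-style genotype strings) and is not the authors' pipeline, which would read these values from VCF files.

```python
from typing import List, Tuple

# Transitions swap purine for purine (A<->G) or pyrimidine for pyrimidine (C<->T);
# every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snvs: List[Tuple[str, str]]) -> float:
    """Transitions / transversions over (ref, alt) single-base substitutions."""
    ti = sum(1 for ref, alt in snvs if (ref.upper(), alt.upper()) in TRANSITIONS)
    tv = len(snvs) - ti  # assumes ref != alt for every pair
    return ti / tv


def het_hom_ratio(genotypes: List[str]) -> float:
    """Heterozygous / homozygous-alternate calls at biallelic sites."""
    calls = [gt.replace("|", "/") for gt in genotypes]  # treat phased like unphased
    het = calls.count("0/1") + calls.count("1/0")
    hom = calls.count("1/1")
    return het / hom


# Toy, hypothetical calls: 4 transitions and 2 transversions; 3 het and 2 hom-alt.
snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "C"), ("G", "T")]
genotypes = ["0/1", "0|1", "1/1", "0/1", "1|1"]
print(ti_tv_ratio(snvs), het_hom_ratio(genotypes))  # 2.0 1.5
```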
Keywords: Kazakh whole genomes; Kazakhstan; genome analysis; human genetics; next-generation sequencing; whole-genome sequence (WGS); whole-genome sequence analysis
Year: 2022 PMID: 35899193 PMCID: PMC9309552 DOI: 10.3389/fgene.2022.902804
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
Alignment statistics of whole-genome sequencing of Kazakh individuals.
| Parameter | KAZ_WG2 | KAZ_WG4 | KAZ_WG5 | KAZ_WG6 | KAZ_WG7 |
|---|---|---|---|---|---|
| Total reads | 1,064,621,201 | 987,933,678 | 899,102,501 | 864,441,400 | 927,234,150 |
| Duplicates | 118,659,323 | 442,049,149 | 202,996,427 | 141,226,504 | 131,307,919 |
| Mapped | 1,060,168,691 | 976,604,161 | 889,481,111 | 855,464,242 | 917,897,457 |
| Mapped (%) | 99.58% | 98.85% | 98.93% | 98.96% | 98.99% |
| Singletons | 2,363,888 | 8,184,471 | 7,569,803 | 6,525,956 | 7,505,472 |
| Singletons (%) | 0.22% | 0.83% | 0.84% | 0.75% | 0.81% |
| Throughput (bp) | 107,526,741,301 | 99,781,301,478 | 90,809,352,601 | 87,308,581,400 | 93,650,649,150 |
| Fold coverage of the human genome (mapped reads) | 33.10 | 30.49 | 27.77 | 26.71 | 28.66 |
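The percentage rows in the alignment table follow directly from the raw counts, and averaging the five per-sample mapping rates gives the 99.06% quoted in the abstract. The sketch below recomputes them from values copied out of the table; the dictionary layout and the `summarize` helper are illustrative choices, not part of the original analysis. It also divides throughput by total reads, which comes out to exactly 101 bases per read for every sample.

```python
from statistics import mean

# (total reads, mapped reads, singletons, throughput in bp), copied from the table.
ALIGNMENT = {
    "KAZ_WG2": (1_064_621_201, 1_060_168_691, 2_363_888, 107_526_741_301),
    "KAZ_WG4": (987_933_678, 976_604_161, 8_184_471, 99_781_301_478),
    "KAZ_WG5": (899_102_501, 889_481_111, 7_569_803, 90_809_352_601),
    "KAZ_WG6": (864_441_400, 855_464_242, 6_525_956, 87_308_581_400),
    "KAZ_WG7": (927_234_150, 917_897_457, 7_505_472, 93_650_649_150),
}


def summarize(total, mapped, singletons, throughput):
    """Return (mapped %, singletons %, bases per read) for one sample."""
    return (
        100 * mapped / total,
        100 * singletons / total,
        throughput / total,
    )


per_sample = {name: summarize(*row) for name, row in ALIGNMENT.items()}
# Unweighted mean of the per-sample mapping rates.
mean_mapped = mean(pct for pct, _, _ in per_sample.values())

for name, (m, s, bpr) in per_sample.items():
    print(f"{name}: mapped {m:.2f}%, singletons {s:.2f}%, {bpr:.0f} bp/read")
print(f"mean mapped: {mean_mapped:.2f}%")  # mean mapped: 99.06%
```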
Total numbers of genetic variants identified in the five Kazakh samples.
| Sample | SNVs | Novel SNVs | MNVs | Novel MNVs | Deletions | Novel deletions | Insertions | Novel insertions |
|---|---|---|---|---|---|---|---|---|
| KAZ_WG2 | 3,158,814 | 14,898 | 209,294 | 29,841 | 312,682 | 15,856 | 312,420 | 33,077 |
| KAZ_WG4 | 3,035,717 | 13,059 | 193,632 | 26,850 | 297,412 | 14,149 | 293,352 | 27,914 |
| KAZ_WG5 | 3,110,974 | 14,122 | 202,611 | 28,370 | 306,392 | 15,069 | 305,205 | 30,820 |
| KAZ_WG6 | 3,141,190 | 14,385 | 202,819 | 28,390 | 304,443 | 14,383 | 306,471 | 30,764 |
| KAZ_WG7 | 3,131,384 | 15,011 | 204,417 | 28,801 | 14,163 | 28,813 | 308,449 | 32,003 |
| AVERAGE | 3,115,615.8 | 14,295 | 202,554.6 | 28,450.4 | 247,018.4 | 17,654 | 305,179.4 | 30,915.6 |
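The AVERAGE row is the arithmetic mean of the five samples in each column. A short sketch using only the SNV columns (values copied from the table) reproduces the two SNV averages and expresses novel SNVs as a share of all SNVs called per sample, which is below 0.5% in each of the five genomes. The variable names are illustrative.

```python
from statistics import mean

# (SNVs, novel SNVs) per sample, copied from the variant table.
SNVS = {
    "KAZ_WG2": (3_158_814, 14_898),
    "KAZ_WG4": (3_035_717, 13_059),
    "KAZ_WG5": (3_110_974, 14_122),
    "KAZ_WG6": (3_141_190, 14_385),
    "KAZ_WG7": (3_131_384, 15_011),
}

# Column means, as in the table's AVERAGE row.
avg_snvs = mean(total for total, _ in SNVS.values())
avg_novel = mean(novel for _, novel in SNVS.values())

# Novel SNVs as a percentage of all SNVs called in each sample.
novel_pct = {name: 100 * novel / total for name, (total, novel) in SNVS.items()}

print(f"average SNVs: {avg_snvs:,.1f}; average novel SNVs: {avg_novel:,}")
for name, pct in novel_pct.items():
    print(f"{name}: {pct:.2f}% novel")
```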
FIGURE 1 Principal component analysis (PCA) of the Kazakh samples together with samples from worldwide populations in the 1000G, HGDP, and Jorde genomic projects. (A) PCA plot of the Kazakh samples with the 1000G project; (B) PCA plot of the Kazakh samples with the HGDP project; (C) PCA plot of the Kazakh samples with Eurasian populations from the HGDP and Jorde genomic projects. Kazakh samples are shown as blue rhombuses. AFR, Africa; AMR, America; CAS, Central Asia; EAS, East Asia; EUR, Europe; MES, Middle East; SAS, South Asia.
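Figure 1 is based on PCA of genotype data. A minimal numpy sketch of the generic technique is shown below. It assumes a samples-by-SNPs matrix of alternate-allele counts (0, 1, 2) and standardizes each SNP by its binomial standard deviation, which is a common choice. The software, SNP filtering, and normalization the authors actually used are not stated in this record, and the function name `genotype_pca` and the toy two-population data are hypothetical.

```python
import numpy as np


def genotype_pca(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Return PC scores for each sample (rows) from a 0/1/2 genotype matrix."""
    g = genotypes.astype(float)
    # Alternate-allele frequency of each SNP, estimated from these samples.
    p = g.mean(axis=0) / 2.0
    # Monomorphic SNPs have zero variance and carry no information; drop them.
    keep = (p > 0.0) & (p < 1.0)
    g, p = g[:, keep], p[keep]
    # Center each SNP at 2p and scale by the binomial SD sqrt(2p(1-p)).
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # Thin SVD: columns of u scaled by the singular values are the PC scores.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy data (hypothetical): two populations with mirrored allele frequencies.
rng = np.random.default_rng(0)
freqs_a = rng.uniform(0.05, 0.5, size=200)
freqs_b = 1.0 - freqs_a
genos = np.vstack([
    rng.binomial(2, freqs_a, size=(20, 200)),
    rng.binomial(2, freqs_b, size=(20, 200)),
])
scores = genotype_pca(genos)
print(scores[:3, 0], scores[-3:, 0])
```

In this toy setup the first principal component places the two simulated populations on opposite sides of zero. That is the same kind of separation a PCA plot uses to show population structure.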
FIGURE 2 ADMIXTURE plots for Kazakhs assuming different numbers of ancestral populations. These plots were generated from the genomic data of 3,805 individuals collected from the 1000G, HGDP, and Jorde genomic projects. The number of ancestral populations (K) was set to 5, 7, or 10. Samples are grouped by geographical location to show their genetic structure. In particular, Central Asians (i.e., Hazaras, Kazakhs, Kyrgyz, and Uygurs) had comparable levels of genetic admixture across all assumed values of K. AFR, Africa; AMR, America; CAS, Central Asia; EAS, East Asia; EUR, Europe; MES, Middle East; SAS, South Asia.
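Each bar in an ADMIXTURE plot shows one individual's estimated ancestry proportions across the K ancestral populations. The model behind the plot treats each genotype as a binomial draw whose allele frequency is the individual's mixture of ancestral frequencies, p_ij = sum_k q_ik * f_kj. The ADMIXTURE program estimates Q and F by maximizing this likelihood. The sketch below only evaluates the log-likelihood for fixed Q and F and does not reproduce the program's optimizer or the settings used for Figure 2. The function name `admixture_loglik`, the `eps` clipping, and the toy arrays are hypothetical.

```python
import numpy as np


def admixture_loglik(g: np.ndarray, q: np.ndarray, f: np.ndarray, eps: float = 1e-9) -> float:
    """Binomial log-likelihood of genotypes under an admixture model.

    g: (n, m) alternate-allele counts in {0, 1, 2}
    q: (n, K) ancestry proportions; each row sums to 1
    f: (K, m) allele frequency of each SNP in each ancestral population
    """
    # Individual-specific allele frequency: p_ij = sum_k q_ik * f_kj.
    p = np.clip(q @ f, eps, 1.0 - eps)  # clip so the logs stay finite
    # Binomial coefficients do not depend on q or f, so they are omitted.
    return float(np.sum(g * np.log(p) + (2 - g) * np.log1p(-p)))


# Toy example (hypothetical): K = 2, three SNPs, two unadmixed individuals.
g = np.array([[0, 0, 0], [2, 2, 2]])
f = np.array([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9]])
q_true = np.array([[1.0, 0.0], [0.0, 1.0]])
q_swapped = q_true[::-1]
print(admixture_loglik(g, q_true, f), admixture_loglik(g, q_swapped, f))
```

With the correct ancestry assignment, the log-likelihood is 12 ln 0.9. Swapping the assignments lowers it to 12 ln 0.1. An optimizer over Q and F would therefore move toward the correct assignment.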