Habiba S AlSafar, Mariam Al-Ali, Gihan Daw Elbait, Mustafa H Al-Maini, Dymitr Ruta, Braulio Peramo, Andreas Henschel, Guan K Tay.
Abstract
Whole Genome Sequencing (WGS) provides an in-depth description of genome variation. In the era of large-scale population genome projects, the assembly of ethnic-specific genomes combined with mapping human reference genomes of underrepresented populations has improved the understanding of human diversity and disease associations. In this study, whole genome sequences of two nationals of the United Arab Emirates (UAE) at >27X coverage are reported for the first time. The two Emirati individuals were predominantly of Central/South Asian ancestry. An in-house customized pipeline using BWA and Picard, followed by the GATK tools, was used to map the raw data from the whole genome sequences of both individuals. A total of 3,994,521 variants (3,350,574 Single Nucleotide Polymorphisms (SNPs) and 643,947 indels) were identified for the first individual, the UAE S001 sample. A similar number of variants, 4,031,580 (3,373,501 SNPs and 658,079 indels), were identified for UAE S002. Variants that are associated with diabetes, hypertension, increased cholesterol levels, and obesity were also identified in these individuals. These whole genome sequences have provided a starting point for constructing a UAE reference panel, which will lead to improvements in the delivery of precision medicine and quality of life for affected individuals, and to a reduction in healthcare costs. The information compiled will likely lead to the identification of target genes that could potentially lead to the development of novel therapeutic modalities.
Year: 2019 PMID: 31604968 PMCID: PMC6789106 DOI: 10.1038/s41598-019-50876-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Principal component analysis and supervised admixture analysis representing the estimated ethnic background of UAE S001 and UAE S002 (with admixture ratios shown as pie charts) compared to genotypes of other UAE citizens and those in the HGDP dataset.
Alignment statistics and genome coverage for UAE S001 and UAE S002.
| | UAE S001 | UAE S002 |
|---|---|---|
| Number of Reads | 851,448,838 | 839,072,541 |
| Number of Reads Mapped | 712,659,088 (83.7%) | 826,900,438 (98.5%) |
| Number of Reads Properly Paired | 712,659,088 (83.7%) | 826,900,438 (98.5%) |
| Number of Singletons | 857,112 (0.10%) | 387,602 (0.05%) |
| Mean Coverage | 27.0909X | 31.1866X |
| Fragment Sizes | 151 s | 152 s |
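The mapped-read and singleton percentages in the table follow directly from the raw read counts. A minimal Python sketch that reproduces them is given here; the dictionary layout and the helper name `percent_of_reads` are illustrative assumptions and are not part of the original analysis.

```python
# Read counts copied from the alignment statistics table.
READS = {
    "UAE S001": {"total": 851_448_838, "mapped": 712_659_088, "singletons": 857_112},
    "UAE S002": {"total": 839_072_541, "mapped": 826_900_438, "singletons": 387_602},
}


def percent_of_reads(count: int, total: int) -> float:
    """Return `count` expressed as a percentage of `total` reads."""
    return 100.0 * count / total


for sample, counts in READS.items():
    mapped = percent_of_reads(counts["mapped"], counts["total"])
    singletons = percent_of_reads(counts["singletons"], counts["total"])
    print(f"{sample}: mapped {mapped:.1f}%, singletons {singletons:.2f}%")
```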
Summary of variants found in UAE S001 and UAE S002.
| Type | Category | UAE S001 | UAE S002 |
|---|---|---|---|
| Variants | Total | 3,994,521 | 4,031,580 |
| Variants | ‘true’ | 3,835,491 | 3,865,759 |
| Variants | ‘not listed’ | 159,030 | 165,821 |
| SNPs | Total | 3,350,574 | 3,373,501 |
| SNPs | ‘true’ | 3,283,240 | 3,302,437 |
| SNPs | ‘not listed’ | 67,334 | 71,064 |
| Indels | Total | 643,947 | 658,079 |
| Indels | ‘true’ | 552,251 | 563,322 |
| Indels | ‘not listed’ | 91,696 | 94,757 |
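The totals in this summary are the sums of the ‘true’ and ‘not listed’ counts for SNPs and indels. The sketch below recomputes them and derives the share of ‘not listed’ variants per sample; the names `VARIANTS` and `summarize` are hypothetical and chosen only for illustration.

```python
# (listed 'true', 'not listed') counts copied from the variant summary table.
VARIANTS = {
    "UAE S001": {"snps": (3_283_240, 67_334), "indels": (552_251, 91_696)},
    "UAE S002": {"snps": (3_302_437, 71_064), "indels": (563_322, 94_757)},
}


def summarize(counts: dict) -> tuple:
    """Return (total, listed, not_listed, percent_not_listed) for one sample."""
    listed = sum(true for true, _ in counts.values())
    not_listed = sum(novel for _, novel in counts.values())
    total = listed + not_listed
    return total, listed, not_listed, 100.0 * not_listed / total


for sample, counts in VARIANTS.items():
    total, listed, not_listed, pct = summarize(counts)
    print(f"{sample}: {total:,} variants, {not_listed:,} not listed ({pct:.2f}%)")
```

With the tabulated counts, roughly 4% of the variants in each sample fall into the ‘not listed’ category.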
Homozygous and heterozygous (genome-wide vs autosomal) values of the total ‘true’ and ‘not listed’ variants for UAE S001 and UAE S002.
| Sample | Type | Category | Homozygous (genome-wide) | Homozygous (autosomal) | Heterozygous (genome-wide) | Heterozygous (autosomal) |
|---|---|---|---|---|---|---|
| UAE S001 | SNPs | Total | 1,373,660 | 1,303,531 | 1,976,914 | 1,975,999 |
| UAE S001 | SNPs | ‘true’ | 1,369,168 | 1,300,862 | 1,914,072 | 1,913,565 |
| UAE S001 | SNPs | ‘not listed’ | 4,492 | 2,669 | 62,842 | 62,434 |
| UAE S001 | Indels | Total | 272,501 | 256,506 | 371,446 | 367,697 |
| UAE S001 | Indels | ‘true’ | 240,743 | 227,983 | 311,508 | 311,192 |
| UAE S001 | Indels | ‘not listed’ | 31,758 | 28,523 | 59,938 | 59,505 |
| UAE S001 | All variants | Total | 1,646,161 | 1,560,037 | 2,348,360 | 2,343,696 |
| UAE S002 | SNPs | Total | 1,316,296 | 1,277,277 | 2,057,205 | 2,002,016 |
| UAE S002 | SNPs | ‘true’ | 1,313,685 | 1,274,770 | 1,988,752 | 1,936,550 |
| UAE S002 | SNPs | ‘not listed’ | 2,611 | 2,507 | 68,453 | 65,466 |
| UAE S002 | Indels | Total | 260,036 | 250,472 | 398,043 | 385,473 |
| UAE S002 | Indels | ‘true’ | 230,835 | 222,856 | 332,487 | 322,863 |
| UAE S002 | Indels | ‘not listed’ | 29,201 | 27,616 | 65,556 | 62,610 |
| UAE S002 | All variants | Total | 1,576,332 | 1,527,749 | 2,455,248 | 2,387,489 |
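A heterozygous-to-homozygous ratio is a commonly used one-number summary of counts like these. The source does not report this ratio, so the sketch below is only an illustration computed from the genome-wide totals in the table; `GENOME_WIDE` and `het_hom_ratio` are hypothetical names.

```python
# Genome-wide SNP and indel totals copied from the zygosity table.
GENOME_WIDE = {
    "UAE S001": {
        "hom": {"snps": 1_373_660, "indels": 272_501},
        "het": {"snps": 1_976_914, "indels": 371_446},
    },
    "UAE S002": {
        "hom": {"snps": 1_316_296, "indels": 260_036},
        "het": {"snps": 2_057_205, "indels": 398_043},
    },
}


def zygosity_totals(sample: str) -> tuple:
    """Return (homozygous, heterozygous) variant totals for a sample."""
    counts = GENOME_WIDE[sample]
    return sum(counts["hom"].values()), sum(counts["het"].values())


def het_hom_ratio(sample: str) -> float:
    """Return heterozygous variants divided by homozygous variants."""
    hom, het = zygosity_totals(sample)
    return het / hom


for sample in GENOME_WIDE:
    hom, het = zygosity_totals(sample)
    print(f"{sample}: hom={hom:,} het={het:,} het/hom={het_hom_ratio(sample):.2f}")
```

The homozygous and heterozygous totals computed this way add up to the overall variant counts reported for each sample (3,994,521 and 4,031,580).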
Transition (Ts) and transversion (Tv) (genome-wide (gw) and autosomal (auto)) values for the ‘true’ and ‘not listed’ variants for UAE S001 and UAE S002.
| Sample | Category | Ts (genome-wide) | Ts (autosomal) | Tv (genome-wide) | Tv (autosomal) | Ts/Tv (genome-wide/autosomal) |
|---|---|---|---|---|---|---|
| UAE S001 | ‘true’ | 2,212,013 | 2,167,261 | 1,068,928 | 1,047,128 | 2.069/2.070 |
| UAE S001 | ‘not listed’ | 37,483 | 36,198 | 29,801 | 28,885 | 1.258/1.253 |
| UAE S001 | Total | 2,249,496 | 2,203,466 | 1,098,729 | 1,076,013 | 2.047/2.048 |
| UAE S002 | ‘true’ | 2,224,741 | 2,164,997 | 1,075,220 | 1,046,288 | 2.069/2.069 |
| UAE S002 | ‘not listed’ | 40,865 | 39,075 | 30,132 | 28,876 | 1.356/1.353 |
| UAE S002 | Total | 2,265,606 | 2,204,072 | 1,105,352 | 1,075,164 | 2.050/2.050 |
*Transition: the change of a purine (two rings) to another purine nucleotide, or of a pyrimidine (one ring) to another pyrimidine; Transversion: the substitution of a purine for a pyrimidine nucleotide or vice versa.
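The definitions in this footnote translate directly into a small classifier, and the Ts/Tv column is simply the transition count divided by the transversion count. The following Python sketch illustrates both; the function names are assumptions made for this example and do not come from the authors' pipeline.

```python
PURINES = {"A", "G"}       # two-ring bases
PYRIMIDINES = {"C", "T"}   # one-ring bases


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Both bases in the same chemical class -> transition, otherwise transversion.
    return (ref in PURINES) == (alt in PURINES)


def ts_tv_ratio(transitions: int, transversions: int) -> float:
    """Return the transition/transversion ratio."""
    return transitions / transversions


# Genome-wide and autosomal 'true' SNP counts for UAE S001 from the table.
print(round(ts_tv_ratio(2_212_013, 1_068_928), 3))
print(round(ts_tv_ratio(2_167_261, 1_047_128), 3))
```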
Classification of the ‘true’ and ‘not listed’ genome variants in UAE S001 and UAE S002 samples based on their impact.
| Sample | Category | Type | High | Low | Moderate | Modifier |
|---|---|---|---|---|---|---|
| UAE S001 | ‘true’ | Variants | 407 | 11,436 | 9,463 | 3,711,873 |
| UAE S001 | ‘true’ | SNPs | 260 | 11,436 | 9,341 | 3,260,023 |
| UAE S001 | ‘true’ | Indels | 147 | 0 | 122 | 451,850 |
| UAE S001 | ‘not listed’ | Variants | 91 | 189 | 326 | 143,548 |
| UAE S001 | ‘not listed’ | SNPs | 25 | 189 | 296 | 66,762 |
| UAE S001 | ‘not listed’ | Indels | 66 | 0 | 30 | 76,786 |
| UAE S001 | Total | Variants | 498 | 11,625 | 9,789 | 3,855,421 |
| UAE S002 | ‘true’ | Variants | 400 | 11,561 | 9,392 | 3,739,005 |
| UAE S002 | ‘true’ | SNPs | 262 | 11,561 | 9,269 | 3,278,963 |
| UAE S002 | ‘true’ | Indels | 138 | 0 | 123 | 460,042 |
| UAE S002 | ‘not listed’ | Variants | 79 | 233 | 380 | 149,511 |
| UAE S002 | ‘not listed’ | SNPs | 24 | 233 | 250 | 70,405 |
| UAE S002 | ‘not listed’ | Indels | 55 | 0 | 30 | 79,106 |
| UAE S002 | Total | Variants | 479 | 11,794 | 9,772 | 3,888,516 |
Classification of the ‘true’ and ‘not listed’ genome variants in the UAE S001 and UAE S002 samples based on their functional class.
| Sample | Category | Type | Missense | Nonsense | Silent | None |
|---|---|---|---|---|---|---|
| UAE S001 | ‘true’ | Variants | 9,388 | 70 | 10,604 | 3,713,117 |
| UAE S001 | ‘true’ | SNPs | 9,388 | 70 | 10,604 | 3,260,998 |
| UAE S001 | ‘true’ | Indels | 0 | 0 | 0 | 452,119 |
| UAE S001 | ‘not listed’ | Variants | 296 | 14 | 164 | 143,680 |
| UAE S001 | ‘not listed’ | SNPs | 296 | 14 | 164 | 66,798 |
| UAE S001 | ‘not listed’ | Indels | 0 | 0 | 0 | 76,882 |
| UAE S001 | Total | Variants | 9,684 | 84 | 10,768 | 3,856,797 |
| UAE S002 | ‘true’ | Variants | 9,316 | 63 | 10,734 | 3,740,245 |
| UAE S002 | ‘true’ | SNPs | 9,316 | 63 | 10,734 | 3,279,942 |
| UAE S002 | ‘true’ | Indels | 0 | 0 | 0 | 460,303 |
| UAE S002 | ‘not listed’ | Variants | 352 | 12 | 206 | 149,633 |
| UAE S002 | ‘not listed’ | SNPs | 352 | 12 | 206 | 70,442 |
| UAE S002 | ‘not listed’ | Indels | 0 | 0 | 0 | 79,191 |
| UAE S002 | Total | Variants | 9,668 | 75 | 10,940 | 3,889,878 |
*SnpEff assigns a functional class to certain effects, in addition to an impact: Nonsense: assigned to point mutations that result in the creation of a new stop codon; Missense: assigned to point mutations that result in an amino acid change, but not a new stop codon; Silent: assigned to point mutations that result in a codon change, but not an amino acid change or new stop codon; None: assigned to all effects that don’t fall into any of the above categories (including all events larger than a point mutation).
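The four functional classes described in this footnote can be illustrated with a small codon-level classifier. The sketch below is a simplified reading of the footnote using the standard genetic code; it is not SnpEff's implementation, and the function name `functional_class` is hypothetical. Codons containing ambiguous bases (such as N) are not handled.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def functional_class(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as NONSENSE, MISSENSE, SILENT or NONE."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        return "NONE"  # anything larger or smaller than a point mutation
    differing = sum(r != a for r, a in zip(ref_codon, alt_codon))
    if differing != 1:
        return "NONE"  # not a single-base change within the codon
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if alt_aa == "*" and ref_aa != "*":
        return "NONSENSE"  # a new stop codon is created
    if alt_aa != ref_aa:
        return "MISSENSE"  # amino acid changes, no new stop codon
    return "SILENT"        # codon changes, amino acid does not


print(functional_class("TGG", "TGA"))  # Trp -> stop
print(functional_class("GAA", "GAT"))  # Glu -> Asp
print(functional_class("GAA", "GAG"))  # Glu -> Glu
```

Under this reading, indels always fall into the None class, which is consistent with the zero indel counts in the Missense, Nonsense, and Silent columns.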
Summary of the ‘true’ and ‘not listed’ genome variants for UAE S001 classified by type within the different genomic locations.
| Type | Total variants (UAE S001) | ‘true’ SNPs | ‘true’ indels | ‘not listed’ SNPs | ‘not listed’ indels |
|---|---|---|---|---|---|
| Codon change plus codon deletion | 54 | 0 | 46 (24) | 0 | 8 (2) |
| Codon change plus codon insertion | 26 | 0 | 20 (9) | 0 | 6 (5) |
| Codon deletion | 23 | 0 | 17 (8) | 0 | 6 (1) |
| Codon insertion | 49 | 0 | 39 (8) | 0 | 10 (7) |
| Downstream | 149,159 | 119,500 (58,265) | 22,970 (14,553) | 2,806 (1,242) | 3,883 (3,035) |
| Exon | 6,560 | 5,860 (2,478) | 475 (251) | 149 (48) | 76 (55) |
| Frameshift | 154 | 0 | 96 (22) | 0 | 58 (17) |
| Intergenic | 2,038,588 | 1,686,497 (950,612) | 273,365 (187,586) | 32,321 (17,386) | 46,405 (38,390) |
| Intragenic | 3 | 0 | 3 (2) | 0 | 0 |
| Intron | 1,448,583 | 1,276,103 (598,957) | 124,733 (76,992) | 26,576 (12,380) | 21,171 (16,691) |
| Nonsynonymous coding | 9,637 | 9,341 (681) | 0 | 296 (28) | 0 |
| Nonsynonymous start | 1 | 1 | 0 | 0 | 0 |
| Splice site acceptor | 101 | 50 (12) | 40 (5) | 6 | 5 (2) |
| Splice site donor | 112 | 94 (23) | 11 (2) | 5 | 2 |
| Start gained | 856 | 831 (114) | 0 | 25 | 0 |
| Start lost | 22 | 22 (1) | 0 | 0 | 0 |
| Stop gained | 84 | 70 (7) | 0 | 14 (1) | 0 |
| Stop lost | 25 | 24 | 0 | 0 | 1 |
| Synonymous coding | 10,761 | 10,597 (493) | 0 | 164 (19) | 0 |
| Synonymous stop | 7 | 7 | 0 | 0 | 0 |
| Upstream | 177,150 | 141,568 (66,745) | 26,836 (17,001) | 3,975 (1,691) | 4,771 (3,718) |
| Untranslated 3′ | 29,579 | 25,470 (5,244) | 3,040 (1,066) | 688 (135) | 381 (207) |
| Untranslated 5′ | 5,799 | 5,025 (654) | 428 (120) | 247 (55) | 99 (55) |
The numbers in brackets reflect the number of those variants located in poorly resolved regions (i.e. low complexity regions such as segmental duplications, rDNA chromosome arms, centromeric, telomeric, large retro-transposable elements and others that are provided by the UCSC Table Browser).
Summary of the ‘true’ and ‘not listed’ genome variants for UAE S002 classified by type within the different genomic locations.
| Type | Total variants (UAE S002) | ‘true’ SNPs | ‘true’ indels | ‘not listed’ SNPs | ‘not listed’ indels |
|---|---|---|---|---|---|
| Codon change plus codon deletion | 56 | 0 | 48 (26) | 0 | 8 (4) |
| Codon change plus codon insertion | 30 | 0 | 21 (7) | 0 | 9 (3) |
| Codon deletion | 25 | 0 | 20 (12) | 0 | 5 (4) |
| Codon insertion | 42 | 0 | 34 (9) | 0 | 8 (7) |
| Downstream | 148,446 | 118,309 (56,772) | 23,306 (14,930) | 2,993 (1,284) | 3,838 (3,108) |
| Exon | 6,601 | 5,844 (2,450) | 496 (268) | 162 (47) | 100 (61) |
| Frameshift | 126 | 0 | 82 (24) | 0 | 44 (15) |
| Intergenic | 2,061,266 | 1,699,950 (953,338) | 279,427 (191,786) | 33,928 (17,788) | 47,961 (40,218) |
| Intragenic | 2 | 0 | 2 (1) | 0 | 0 |
| Intron | 1,459,683 | 1,283,402 (598,698) | 126,283 (78,210) | 28,294 (12,623) | 21,704 (16,945) |
| Nonsynonymous coding | 9,620 | 9,270 (633) | 0 | 350 (21) | 0 |
| Nonsynonymous start | 1 | 1 | 0 | 0 | 0 |
| Splice site acceptor | 115 | 59 (8) | 45 (8) | 7 | 4 |
| Splice site donor | 114 | 94 (19) | 10 (1) | 3 | 7 (4) |
| Start gained | 854 | 827 (109) | 0 | 27 (4) | 0 |
| Start lost | 24 | 22 | 0 | 2 | 0 |
| Stop gained | 76 | 63 (6) | 1 | 12 (2) | 0 |
| Stop lost | 24 | 24 (1) | 0 | 0 | 0 |
| Synonymous coding | 10,935 | 10,729 (472) | 0 | 206 (19) | 0 |
| Synonymous stop | 6 | 6 | 0 | 0 | 0 |
| Upstream | 176,928 | 140,927 (65,564) | 27,030 (17,176) | 3,993 (1,591) | 4,978 (3,965) |
| Untranslated 3′ | 29,851 | 25,599 (5,252) | 3,045 (1,090) | 807 (145) | 400 (215) |
| Untranslated 5′ | 5,739 | 4,932 (649) | 453 (134) | 229 (49) | 125 (68) |
The numbers in brackets reflect the number of those variants located in poorly resolved regions (i.e. low complexity regions such as segmental duplications, rDNA chromosome arms, centromeric, telomeric, large retro-transposable elements and others that are provided by the UCSC Table Browser).
Summary of listed or unlisted variants (with respect to GnomAD) for the UAE S001 and UAE S002, showing a significant increase in the true variants in comparison to dbSNP 138.
| Type | UAE S001 total variants | UAE S001 ‘true’ SNPs | UAE S001 ‘true’ indels | UAE S001 ‘not listed’ SNPs | UAE S001 ‘not listed’ indels | UAE S002 total variants | UAE S002 ‘true’ SNPs | UAE S002 ‘true’ indels | UAE S002 ‘not listed’ SNPs | UAE S002 ‘not listed’ indels |
|---|---|---|---|---|---|---|---|---|---|---|
| Codon change plus codon deletion | 54 | 0 | 52 | 0 | 2 | 56 | 0 | 53 | 0 | 3 |
| Codon change plus codon insertion | 26 | 0 | 25 | 0 | 1 | 30 | 0 | 28 | 0 | 2 |
| Codon deletion | 23 | 0 | 21 | 0 | 2 | 25 | 0 | 25 | 0 | 0 |
| Codon insertion | 49 | 0 | 48 | 0 | 1 | 42 | 0 | 40 | 0 | 2 |
| Downstream | 149,159 | 121,377 | 25,819 | 929 | 1,034 | 148,446 | 120,336 | 26,079 | 966 | 1,065 |
| Exon | 6,560 | 5,966 | 539 | 43 | 12 | 6,601 | 5,965 | 579 | 40 | 17 |
| Frameshift | 154 | 0 | 143 | 0 | 11 | 126 | 0 | 117 | 0 | 9 |
| Intergenic | 2,038,588 | 1,709,914 | 307,629 | 8,904 | 12,141 | 2,062,522 | 1,725,073 | 315,180 | 9,339 | 12,930 |
| Intragenic | 3 | 0 | 3 | 0 | 0 | 2 | 0 | 2 | 0 | 0 |
| Intron | 1,448,583 | 1,294,273 | 140,543 | 8,406 | 5,361 | 1,459,726 | 1,303,159 | 142,225 | 8,569 | 5,773 |
| Nonsynonymous coding | 9,637 | 9,503 | 0 | 134 | 0 | 9,620 | 9,489 | 0 | 131 | 0 |
| Nonsynonymous start | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| Splice site acceptor | 101 | 53 | 45 | 3 | 0 | 115 | 61 | 49 | 5 | 0 |
| Splice site donor | 112 | 97 | 13 | 2 | 0 | 114 | 95 | 17 | 2 | 0 |
| Start gained | 856 | 849 | 0 | 7 | 0 | 854 | 851 | 0 | 3 | 0 |
| Start lost | 22 | 22 | 0 | 0 | 0 | 24 | 24 | 0 | 0 | 0 |
| Stop gained | 84 | 78 | 0 | 6 | 0 | 76 | 70 | 1 | 5 | 0 |
| Stop lost | 25 | 24 | 0 | 0 | 1 | 24 | 24 | 0 | 0 | 0 |
| Synonymous coding | 10,761 | 10,693 | 0 | 68 | 0 | 10,935 | 10,854 | 0 | 81 | 0 |
| Synonymous stop | 7 | 7 | 0 | 0 | 0 | 6 | 6 | 0 | 0 | 0 |
| Upstream | 177,150 | 144,105 | 30,293 | 1,438 | 1,314 | 176,930 | 143,564 | 30,586 | 1,358 | 1,422 |
| Untranslated 3′ | 29,579 | 25,885 | 3,340 | 273 | 81 | 29,874 | 26,164 | 3,355 | 263 | 92 |
| Untranslated 5′ | 5,799 | 5,167 | 513 | 105 | 14 | 5,742 | 5,074 | 553 | 90 | 25 |
Summary of the variants that are ‘not listed’ (with respect to GnomAD) for UAE S001 and UAE S002.
| Type | UAE S001 total | UAE S001 SNPs | UAE S001 indels | UAE S002 total | UAE S002 SNPs | UAE S002 indels |
|---|---|---|---|---|---|---|
| Frameshift | 11 | 0 | 11 | 9 | 0 | 9 |
| Exon | 55 | 43 | 12 | 57 | 40 | 17 |
| Codon change plus codon deletion | 2 | 0 | 2 | 3 | 0 | 3 |
| Codon change plus codon insertion | 1 | 0 | 1 | 2 | 0 | 2 |
| Codon deletion | 2 | 0 | 2 | 0 | 0 | 0 |
| Codon deletion | 2 | 0 | 2 | 0 | 0 | 0 |
| Intron | 13,767 | 8,406 | 5,361 | 14,342 | 8,569 | 5,773 |
| Nonsynonymous coding | 134 | 134 | 0 | 131 | 131 | 0 |
| Splice site acceptor | 3 | 3 | 0 | 5 | 5 | 0 |
| Splice site donor | 2 | 2 | 0 | 2 | 2 | 0 |
| Synonymous coding | 68 | 68 | 0 | 81 | 81 | 0 |
| Synonymous stop | 0 | 0 | 0 | 0 | 0 | 0 |
| Untranslated 3 prime | 354 | 273 | 81 | 355 | 263 | 92 |
| Untranslated 5 prime | 119 | 105 | 14 | 115 | 90 | 25 |
| Total | 14,520 | | | 15,102 | | |
Figure 2. A pipeline chart showing the number and types of variants in the UAE S001 and UAE S002 samples.
Figure 3. Intergenome distances between genomes of UAE S001, UAE S002, Kuwaiti and individuals from the 51 populations in the HGDP.
Figure 4. Venn diagram presenting the intersections of known variants among UAE S001, UAE S002 and KWP1 (an individual of Persian ancestry from Kuwait).