Sathishkumar Ramaswamy, Ruchi Jain, Maha El Naofal, Nour Halabi, Sawsan Yaslam, Alan Taylor, Ahmad Abou Tayoun.
Abstract
Genetic variation in populations of Middle Eastern origin remains highly underrepresented in most comprehensive genomic databases. This underrepresentation hampers the functional annotation of the human genome and challenges accurate clinical variant interpretation. To highlight the importance of capturing genetic variation in the Middle East, we aggregated whole exome and genome sequencing data from 2116 individuals in the Middle East and established the Middle East Variation (MEV) database. Of the high-impact coding (missense and loss of function) variants in this database, 53% were absent from the most comprehensive Genome Aggregation Database (gnomAD), thus representing a unique Middle Eastern variation dataset which might directly impact clinical variant interpretation. We highlight 39 variants with minor allele frequency >1% in the MEV database that were previously reported as rare disease variants in ClinVar and the Human Gene Mutation Database (HGMD). Furthermore, the MEV database contained 281 putative homozygous loss of function (LoF) variants, or complete knockouts, of which 31.7% (89/281) were absent from gnomAD. These variants either represent complete knockouts of 83 unique genes in reportedly healthy individuals, with implications regarding disease penetrance and expressivity, or might affect dispensable exons, thus refining the clinical annotation of those regions. Intriguingly, 24 of those genes have several clinically significant variants reported in ClinVar and/or HGMD. Our study shows that genetic variation in the Middle East improves functional annotation and clinical interpretation of the genome and emphasizes the need for expanding sequencing studies in the Middle East and other underrepresented populations.
Keywords: Middle East Variants; common variants; knockouts; whole exome sequencing; whole genome sequencing
Year: 2022 PMID: 35330423 PMCID: PMC8956070 DOI: 10.3390/jpm12030423
Source DB: PubMed Journal: J Pers Med ISSN: 2075-4426
Figure 1. Samples used for this study. (A) Data from a total of 88 whole genomes and 2028 whole exomes from the Qatar and Greater Middle East (GME) studies were aggregated in this study; (B) ancestry distribution of samples from the Qatar dataset; (C) ancestry distribution of samples from the GME dataset. NWA, Northwest Africa; NEA, Northeast Africa; TP, Turkish Peninsula; SD, Syrian Desert; AP, Arabian Peninsula; PP, Persia and Pakistan.
Distribution of variants in MEV database.
| | Total Variants | SNPs | Indels |
|---|---|---|---|
| Total MEVs | 26,228,226 | 21,180,218 | 5,048,008 |
| Total coding variants | 600,987 | 534,287 | 66,700 |
| Unique coding variants * | 318,242 (53%) | 263,680 | 54,562 |
| Reported coding variants ** | 282,745 (47%) | 270,607 | 12,138 |
* Unique coding variants = Variants not reported in gnomAD 2.1.1. ** Reported coding variants = Variants reported at least once in gnomAD 2.1.1.
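
The percentages in the table follow directly from the SNP and indel counts. As a minimal sketch, using only the figures reported above (the variable names are illustrative, not from the study), the shares can be recomputed as follows:

```python
# SNP and indel counts copied from the table above.
unique = {"snps": 263_680, "indels": 54_562}     # not reported in gnomAD 2.1.1
reported = {"snps": 270_607, "indels": 12_138}   # reported at least once in gnomAD 2.1.1

# Row totals, and the overall number of coding variants.
unique_total = sum(unique.values())
reported_total = sum(reported.values())
coding_total = unique_total + reported_total

# Share of coding variants in each category, in percent.
unique_share = 100 * unique_total / coding_total
reported_share = 100 * reported_total / coding_total

print(f"Total coding variants: {coding_total:,}")
print(f"Unique:   {unique_total:,} ({unique_share:.0f}%)")
print(f"Reported: {reported_total:,} ({reported_share:.0f}%)")
```

The row totals and rounded shares reproduce the table entries (318,242 and 282,745 out of 600,987 coding variants).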
Figure 2. Characterization of common Middle East disease variants (CMEDVs). (a) Percentage of CMEDVs (MEV MAF > 1%) and of rare variants (MEV MAF < 1%) among variants reported as DM or P/LP that are also rare (<1%) in gnomAD. (b) Effects of CMEDVs, the total number of genes impacted by those variants, and the distribution of CMEDVs reported at different star levels in ClinVar and in HGMD. Number of variants = number of variants that are missense or LoF, or are in HGMD and ClinVar. Number of genes = number of genes in the CMEDV set.
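
The CMEDV definition can be read as a three-part filter: a disease classification, an MEV frequency above 1%, and a gnomAD frequency below 1%. The following is a minimal sketch of that reading, not the authors' pipeline. The `Variant` record, its field names, and the three toy entries are hypothetical, and treating variants absent from gnomAD as rare is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

MAF_CUTOFF = 0.01  # the 1% threshold named in the caption


@dataclass
class Variant:
    variant_id: str              # hypothetical label, not a real variant
    mev_maf: float               # minor allele frequency in the MEV database
    gnomad_maf: Optional[float]  # None means absent from gnomAD
    disease_reported: bool       # flagged DM (HGMD) or P/LP (ClinVar)


def is_cmedv(v: Variant) -> bool:
    """Common in MEV, rare in gnomAD, and reported as a disease variant."""
    # Assumption: a variant absent from gnomAD counts as rare there.
    rare_in_gnomad = v.gnomad_maf is None or v.gnomad_maf < MAF_CUTOFF
    return v.disease_reported and v.mev_maf > MAF_CUTOFF and rare_in_gnomad


# Made-up records, for illustration only.
toy_variants = [
    Variant("toy-1", mev_maf=0.030, gnomad_maf=0.0004, disease_reported=True),
    Variant("toy-2", mev_maf=0.002, gnomad_maf=0.0001, disease_reported=True),
    Variant("toy-3", mev_maf=0.050, gnomad_maf=0.0600, disease_reported=True),
]

cmedv_ids = [v.variant_id for v in toy_variants if is_cmedv(v)]
print(cmedv_ids)
```

Only the first toy record passes all three conditions: the second is rare in MEV, and the third is common in gnomAD as well.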
Figure 3. Characterization of high-confidence knockouts (KOs) in the MEV database. (a) Distribution of unique (present in the MEV database only) and reported (present in both the MEV database and gnomAD) knockouts. (b) Effects of unique KO variants. Number of variants = number of stop-gained, frameshift, splice-acceptor, stop-lost, and splice-donor variants. (c) Distribution of unique KO genes across disease databases (ClinVar, HGMD, and OMIM). Number of variants = number of variants in OMIM, ClinVar, and HGMD. Number of genes = number of genes in the unique KO set.