Khalid A Fakhro1,2, Noha A Yousri3,4, Juan L Rodriguez-Flores5, Amal Robay6, Michelle R Staudt7, Francisco Agosto-Perez8, Jacqueline Salit9, Joel A Malek10, Karsten Suhre11, Amin Jayyousi12, Mahmoud Zirie13, Dora Stadler14, Jason G Mezey15,16, Ronald G Crystal17,18.
Abstract
BACKGROUND: The populations of the Arabian Peninsula remain the least represented in public genetic databases, both in terms of single nucleotide variants and of larger genomic mutations. We present the first high-resolution copy number variation (CNV) map for a Gulf Arab population, using a hybrid approach that integrates array genotyping intensity data and next-generation sequencing reads to call CNVs in the Qatari population.
Year: 2015 PMID: 26490036 PMCID: PMC4618522 DOI: 10.1186/s12864-015-1991-5
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 CNV analysis strategy. CNV detection in Qataris was assessed at two tiers. First, CNVs were called in 100 individuals using two algorithms each, on two primary input datasets: genotyping array (OMNI2.5 M) and next-generation whole genome sequencing reads (Illumina PE 100 bp, Mean Depth: 37X). A size cut-off of at least 5 consecutive probes for genotyping data and at least 5 consecutive windows for whole genome sequencing data was used to increase specificity (see Methods). Three samples with an unusually high number of CNVs were first removed from the population (see Additional file 1: Figure S1). In the second step, high-quality CNVs from the remaining 97 subjects called by all 4 platforms were distributed into 97 individual files. CNVs were first compared intra-individuals and retained if observed by more than one algorithm. If no overlap was detected within the individual, the CNV was compared inter-individuals to detect a second occurrence in the remaining 97 individuals. CNVs observed only once in the entire sample were discarded. CNVs passing these filters were merged across the population to generate population level CNV regions (CNVRs), which were taken into the detailed analysis steps. *Denotes data was provided as-is from proprietary Illumina Genome Network sequencing pipeline without the ability of the user to alter parameters
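The second-tier filter described in this caption can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: the `Call` record, the function names, and the overlap rule (any shared base on the same chromosome, half-open coordinates) are assumptions, since the caption does not state the overlap criterion, and copy number class is ignored for brevity.

```python
from collections import namedtuple

# Hypothetical record: one CNV call from one algorithm in one individual.
Call = namedtuple("Call", "sample algo chrom start end")


def overlaps(a, b):
    """Assumed rule: calls overlap if they share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def filter_calls(calls):
    """Keep a call if a different algorithm supports it in the same
    individual, or, failing that, if it overlaps a call in another
    individual. Calls seen only once in the whole sample are dropped."""
    kept = []
    for c in calls:
        intra = any(
            o.sample == c.sample and o.algo != c.algo and overlaps(c, o)
            for o in calls
        )
        inter = any(o.sample != c.sample and overlaps(c, o) for o in calls)
        if intra or inter:
            kept.append(c)
    return kept
```

The pairwise scan is quadratic in the number of calls; for a data set of the size reported here, an interval index sorted by chromosome and start position would be the more practical choice.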
Copy Number Variations in the Qatari Populationᵃ
| Parameter | Total by parameter | Homozygous deletions (CN 0) | Heterozygous deletions (CN 1) | Total deletions | Duplications (CN 3) | Amplifications (CN 4+) | Total duplications/amplifications | Total polymorphic CNVRs | Total size of all non-overlapping CNVRs in the subpopulation |
|---|---|---|---|---|---|---|---|---|---|
| Array Data | | | | | | | | | |
| | 56,135 | 7,435 | 23,497 | 30,932 | 16,767 | 8,436 | 25,203 | - | - |
| | 16,895 | 1,026 | 3,906 | 4,932 | 9,527 | 2,436 | 11,963 | - | - |
| Sequencing Data | | | | | | | | | |
| | 100,026 | 2,097 | 32,098 | 34,195 | 50,080 | 15,751 | 65,831 | - | - |
| | 363,833 | - | 49,177 | 49,177 | 213,435 | 101,221 | 314,656 | - | - |
| Total CNVs by CN Class | 536,889 | 10,558 | 108,678 | 119,236 | 289,809 | 127,844 | 417,653 | - | - |
| CNVs per individual | | | | | | | | | |
| | 1,824 | 120 | 628 | 748 | 801 | 275 | 1,076 | - | - |
| | 1,815 | 121 | 622 | 743 | 801 | 271 | 1,072 | - | - |
| | 29,934,170 | 1,131,273 | 5,928,199 | 7,059,472 | 18,400,102 | 4,474,596 | 22,874,698 | - | - |
| | 27,911,587 | 1,087,616 | 5,787,942 | 6,875,558 | 16,889,655 | 4,146,374 | 21,036,029 | - | - |
| CNV Regions (CNVRs) by genetic subpopulation | | | | | | | | | |
| | 5,241 | 149 | 2,534 | 2,683 | 1,480 | 270 | 1,750 | 808 | 85,705,083 |
| | 4,176 | 116 | 1,909 | 2,025 | 1,242 | 273 | 1,515 | 636 | 65,814,099 |
| | 4,641 | 101 | 2,283 | 2,384 | 1,316 | 304 | 1,620 | 637 | 65,851,402 |
| Total across subpopulations | 14,058 | 366 | 6,726 | 7,092 | 4,038 | 847 | 4,885 | 2,081 | - |
| Average size of CNVRs within each class | 15,462 | 4,187 | 8,604 | 8,376 | 20,101 | 10,669 | 18,457 | 32,561 | - |
ᵃ Four different algorithms were applied to detect CNVs in 97 individuals. For analysis of the Illumina Omni2.5 M Array Data, QuantiSNP (QSNP) [27] and Illumina’s cnvPartition (CNVPart) were used; for next-generation-sequencing (NGS) genomic data, cn.MOPS (CNMOPS) [28] was used with additional CNV calls provided by Illumina’s genome-sequencing service (IL-NGS). Shown are the numbers of CNVs detected by each algorithm in each copy number class, along with the total number of CNVs detected by copy number (CN) class and by CNV platform. CN class 0 = homozygous deletions; CN 1 = heterozygous deletions; CN 3 = single-allele duplication; CN 4+ = amplifications. Total deletions are the sum of CN classes 0 and 1; total duplications are the sum of CN classes 3 and 4+. Total CNVs and size are shown by platform and by class. As expected, array-based methods generated fewer but larger CNVs, whereas NGS-based methods generated more but, on average, smaller CNVs. The number of CNVs per individual is shown for the average and median individual among the 97 individuals who passed QC. CNV counts are shown by CN class. Additionally, the size of the genomic content altered by CNVs in each CN class is provided for the average and median individuals. As described in Methods, these CNVs were merged across individuals within the same subpopulation to arrive at subpopulation-level CNV Regions (CNVRs). The number of CNVRs within each subpopulation is given for each CN class, and the size of the average CNVR within each class is also shown. Within a population, some sites contain both deletions and duplications in different individuals; these are tallied in a column labeled ‘polymorphic’ CNVRs and represent about 15% of all CNVRs within a given population. Finally, the total size of all non-overlapping CNV regions within each subpopulation is shown in the last column.
The 3 genetic subpopulations are Q1 (Bedouin ancestry, n = 57), Q2 (Persian/South Asian ancestry, n = 20), and Q3 (African ancestry, n = 20)
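The merging of per-individual CNVs into subpopulation-level CNVRs, and the tally of ‘polymorphic’ regions and total non-overlapping size, might look roughly like the sketch below. The exact merging rule is described in Methods and is not reproduced in this record, so the rule used here (a simple union of overlapping half-open intervals) and all function names are assumptions for illustration only.

```python
def merge_into_cnvrs(cnvs):
    """Merge overlapping CNVs into regions.

    cnvs: iterable of (chrom, start, end, kind), kind being 'del' or 'dup'.
    Returns a list of (chrom, start, end, kinds), where kinds is the set of
    CNV types seen inside the merged region.
    """
    regions = []
    for chrom, start, end, kind in sorted(cnvs):
        last = regions[-1] if regions else None
        if last and last[0] == chrom and start < last[2]:
            # Overlaps the current region: extend it and record the type.
            last[2] = max(last[2], end)
            last[3].add(kind)
        else:
            regions.append([chrom, start, end, {kind}])
    return [tuple(r) for r in regions]


def summarize(regions):
    """Return (total non-overlapping size in bp, number of regions that
    contain both deletions and duplications)."""
    total_bp = sum(end - start for _, start, end, _ in regions)
    polymorphic = sum(1 for r in regions if len(r[3]) > 1)
    return total_bp, polymorphic
```

Under this assumed rule, a deletion at 100-200 and a duplication at 150-300 on the same chromosome collapse into one 200 bp region that counts as polymorphic.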
Fig. 2 Probability distributions of CNVs by frequency and size in each copy number class in 97 Qataris. Density curves showing the probability (y-axis) of a given individual from each of the 3 subpopulations having a certain number of CNVs (a-d) or a certain cumulative size of the genome affected by CNVs (e-h) in each copy number class (a, e. CN = 0; b, f. CN = 1; c, g. CN = 3; d, h. CN = 4+). All p-values are calculated using the ANOVA-Tukey method. Black trace – Q1, Blue trace – Q2, Red trace – Q3
Functional Annotation of CNV Regions in the Qatari Genetic Subpopulationsᵃ
| Subpopulation | CN class | Total number | Nongenic | Genic | miRNA | Genic affecting Mendelian disease genes | # Mendelian genes affected | Overlap DGV | Novel | Novel genic | Promoter site | Transcription factor binding site |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Q1 | 0 – homozygous deletions | 341 | 228 | 113 | 2 | 8 | 8 | 322 | 19 | 7 | 20 | 127 |
| | 1 – heterozygous deletions | 3,316 | 2,151 | 1,165 | 19 | 76 | 83 | 3,091 | 225 | 97 | 248 | 1,321 |
| | 3 – duplication | 2,161 | 1,121 | 1,040 | 196 | 182 | 223 | 2,045 | 116 | 48 | 509 | 980 |
| | 4 – amplification | 463 | 290 | 173 | 12 | 18 | 18 | 445 | 18 | 5 | 54 | 145 |
| Sub-total Q1 | | 6,281 | 3,790 | 2,491 | 229 | 284 | 332 | 5,903 | 378 | 157 | 831 | 2,573 |
| Q2 | 0 – homozygous deletions | 293 | 183 | 110 | 1 | 11 | 11 | 287 | 6 | 1 | 14 | 109 |
| | 1 – heterozygous deletions | 2,470 | 1,598 | 872 | 15 | 65 | 70 | 2,350 | 120 | 53 | 172 | 879 |
| | 3 – duplication | 1,760 | 913 | 847 | 156 | 124 | 157 | 1,694 | 66 | 28 | 409 | 748 |
| | 4 – amplification | 434 | 268 | 166 | 11 | 17 | 17 | 422 | 12 | 5 | 52 | 143 |
| Sub-total Q2 | | 4,957 | 2,962 | 1,995 | 183 | 217 | 255 | 4,753 | 204 | 87 | 647 | 1,879 |
| Q3 | 0 – homozygous deletions | 267 | 174 | 93 | 0 | 18 | 8 | 262 | 5 | 2 | 16 | 101 |
| | 1 – heterozygous deletions | 2,858 | 1,902 | 956 | 13 | 61 | 65 | 2,726 | 132 | 45 | 176 | 1,046 |
| | 3 – duplication | 1,835 | 977 | 858 | 148 | 122 | 141 | 1,754 | 81 | 29 | 407 | 772 |
| | 4 – amplification | 462 | 284 | 178 | 19 | 23 | 24 | 443 | 19 | 9 | 61 | 146 |
| Sub-total Q3 | | 5,422 | 3,337 | 2,085 | 180 | 224 | 238 | 5,185 | 237 | 85 | 660 | 2,065 |
| Total | | 16,660 | 10,089 | 6,571 | 592 | 725 | 825 | 15,841 | 819 | 329 | 2,138 | 6,517 |
ᵃ CNVRs were annotated as described in Methods. Distribution of CNVs is summarized by CN class within each subpopulation and by functional class including: Total number = all CNVRs detected; nongenic = CNVRs that do not overlap coding regions; genic = CNVRs that overlap genes; miRNA = CNVRs that overlap microRNAs; Mendelian disease genes = CNVRs that include at least 1 known Mendelian disease gene; DGV = CNVRs that overlap a known CNV from the Database of Genomic Variants; novel = CNVRs that do not overlap known CNVs and are unique to Qataris; novel genic = the subset of novel CNVRs that overlap at least 1 gene; promoter site = CNVRs that overlap promoter elements; transcription factor site = CNVRs that overlap at least 1 transcription factor site. Total for each subpopulation is a sum of deletions and duplications in each subpopulation
Top 15 KEGG Pathways Enriched in Genes Affected by CNVs in Qatarisᵃ
| KEGG pathway | Number of genes | Fold-enrichment | p value |
|---|---|---|---|
| Notch signaling pathway | 16 | 2.7 | 4.4 × 10⁻⁴ |
| Starch and sucrose metabolism | 14 | 2.6 | 1.5 × 10⁻³ |
| Focal adhesion | 39 | 1.5 | 6.4 × 10⁻³ |
| mTOR signaling pathway | 14 | 2.1 | 1.1 × 10⁻² |
| Purine metabolism | 30 | 1.5 | 1.6 × 10⁻² |
| Antigen processing and presentation | 19 | 1.8 | 1.4 × 10⁻² |
| Axon guidance | 25 | 1.5 | 3.2 × 10⁻² |
| Type II diabetes mellitus | 12 | 2 | 3.0 × 10⁻² |
| Drug metabolism | 14 | 1.8 | 4.4 × 10⁻² |
| Extracellular matrix - receptor interaction | 17 | 1.6 | 6.0 × 10⁻² |
| Type I diabetes mellitus | 10 | 1.9 | 7.6 × 10⁻² |
| Non-small cell lung cancer | 12 | 1.8 | 7.2 × 10⁻² |
| Insulin signaling pathway | 24 | 1.4 | 8.2 × 10⁻² |
| Metabolism of xenobiotics by cytochrome P450 | 13 | 1.7 | 7.0 × 10⁻² |
| Maturity onset diabetes of the young | 7 | 2.2 | 8.7 × 10⁻² |
ᵃ All genes affected by CNVs in Qataris were analyzed with the DAVID bioinformatics resources for KEGG pathways. Shown are the number of genes in each enriched KEGG pathway, the fold-enrichment within each pathway, and a p value for the significance of enrichment
Qatari Genetic Subpopulation-specific Distribution of Known CNV Regions Deletions Affecting Known Mendelian Disease Genesᵃ
| Disease (MIM number) | Gene | Exons affected | CHR | Start | End | Size | Q1² | Q2² | Q3² |
|---|---|---|---|---|---|---|---|---|---|
| Deletion | | | | | | | | | |
| Age related macular degeneration (603075) | HMCN1 | 31/107 | 1q31.1 | 185979151 | 185985000 | 5849 | - | <1 % | - |
| Chediak-Higashi syndrome (214500) | LYST | 47/53 | 1q42.3 | 235854998 | 235858929 | 3931 | <1 % | <1 % | - |
| Dystonia 16 (612067) | PRKRA | 6-7/7 | 2q31.2 | 179296981 | 179300871 | 3890 | >10 % | 1-10 % | 1-10 % |
| Glutaric acidemia IIC (231680) | ETFDH | 1/13 | 4q32.1 | 159591175 | 159594157 | 2982 | <1 % | 1-10 % | - |
| Distal myopathy (606070) | MATR3 | 16-17/18 | 5q31.2 | 138661971 | 138665031 | 3060 | 1-10 % | 1-10 % | <1 % |
| Prostate cancer (176807) | MSR1 | 5-10/10 | 8p22 | 15945301 | 16023600 | 78299 | - | <1 % | - |
| Alpha-methylacetoacetic aciduria (203750) | ACAT1 | 2-3/12 | 11q22.3 | 108002099 | 108004927 | 2828 | <1 % | - | <1 % |
| Keutel syndrome (245150) | MGP | 1-5/5 | 12p12.3 | 15035821 | 15051689 | 15868 | 1-10 % | - | - |
| von Willebrand disease (193400) | VWF | 4-5/52 | 12p13.31 | 6218203 | 6225614 | 7411 | 1-10 % | 1-10 % | - |
| Adams-Oliver syndrome (614219) | DOCK6 | 15-28/48 | 19p13.2 | 11332570 | 11350981 | 18411 | 1-10 % | 1-10 % | <1 % |
| Nephrotic syndrome (256300) | NPHS1 | 21-22/29 | 19q13.12 | 36328501 | 36331200 | 2699 | - | <1 % | - |
| Bleeding disorder, platelet-type (614201) | GP6 | 7-8/8 | 19q13.42 | 55523566 | 55526400 | 2834 | - | - | <1 % |
| Essential hypertension (145500) | PTGIS | 9-10/10 | 20q13.13 | 48124290 | 48128451 | 4161 | <1 % | - | - |
| Duplication | | | | | | | | | |
| Corneal dystrophy (136800) | COL8A2 | 2/2 | 1p34.3 | 36559621 | 36565584 | 5963 | - | - | <1 % |
| Cerebellar ataxia (614756) | CAMTA1 | 11/23 | 1p36.23 | 7735380 | 7742501 | 7121 | 1-10 % | >10 % | - |
| Peroxisome biogenesis disorder (614870) | PEX10 | 1-6/6 | 1p36.33,p36.32 | 2283844 | 2539006 | 255162 | >10 % | - | - |
| Holoprosencephaly-9 (610829) | GLI2 | 10-13/13 | 2q14.2 | 121739875 | 121747372 | 7497 | 1-10 % | - | <1 % |
| N-acetylaspartate deficiency (614063) | NAT8L | 1-3/3 | 4p16.3 | 2035597 | 2071655 | 36058 | >10 % | >10 % | 1-10 % |
| Primary ciliary dyskinesia 3 (608644) | DNAH5 | 48-50/79 | 5p15.2 | 13791701 | 13795151 | 3450 | 1-10 % | - | - |
| Bone marrow failure (614742) | TERT | 4-15/15 | 5p15.33 | 1230427 | 1255520 | 25093 | 1-10 % | - | - |
| Recessive spastic paraplegia (613647) | AP5Z1 | 1-17/17 | 7p22.1 | 4805669 | 4877956 | 72287 | >10 % | >10 % | >10 % |
| Progressive myoclonic epilepsy (611726) | KCTD7 | 1-5/5 | 7q11.21 | 66071436 | 66132291 | 60855 | - | - | <1 % |
| 5-oxoprolinase deficiency (260005) | OPLAH | 1-28/28 | 8q24.3 | 144773296 | 145216604 | 443308 | >10 % | >10 % | 1-10 % |
| Amelogenesis imperfecta, type 3 (130900) | FAM83H | 1-5/5 | 8q24.3 | 144773296 | 145216604 | 443308 | >10 % | >10 % | 1-10 % |
| Muscular dystrophy with epidermolysis bullosa (226670) | PLEC | 1-32/32 | 8q24.3 | 144773296 | 145216604 | 443308 | >10 % | >10 % | 1-10 % |
| Acrodermatitis enteropathica (201100) | SLC39A4 | 1-12/12 | 8q24.3 | 145278809 | 145771012 | 492203 | >10 % | - | - |
| Rothmund-Thomson syndrome (268400) | RECQL4 | 1-22/22 | 8q24.3 | 145278809 | 145771012 | 492203 | >10 % | - | - |
| Myasthenic syndrome (608931) | MUSK | 2-3/13 | 9q31.3 | 113439201 | 113451401 | 12200 | <1 % | <1 % | - |
| Autosomal dominant mental retardation (614254) | GRIN1 | 1-21/21 | 9q34.3 | 139887971 | 140232124 | 344153 | >10 % | >10 % | >10 % |
| Hypophosphatemic rickets with hypercalciuria (241530) | SLC34A3 | 1-13/13 | 9q34.3 | 139887971 | 140232124 | 344153 | >10 % | >10 % | >10 % |
| Recessive deafness (613307) | TPRN | 1-4/4 | 9q34.3 | 139887971 | 140232124 | 344153 | >10 % | >10 % | >10 % |
| Recessive mental retardation (614202) | MAN1B1 | 1-14/14 | 9q34.3 | 139887971 | 140232124 | 344153 | >10 % | >10 % | >10 % |
| Osteogenesis imperfecta, type V (610967) | IFITM5 | 1-2/2 | 11p15.5 | 280817 | 312896 | 32079 | - | - | 1-10 % |
| Familial hyperproinsulinemia (MODY) (613370) | INS | 1-2/2 | 11p15.5 | 2179313 | 2194175 | 14862 | 1-10 % | 1-10 % | 1-10 % |
| Segawa syndrome (605407) | TH | 1-14/14 | 11p15.5 | 2179313 | 2194175 | 14862 | 1-10 % | 1-10 % | 1-10 % |
| Primary congenital glaucoma (613086) | LTBP3 | 1-10/10 | 11q13.1 | 65305964 | 65407963 | 101999 | >10 % | - | - |
| Pyruvate carboxylase deficiency (266150) | PC | 13-18/22 | 11q13.2 | 66617727 | 66629986 | 12259 | 1-10 % | - | - |
| Mitochondrial myopathy and sideroblastic anemia (600462) | PUS1 | 1-4/6 | 12q24.33 | 132369172 | 132424944 | 55772 | >10 % | - | - |
| GABA-transaminase deficiency (613163) | ABAT | 1-16/16 | 16p13.2 | 8723887 | 8875529 | 151642 | - | - | <1 % |
| Progressive myopathy with developmental delay (613076) | GFER | 1-3/3 | 16p13.3 | 2003399 | 2285357 | 281958 | >10 % | 1-10 % | - |
| Polycystic kidney disease, adult type I (173900) | PKD1 | 1-46/46 | 16p13.3 | 2003399 | 2285357 | 281958 | >10 % | 1-10 % | - |
| Tuberous sclerosis 2 (606690) | TSC2 | 1-23/23 | 16p13.3 | 2003399 | 2285357 | 281958 | >10 % | 1-10 % | - |
| Tyrosinemia, type II (276600) | TAT | 1-12/12 | 16q22.2 | 71541001 | 71622751 | 81750 | - | - | <1 % |
| Cataract (610202) | MAF | 1-2/2 | 16q23.2 | 79620742 | 79638078 | 17336 | 1-10 % | 1-10 % | 1-10 % |
| Huntington disease-like 2 (606438) | JPH3 | 2/5 | 16q24.2 | 87720933 | 87724383 | 3450 | 1-10 % | <1 % | <1 % |
| Knobloch syndrome (267750) | COL18A1 | 1-41/41 | 21q22.3 | 46853110 | 46974756 | 121646 | - | 1-10 % | 1-10 % |
| Bethlem myopathy (158810) | COL6A1 | 3-35/35 | 21q22.3 | 47390167 | 47435702 | 45535 | >10 % | 1-10 % | 1-10 % |
| Recessive familial candidiasis (613953) | IL17RA | 1/1 | 22q11.1 | 17595746 | 17616510 | 20764 | 1-10 % | - | 1-10 % |
ᵃ Genes affected by CNVRs in each subpopulation were looked up in the Online Mendelian Inheritance in Man (OMIM) database for a confirmed role in disease. Disease name, MIM number (OMIM identifier) and gene appear in the first two columns, followed by CNVR-centric information and subpopulation-centric data. Start-End: coordinates of the CNV containing the OMIM gene; Exons affected: exons from each gene within the boundaries of the deletion or duplication
² Q1, Q2 and Q3: Percentage of individuals in each subpopulation carrying this CNVR. “-” indicates CNVR not present in this subpopulation
Novel Qatari-specific CNVRs Affecting OMIM Disease Genesᵃ
| OMIM disorder | MIM number | OMIM gene | OMIM gene name | Exons affected | Other affected genes | Type | ChrCytoband: start-end | Size (bp) | Q1 | Q2 | Q3 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Age-related macular degeneration | 603075 | HMCN1 | Hemonectin | 31/107 | - | Deletion | 1q31.1:185979151-185985000 | 5849 | 0 | 1 | 0 |
| Chediak-Higashi syndrome | 214500 | LYST | Lysosomal trafficking regulator | 47/53 | - | Deletion | 1q42.3:235854998-235858929 | 3931 | 1 | 1 | 0 |
| Glutaric acidemia IIC | 231680 | ETFDH | Electron transfer flavoprotein dehydrogenase | 1/13 | C4orf46 | Deletion | 4q32.1:159591175-159594157 | 2982 | 1 | 1 | 0 |
| Hereditary nonpolyposis colorectal cancer, type 4 | 614337 | PMS2 | Post-meiotic segregation increased, S. cerevisiae 2 | 13-14/15 | - | Deletion | 7p22.1:6016951-6019650 | 2699 | 2 | 1 | 0 |
| Microcephaly 1, autosomal recessive | 251200 | MCPH1 | Microcephaly 1 | 14/14 | - | Deletion | 8p23.1:6493670-6501582 | 7912 | 1 | 0 | 0 |
| Deafness, autosomal dominant | 608641 | GRHL2 | Grainy-head like 2 | 8/16 | - | Deletion | 8q22.3:102604016-102619491 | 15475 | 0 | 1 | 0 |
| Alpha-methylacetoacetic aciduria | 203750 | ACAT1 | Acetyl-CoA acetyltransferase 1 | 2-3/12 | - | Deletion | 11q22.3:108002099-108004927 | 2828 | 1 | 0 | 1 |
| Gitelman syndrome | 263800 | SLC12A3 | Solute carrier family 12, member 3 | 1-7/26 | NUP9, miR-138-2 | Deletion | 16q13:56857680-56905458 | 47778 | 1 | 2 | 0 |
| Essential hypertension | 145500 | PTGIS | Prostaglandin I2 synthase | 9-10/10 | - | Deletion | 20q13.13:48124290-48128451 | 4161 | 1 | 0 | 0 |
| Saethre-Chotzen syndrome; craniosynostosis, type 1; Robinow-Sorauf syndrome | 101400; 123100; 180750 | TWIST1 | Twist basic helix-loop-helix transcription factor 1 | 1-2/2 | miR-137, miR-25/32/92/92ab/363/367, miR-33/33ab, miR-543 | Full duplication | 7p21.1:19149966-19157073 | 7107 | 1 | 0 | 0 |
| Tyrosinemia, type II | 276600 | TAT | Tyrosine aminotransferase | 1-12/12 | CHST4, miR-485, miR-202, miR-125/351 | Full duplication | 16q22.2:71541001-71622751 | 81750 | 0 | 0 | 1 |
| Holoprosencephaly-9 | 610829 | GLI2 | Gli-kruppel family member 2 | 10-13/13 | - | Internal duplication | 2q14.2:121739875-121747372 | 7497 | 2 | 0 | 1 |
| Hereditary nonpolyposis colorectal cancer, type 4 | 614337 | PMS2 | Post-meiotic segregation increased, S. cerevisiae 2 | 13-14/15 | - | Internal duplication | 7p22.1:6016501-6019650 | 3149 | 5 | 1 | 1 |
| Congenital myasthenic syndrome | 608931 | MUSK | Muscle, skeletal receptor tyrosine kinase | 2-3/14 | - | Internal duplication | 9q31.3:113439201-113451401 | 12200 | 1 | 1 | 0 |
ᵃ Qatari CNVRs were compared to CNVRs from the 1000 Genomes Phase I study (n = 1092) [25] that were generated using next-generation sequencing technologies. Only 14 CNVRs were novel, including 9 deletions and 5 duplications. OMIM disorder – name of disorder as it appears in the OMIM database. MIM number – OMIM identifier. Type – Whether CNVR is a deletion or duplication (full or internal). Other affected genes – Other genes (not in OMIM) within the same CNV. ChrCytoband:start-end – Genomic location of the CNVR in Qataris. Size – Size of CNVR. Q1, Q2, Q3 – Qatari subpopulation (n denotes number of individuals in each subpopulation)
Top 10 Cytobands in Which Qatari Genetic Subpopulations’ CNVRs were Observed at a Significantly Higher Frequency than in 1000 Genomes Phase I CNV Dataᵃ
| Qatari genetic subpopulation | Cytoband | 1000 Genomes phase I count | Q1, Q2 or Q3 count | p value | Diseases associated with cytoband |
|---|---|---|---|---|---|
| Q1 | 1q21.1 | 10 | 55 | 1.60 × 10⁻¹⁶ | Schizophrenia; congenital heart disease |
| | 9q21.11 | 5 | 42 | 1.06 × 10⁻¹⁴ | - |
| | 5q13.2 | 6 | 35 | 3.43 × 10⁻¹¹ | Neurologic disorders; alcohol dependence |
| | 9p11.2 | 17 | 49 | 8.97 × 10⁻¹¹ | - |
| | 16p11.2 | 23 | 50 | 7.10 × 10⁻⁹ | Autism; schizophrenia; obesity |
| | 10q11.22 | 7 | 29 | 2.45 × 10⁻⁸ | - |
| | 9p12 | 14 | 38 | 2.66 × 10⁻⁸ | - |
| | 8p11.1 | 2 | 20 | 7.14 × 10⁻⁸ | - |
| | 9p13.1 | 8 | 29 | 7.55 × 10⁻⁸ | - |
| | 7q11.1 | 3 | 20 | 3.66 × 10⁻⁷ | - |
| Q2 | 1q21.1 | 10 | 59 | 9.67 × 10⁻²² | Schizophrenia; congenital heart disease |
| | 5q13.2 | 6 | 36 | 8.21 × 10⁻¹⁴ | Neurologic disorders; alcohol dependence |
| | 9q21.11 | 5 | 33 | 4.22 × 10⁻¹³ | - |
| | 16p11.2 | 23 | 51 | 8.07 × 10⁻¹² | Autism; schizophrenia; obesity |
| | 9p11.2 | 17 | 44 | 1.17 × 10⁻¹¹ | - |
| | 7q11.1 | 3 | 25 | 8.91 × 10⁻¹¹ | - |
| | 8p11.1 | 2 | 20 | 3.77 × 10⁻⁹ | - |
| | 7q11.21 | 35 | 53 | 5.43 × 10⁻⁹ | - |
| | 10q11.22 | 7 | 26 | 8.97 × 10⁻⁹ | - |
| | 9p12 | 14 | 30 | 1.76 × 10⁻⁷ | - |
| Q3 | 1q21.1 | 10 | 60 | 2.04 × 10⁻²⁰ | Schizophrenia; congenital heart disease |
| | 9q21.11 | 5 | 37 | 7.23 × 10⁻¹⁴ | - |
| | 9p11.2 | 17 | 48 | 6.27 × 10⁻¹² | - |
| | 16p11.2 | 23 | 54 | 1.26 × 10⁻¹¹ | Autism; schizophrenia; obesity |
| | 5q13.2 | 6 | 33 | 1.83 × 10⁻¹¹ | Neurologic disorders; alcohol dependence |
| | 10q11.22 | 7 | 29 | 3.10 × 10⁻⁹ | - |
| | 7q11.1 | 3 | 21 | 3.00 × 10⁻⁸ | - |
| | 22q11.1 | 7 | 25 | 1.20 × 10⁻⁷ | - |
| | 8p11.1 | 2 | 18 | 1.24 × 10⁻⁷ | - |
| | 1p36.33 | 12 | 30 | 2.87 × 10⁻⁷ | Disorders of sexual development; obesity |
ᵃ All CNVRs detected in Q1, Q2 and Q3 Qataris, as well as all CNVRs in the Database of Genomic Variants from the 1000 Genomes Project Phase I [25] data, were annotated by the chromosome and cytoband they affected. Fisher’s exact test was used to evaluate enrichment or depletion of CNVRs in a specific cytoband, with a corrected p value threshold for significance of <6.7 × 10⁻⁶
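A per-cytoband Fisher’s exact test of the kind described in this footnote could be set up as in the sketch below. The 2×2 layout (CNVRs inside versus outside the cytoband, Qatari subpopulation versus 1000 Genomes) is an assumption, and the table does not report the total CNVR counts that form the second column of each contingency table, so any totals passed in are hypothetical. The function is a plain standard-library implementation of the two-sided test, not the software the authors used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the table [[a, b], [c, d]].

    Assumed layout: a = subpopulation CNVRs in the cytoband, b = its CNVRs
    elsewhere, c = reference CNVRs in the cytoband, d = reference CNVRs
    elsewhere. The p value sums the hypergeometric probabilities of all
    tables that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x):
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )


def is_significant(p, threshold=6.7e-6):
    """Compare against the corrected threshold quoted in the footnote."""
    return p < threshold
```

For example, the Q1 row for 1q21.1 (55 versus 10) would be tested as `fisher_exact_two_sided(55, q1_total - 55, 10, kg_total - 10)`, where `q1_total` and `kg_total` are placeholders for totals that would have to come from the underlying data.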
Fig. 3 All SNPs within 500 kb of the start and end breakpoints of 1,193 deletions were used to identify, for each deletion, the SNP with the maximum pairwise LD correlation. This was done for (a) all 1,193 CNVs and (b) only the 422 genic CNVs. In both cases, the WGS SNVs significantly outperformed the OMNI2.5 M SNPs, especially at higher r² values. WGS-SNVs: variants detected by whole genome sequencing (●). OMNI2.5 M-SNPs: SNPs present on the OMNI2.5 M array (○)
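The search for the best-tagging SNP per deletion can be illustrated with the sketch below. It assumes r² is taken as the squared Pearson correlation between per-individual dosages (deletion copies and SNP alternate-allele counts, each coded 0/1/2), which is a common approximation for unphased data; the caption does not say how LD was computed. The dictionary keys and function names are hypothetical, and all SNPs are assumed to lie on the same chromosome as the deletion.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two dosage vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic site carries no LD information
    return sxy * sxy / (sxx * syy)


def best_tag_snp(deletion, snps, window=500_000):
    """Return (snp_id, r2) for the SNP in strongest LD with the deletion.

    deletion: dict with 'start', 'end', 'dosage'.
    snps: list of dicts with 'id', 'pos', 'dosage'.
    Only SNPs within `window` bp of either breakpoint are considered.
    """
    best_id, best_r2 = None, 0.0
    for snp in snps:
        near_start = abs(snp["pos"] - deletion["start"]) <= window
        near_end = abs(snp["pos"] - deletion["end"]) <= window
        if not (near_start or near_end):
            continue
        r2 = r_squared(deletion["dosage"], snp["dosage"])
        if r2 > best_r2:
            best_id, best_r2 = snp["id"], r2
    return best_id, best_r2
```

Running this once with the WGS variant set and once with only the array SNPs, then comparing the distributions of the returned r² values, would give the kind of comparison the figure summarizes.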