Khalid A Fakhro, Michelle R Staudt, Monica Denise Ramstetter, Amal Robay, Joel A Malek, Ramin Badii, Ajayeb Al-Nabet Al-Marri, Charbel Abi Khalil, Alya Al-Shakaki, Omar Chidiac, Dora Stadler, Mahmoud Zirie, Amin Jayyousi, Jacqueline Salit, Jason G Mezey, Ronald G Crystal, Juan L Rodriguez-Flores.
Abstract
Reaching the full potential of precision medicine depends on the quality of personalized genome interpretation. To facilitate precision medicine in regions of the Middle East and North Africa (MENA), a population-specific reference genome for the indigenous Arab population of Qatar (QTRG) was constructed by incorporating allele frequency data from sequencing of 1,161 Qataris, representing 0.4% of the population. A total of 20.9 million single nucleotide polymorphisms (SNPs) and 3.1 million indels were observed in Qatar, including an average of 1.79% novel variants per individual genome. Replacing the GRCh37 standard reference with QTRG in a best-practices genome analysis workflow increased coverage depth by an average of 7× (an improvement of 23%) and yielded 756,671 fewer variant calls on average, a 16% reduction attributed to common Qatari alleles already being present in QTRG. The benefit of using QTRG varies across ancestries, and this should be taken into account when selecting an appropriate reference for analysis.
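The abstract reports several results as an absolute number paired with a percentage, and each pair implies a baseline value. The back-of-envelope sketch below is not taken from the paper's methods. It recovers those implied baselines under two assumptions: "7×" is read as an absolute gain of seven coverage units, and the quoted percentages are rounded. The results are therefore approximate.

```python
# Back-of-envelope check of the abstract's figures.
# Assumption: "7x deeper" is an absolute gain of 7 coverage units, and all
# percentages are rounded, so the implied baselines are approximate.
sample_size = 1_161
sample_fraction = 0.004       # "0.4% of the population"
depth_gain = 7.0              # additional fold coverage with QTRG
depth_gain_rel = 0.23         # "an improvement of 23%"
fewer_variants = 756_671      # fewer variant calls with QTRG
fewer_variants_rel = 0.16     # "a reduction of 16%"


def implied_baseline(absolute_change: float, relative_change: float) -> float:
    """Return the baseline B such that absolute_change / B == relative_change."""
    return absolute_change / relative_change


population = implied_baseline(sample_size, sample_fraction)
baseline_depth = implied_baseline(depth_gain, depth_gain_rel)
baseline_variants = implied_baseline(fewer_variants, fewer_variants_rel)

print(f"Implied Qatari population: ~{population:,.0f}")
print(f"Implied GRCh37 depth: ~{baseline_depth:.1f}x, with QTRG ~{baseline_depth + depth_gain:.1f}x")
print(f"Implied variant calls against GRCh37: ~{baseline_variants:,.0f} per genome")
```

On this reading, a 7× gain that equals 23% implies a GRCh37 baseline of roughly 30×. That is the sense in which the abstract's "7×" and "23%" are consistent with each other.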
Year: 2016 PMID: 27408750 PMCID: PMC4927697 DOI: 10.1038/hgv.2016.16
Source DB: PubMed Journal: Hum Genome Var ISSN: 2054-345X
Individual variant discovery in 1,005 unrelated Qataris[1]
| Per-individual average | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Variant sites | 4,045,064 | 3,893,076 | 151,017 | 937 | 34 | 15,382 | 15,069 | 312 | 1 | 13,839 | 13,538 | 300 | 1 |
| Novel variant sites | 96,774 | 77,366 | 19,359 | 42 | 7 | 159 | 152 | 7 | 0 | 96 | 91 | 5 | 0 |
| Novel variant rate | 2.39% | 1.99% | 12.82% | 4.48% | 20.59% | 1.03% | 0.99% | 2.24% | 0.00% | 0.69% | 0.67% | 1.67% | 0.00% |
| Alternate alleles | 5,510,301 | 5,311,794 | 196,565 | 1,874 | 68 | 21,046 | 20,613 | 431 | 2 | 19,116 | 18,709 | 405 | 2 |
| Novel alternate alleles | 98,792 | 78,916 | 19,778 | 84 | 14 | 160 | 153 | 7 | 0 | 97 | 92 | 5 | 0 |
| Novel allele rate | 1.79% | 1.49% | 10.06% | 4.48% | 20.59% | 0.76% | 0.74% | 1.62% | 0.00% | 0.50% | 0.49% | 1.23% | 0.00% |
| Heterozygous sites | 2,594,268 | 2,489,084 | 105,184 | — | — | 9,759 | 9,564 | 195 | — | 8,897 | 8,698 | 199 | — |
| Novel heterozygous sites | 94,762 | 75,772 | 18,990 | — | — | 157 | 150 | 7 | — | 94 | 89 | 5 | — |
| Novel heterozygous rate | 3.65% | 3.04% | 18.05% | — | — | 1.61% | 1.57% | 3.59% | — | 1.06% | 1.02% | 2.51% | — |
| Mean depth at variant site | 41 | 41 | 40 | 20 | 250 | 63 | 64 | 69 | 42 | 64 | 64 | 74 | 27 |
| Mean depth at novel variant site | 41 | 41 | 39 | 20 | 250 | 59 | 60 | 60 | 47 | 63 | 63 | 76 | — |
| Transition:transversion ratio | 2.03 | 2.03 | 1.78 | 1.51 | 33.00 | 3.18 | 3.18 | 2.77 | — | 3.25 | 3.26 | 2.67 | — |
| Novel transition:transversion ratio | 1.33 | 1.35 | 1.09 | 1.63 | — | 0.77 | 0.77 | 0.75 | — | 1.58 | 1.56 | 1.50 | — |
Shown is a summary of the average number of variants observed per individual, identified in 917 unrelated Qatari exomes and 88 unrelated Qatari genomes. Variants were genotyped separately for autosomes, X in males, X in females, Y in males and mtDNA. Because 99.8% of X variants in males were also observed in females, X-chromosome summary statistics are based on females. For each individual, the table reports the average number of variant sites, alternate alleles and heterozygous sites, the mean read depth at variant sites, the transition-to-transversion ratio (Ts:Tv), and the percentage of variants not in dbSNP (novel).
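The Ts:Tv and novel-rate rows can be computed directly from per-variant annotations. The minimal sketch below is not the authors' pipeline. It assumes each SNV call carries its REF and ALT bases plus a hypothetical boolean `in_dbsnp` flag. Transitions are purine-to-purine (A↔G) or pyrimidine-to-pyrimidine (C↔T) substitutions, and every other substitution is a transversion.

```python
from typing import Iterable, NamedTuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


class Snv(NamedTuple):
    ref: str
    alt: str
    in_dbsnp: bool  # hypothetical flag: True if the site is already catalogued in dbSNP


def is_transition(ref: str, alt: str) -> bool:
    """Purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T) substitution."""
    pair = {ref.upper(), alt.upper()}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snvs: Iterable[Snv]) -> float:
    """Transitions divided by transversions; NaN if there are no transversions."""
    ts = tv = 0
    for v in snvs:
        if is_transition(v.ref, v.alt):
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("nan")


def novel_rate(snvs: list[Snv]) -> float:
    """Percentage of variant sites that are absent from dbSNP."""
    if not snvs:
        return float("nan")
    novel = sum(1 for v in snvs if not v.in_dbsnp)
    return 100.0 * novel / len(snvs)


calls = [Snv("A", "G", True), Snv("C", "T", True), Snv("G", "A", False), Snv("A", "C", True)]
print(ts_tv_ratio(calls))  # 3.0
print(novel_rate(calls))   # 25.0

# Applying the same percentage to the table's first column reproduces its novel variant rate.
print(f"{100 * 96_774 / 4_045_064:.2f}%")  # 2.39%
```

The last line shows that the table's "Novel variant rate" is simply novel variant sites divided by all variant sites (96,774 / 4,045,064 ≈ 2.39%).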
Figure 1. Differences in mapped read depth across reference genomes. To select the optimal reference for analysis of Qatari genomes and exomes, mapped read depth was compared between GRCh37 and three alternative reference genomes built from the MAAs observed in n=1,005 Qataris. Illumina paired-end 100 bp reads from 37× genome sequencing of a female Qatari were mapped with BWA to the GRCh37, QTRG1, QTRG2 and QTRG3 reference genomes. The three Qatari references differ in which MAAs they incorporate: QTRG1 incorporates MAA SNPs, QTRG2 incorporates MAA indels, and QTRG3 incorporates both. Depth of coverage was measured (a) across the genome and (b) at the MAA sites modified in the QTRG. MAAs, major alternate alleles; SNP, single nucleotide polymorphism.
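In essence, building QTRG1, QTRG2 or QTRG3 means writing major alternate alleles into the GRCh37 sequence. The sketch below is a simplified, hypothetical illustration of that idea on a single in-memory sequence, and it is not the authors' tooling. It assumes 0-based positions, non-overlapping variants, and that an MAA is an alternate allele with frequency above 0.5. The function and parameter names are made up for the example. Variants are applied from right to left so that indels, which change the sequence length, do not shift the coordinates of variants still waiting to be applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    pos: int         # 0-based start of REF in the sequence (assumption; VCF POS is 1-based)
    ref: str         # reference allele as it appears in GRCh37
    alt: str         # alternate allele
    alt_freq: float  # alternate allele frequency in the cohort


def build_major_allele_reference(ref_seq: str, variants: list[Variant],
                                 snps: bool = True, indels: bool = True,
                                 maa_threshold: float = 0.5) -> str:
    """Substitute major alternate alleles (alt_freq > maa_threshold) into ref_seq.

    snps=True, indels=False mimics QTRG1; snps=False, indels=True mimics QTRG2;
    both True mimics QTRG3 (hypothetical mapping for illustration).
    """
    seq = list(ref_seq)
    # Right-to-left order keeps positions of not-yet-applied variants valid.
    for v in sorted(variants, key=lambda var: var.pos, reverse=True):
        if v.alt_freq <= maa_threshold:
            continue  # not a major alternate allele
        is_snp = len(v.ref) == 1 and len(v.alt) == 1
        if (is_snp and not snps) or (not is_snp and not indels):
            continue
        end = v.pos + len(v.ref)
        if "".join(seq[v.pos:end]) != v.ref:
            raise ValueError(f"REF mismatch at {v.pos}: expected {v.ref!r}")
        seq[v.pos:end] = list(v.alt)
    return "".join(seq)


ref = "ACGTACGTAC"
variants = [
    Variant(1, "C", "T", 0.80),   # MAA SNP
    Variant(4, "AC", "A", 0.60),  # MAA deletion
    Variant(8, "A", "G", 0.30),   # minor allele, left unchanged
]
qtrg1 = build_major_allele_reference(ref, variants, indels=False)
qtrg2 = build_major_allele_reference(ref, variants, snps=False)
qtrg3 = build_major_allele_reference(ref, variants)
print(qtrg1, qtrg2, qtrg3)  # ATGTACGTAC ACGTAGTAC ATGTAGTAC
```

The REF check is a cheap guard against coordinate errors. A chromosome-scale build would also need to record how indels shift coordinates between GRCh37 and the new reference, and that step is not covered here.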
Variants in Qatar, stratified by allele frequency and potential for pathogenicity[1]
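| SNP category | All SNPs (n) | % | Rare alternate allele, ≤5% AF (n) | % | Common alternate allele, 5–50% AF (n) | % | Common reference allele, 50–95% AF (n) | % | Rare reference allele, 95–100% AF (n) | % | Unobserved reference allele, 100% AF (n) | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|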
| All SNPs | 20,864,277 | 100.00 | 12,948,368 | 62.06 | 5,938,490 | 28.46 | 1,693,649 | 8.12 | 195,466 | 0.94 | 88,303 | 0.42 |
| 3–Potentially pathogenic | 155,571 | 0.75 | 124,947 | 0.60 | 23,282 | 0.11 | 5,957 | 0.03 | 956 | <0.01 | 428 | <0.01 |
| 2–In gene linked to phenotype | 50,757 | 0.24 | 40,894 | 0.20 | 7,445 | 0.04 | 2,002 | 0.01 | 290 | <0.01 | 125 | <0.01 |
| 1–Variant with known link | 2,152 | 0.01 | 999 | <0.01 | 876 | <0.01 | 253 | <0.01 | 21 | <0.01 | 2 | <0.01 |
Major alternate alleles are substituted into the QTRG genome, so all variants reported against QTRG are minor alleles. A total of 230,395 potentially deleterious SNPs in the 917 exomes and 88 genomes were computationally categorized with respect to allele frequency and to databases of genes and variants with reported links to a phenotype. Variants were assigned to genes, and their function was predicted with respect to Ensembl[34] gene models using SnpEff.[32] Potentially deleterious coding SNPs (nonsynonymous, splice donor site, splice acceptor site, stop gain, start loss) were extracted for further analysis. A database combining OMIM,[35] HGMD,[36] GWAS,[37] PharmGKB,[38] Human Phenotype Ontology[39] and ClinVar[40] was compiled, and these annotations were used to divide the potentially deleterious variants into three categories: variant and gene linked to a phenotype (Category 1), gene but not variant linked to a phenotype (Category 2), and neither variant nor gene linked to a phenotype (Category 3). The totals for each category are shown in the left-most columns as number and percentage. All percentages in the table are relative to the total number of SNPs. These variants were then sub-classified into two major groups (major reference allele, major alternate allele) and five subcategories based on alternate allele frequency (AF) in Qatar: rare alternate allele (up to 5% AF), common alternate allele (5 to 50% AF), common reference allele (50 to 95% AF), rare reference allele (95 to 100% AF) and unobserved reference allele (100% AF). The major alternate alleles (MAAs) are substituted into the Qatar Genome (QTRG).
Abbreviation: SNP, single nucleotide polymorphism.
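The caption's filtering and classification rules reduce to a few simple functions. The sketch below is hedged and not the authors' code. Several details are assumptions made for illustration: how exact bin boundaries are handled (for example, whether 5% falls in the rare or the common bin), the mapping of the caption's effect classes onto the Sequence Ontology terms that SnpEff reports, the rule that a variant-level phenotype link implies a gene-level link, and all helper names.

```python
# Assumed mapping of the caption's coding effect classes to Sequence Ontology terms.
DELETERIOUS_EFFECTS = {
    "missense_variant",         # nonsynonymous
    "splice_donor_variant",
    "splice_acceptor_variant",
    "stop_gained",
    "start_lost",
}


def is_potentially_deleterious(effects: set[str]) -> bool:
    """True if any predicted effect is one of the assumed deleterious classes."""
    return not DELETERIOUS_EFFECTS.isdisjoint(effects)


def phenotype_category(variant_linked: bool, gene_linked: bool) -> int:
    """1: variant (and hence gene) linked; 2: gene only; 3: neither."""
    if variant_linked:
        return 1
    if gene_linked:
        return 2
    return 3


def frequency_bin(alt_af: float) -> tuple[str, str]:
    """Return (major group, frequency bin) for an alternate allele frequency in (0, 1]."""
    if not 0.0 < alt_af <= 1.0:
        raise ValueError(f"alt_af out of range: {alt_af}")
    if alt_af <= 0.05:
        return ("major reference allele", "rare alternate allele")
    if alt_af <= 0.50:
        return ("major reference allele", "common alternate allele")
    if alt_af <= 0.95:
        return ("major alternate allele", "common reference allele")
    if alt_af < 1.0:
        return ("major alternate allele", "rare reference allele")
    return ("major alternate allele", "unobserved reference allele")


print(frequency_bin(0.97))  # ('major alternate allele', 'rare reference allele')
print(phenotype_category(variant_linked=False, gene_linked=True))  # 2
print(is_potentially_deleterious({"missense_variant", "intron_variant"}))  # True
```

Under these rules, every SNP whose frequency falls in the "major alternate allele" group is one that QTRG writes into the reference. Against QTRG, those sites therefore appear as minor-allele variants.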