Wazim Mohammed Ismail1, Kymberleigh A Pagel1, Vikas Pejaver1, Simo V Zhang1, Sofia Casasa2, Matthew Mort3, David N Cooper3, Matthew W Hahn1,2, Predrag Radivojac4.
Abstract
Recent genetic studies and whole-genome sequencing projects have greatly improved our understanding of human variation and clinically actionable genetic information. Smaller ethnic populations, however, remain underrepresented in both individual and large-scale sequencing efforts and hence present an opportunity to discover new variants of biomedical and demographic significance. This report describes the sequencing and analysis of a genome obtained from an individual of Serbian origin, introducing tens of thousands of previously unknown variants to the currently available pool. Ancestry analysis places this individual in close proximity to Central and Eastern European populations; i.e., closest to Croatian, Bulgarian and Hungarian individuals and, in terms of other Europeans, furthest from Ashkenazi Jewish, Spanish, Sicilian and Baltic individuals. Our analysis confirmed gene flow between Neanderthal and ancestral pan-European populations, with similar contributions to the Serbian genome as those observed in other European groups. Finally, to assess the burden of potentially disease-causing/clinically relevant variation in the sequenced genome, we utilized manually curated genotype-phenotype association databases and variant-effect predictors. We identified several variants that have previously been associated with severe early-onset disease that is not evident in the proband, as well as putatively impactful variants that could yet prove to be clinically relevant to the proband over the next decades. The presence of numerous private and low-frequency variants, along with the observed and predicted disease-causing mutations in this genome, exemplifies some of the global challenges of genome interpretation, especially in the context of under-studied ethnic groups.
Year: 2018 PMID: 30566479 PMCID: PMC6300249 DOI: 10.1371/journal.pone.0208901
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Venn diagrams showing the total numbers of identified variants using two read mappers (BWA [29], Bowtie2 [30]) and two variant callers (GATK [31], Platypus [32]).
Summary of identified variants using BWA+GATK.
Variants not present in gnomAD [66] are listed as novel and variants identified by all four genotyping platforms are listed as confident.
| Type of Variant | Variants | Novel | Confident variants | Confident novel |
|---|---|---|---|---|
| upstream | 23094 | 320 | 16211 | 90 |
| upstream; downstream | 881 | 8 | 624 | 4 |
| UTR5 | 5205 | 54 | 4055 | 22 |
| UTR5; UTR3 | 16 | 1 | 12 | 0 |
| exonic | 20706 | 145 | 17114 | 115 |
| exonic; splicing | 33 | 1 | 22 | 0 |
| splicing | 151 | 0 | 107 | 0 |
| intronic | 1410507 | 20531 | 1078226 | 4336 |
| UTR3 | 31066 | 409 | 24095 | 101 |
| downstream | 26685 | 398 | 19351 | 61 |
| ncRNA_exonic | 13064 | 129 | 9520 | 30 |
| ncRNA_exonic; splicing | 3 | 0 | 2 | 0 |
| ncRNA_intronic | 235936 | 3376 | 173168 | 832 |
| ncRNA_splicing | 65 | 1 | 51 | 0 |
| ncRNA_UTR5 | 1 | 1 | 0 | 0 |
| intergenic | 2164496 | 34779 | 1597484 | 6848 |
Summary of identified exonic variants using BWA+GATK.
Variants not present in gnomAD [66] are listed as novel and variants identified by all four platforms are listed as confident.
| Type of Variant | Variants | Novel | Confident variants | Confident novel |
|---|---|---|---|---|
| synonymous SNV | 10381 | 42 | 8965 | 36 |
| nonsynonymous SNV | 9328 | 80 | 7559 | 69 |
| nonframeshift deletion | 137 | 2 | 62 | 0 |
| nonframeshift insertion | 117 | 3 | 58 | 0 |
| frameshift deletion | 103 | 6 | 45 | 4 |
| frameshift insertion | 74 | 3 | 37 | 1 |
| stopgain | 87 | 6 | 54 | 4 |
| stoploss | 11 | 0 | 9 | 0 |
| unknown | 501 | 4 | 347 | 1 |
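The "novel" and "confident" labels used in the two tables above amount to simple set operations: a variant is confident if every mapper/caller combination reports it, and novel if it is absent from the reference database. The following is a minimal sketch of that bookkeeping, not the authors' pipeline. The function name `classify_variants`, the `(chrom, pos, ref, alt)` tuple representation, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def classify_variants(
    call_sets: Dict[str, Set[Variant]],
    primary: str,
    known: Set[Variant],
) -> Dict[str, Set[Variant]]:
    """Split the primary call set into the four table columns.

    call_sets: one set of variants per mapper+caller combination.
    primary:   key of the combination being summarized (e.g. "BWA+GATK").
    known:     variants already present in a reference database (e.g. gnomAD).
    """
    reported = call_sets[primary]
    # Confident: reported by every combination (hence also by the primary one).
    confident = set.intersection(*call_sets.values())
    return {
        "variants": reported,
        "novel": reported - known,
        "confident": confident,
        "confident_novel": confident - known,
    }


# Toy example with made-up variants, for illustration only.
a = ("1", 100, "A", "G")
b = ("1", 200, "C", "T")
c = ("2", 300, "G", "A")
d = ("2", 400, "T", "C")

toy_calls = {
    "BWA+GATK": {a, b, c, d},
    "BWA+Platypus": {a, b, c},
    "Bowtie2+GATK": {a, b, d},
    "Bowtie2+Platypus": {a, b},
}
toy_known = {a, c}

summary = classify_variants(toy_calls, "BWA+GATK", toy_known)
counts = {name: len(variants) for name, variants in summary.items()}
print(counts)
```

In this toy example only `a` and `b` are called by all four combinations, and of those only `b` is missing from the known set, so it is the single confident novel variant.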
Fig 2. Principal component analysis (PCA) plot showing the proximity of the genome sequenced in this study to other European genomes.
As observed in previous studies [2, 3], genomic distance correlates with geographic distance.
Testing gene flow with Neanderthals.
The results show the D-statistic (D), its standard error (SE) and Z-score (Z) for the test using the set of populations P1, P2, and P3, with Chimpanzee as an outgroup (O). The last two columns show ABBA vs. BABA counts over the four genomes (P1, P2, P3, O).
| P1 | P2 | P3 | O | D | SE | Z-score | ABBA | BABA |
|---|---|---|---|---|---|---|---|---|
| Yoruba | Serbian | Altai | Chimpanzee | 0.0241 | 0.004476 | 5.393 | 18158 | 17302 |
| Yoruba | Croatian | Altai | Chimpanzee | 0.0233 | 0.003192 | 7.302 | 18268 | 17436 |
| Yoruba | French | Altai | Chimpanzee | 0.0266 | 0.003012 | 8.821 | 18284 | 17338 |
| Yoruba | Greek | Altai | Chimpanzee | 0.0270 | 0.003034 | 8.906 | 18266 | 17305 |
| Yoruba | Russian | Altai | Chimpanzee | 0.0288 | 0.003096 | 9.306 | 18328 | 17302 |
| Mbuti | Serbian | Altai | Chimpanzee | 0.0186 | 0.004763 | 3.909 | 18817 | 18129 |
| Mbuti | Croatian | Altai | Chimpanzee | 0.0178 | 0.003693 | 4.832 | 18891 | 18229 |
| Mbuti | French | Altai | Chimpanzee | 0.0210 | 0.003532 | 5.941 | 18902 | 18125 |
| Mbuti | Greek | Altai | Chimpanzee | 0.0214 | 0.003578 | 5.978 | 18897 | 18106 |
| Mbuti | Russian | Altai | Chimpanzee | 0.0232 | 0.003600 | 6.434 | 18932 | 18074 |
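The D values in the table are consistent with the usual ABBA-BABA form of the statistic, $D = (\mathrm{ABBA} - \mathrm{BABA}) / (\mathrm{ABBA} + \mathrm{BABA})$, with the Z-score given by $D / \mathrm{SE}$. The sketch below reproduces the Yoruba/Serbian row from its site-pattern counts. The function names are illustrative, and the standard error is copied from the table rather than recomputed, since estimating it requires the underlying per-site data.

```python
def d_statistic(abba: int, baba: int) -> float:
    """Patterson's D from ABBA and BABA site-pattern counts."""
    return (abba - baba) / (abba + baba)


def z_score(d: float, se: float) -> float:
    """Z-score of D given its standard error."""
    return d / se


# Row: P1 = Yoruba, P2 = Serbian, P3 = Altai, O = Chimpanzee
abba, baba = 18158, 17302
se = 0.004476  # taken from the table, not recomputed here

d = d_statistic(abba, baba)
z = z_score(d, se)
print(f"D = {d:.4f}, Z = {z:.3f}")
```

A positive D here means that the P2 genome shares more derived alleles with the Altai Neanderthal than the African P1 genome does, which is the signal of gene flow the table tests for.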
Amount of disease-causing and potentially disease-relevant variation in the Serbian genome.
Identified variants were searched against HGMD and broken down into the phenotypic categories of HGMD. Variants were broken down into exonic and noncoding as well as homozygous and heterozygous.
| HGMD category | Exome (Hom) | Exome (Het) | Noncoding (Hom) | Noncoding (Het) |
|---|---|---|---|---|
| Disease-causing mutations (DM) | 1 | 9 | 4 | 6 |
| Likely disease-causing mutations (DM?) | 29 | 51 | 8 | 31 |
| Disease-associated polymorphisms with additional supporting functional evidence (DFP) | 78 | 139 | 203 | 301 |
| Disease-associated polymorphisms (DP) | 233 | 356 | 189 | 322 |
| Polymorphisms that affect gene/protein structure, function or expression but with no reported disease association (FP) | 63 | 95 | 95 | 130 |
The number of homozygous and heterozygous variants that are associated with variants reported in HGMD. HGMD labels correspond to the strength and/or evidence for the relationship between variant and disease.
Disease-causing variants observed in the proband.
The table summarizes the analysis of five homozygous variants from the sequenced genome that are listed by HGMD as disease-causing.
| Gene | Variant | rsID | Phenotype |
|---|---|---|---|
| | NC_000001.10:g.98502934G>T | rs1625579 | Schizophrenia increased risk |
| | NM_000339.2:c.1670-8C>T | NA | Gitelman syndrome without hypomagnesaemia |
| | NM_207581.3:c.554+6C>T | NA | Hypothyroidism |
| | NM_000129.3:c.-19+12C>A | rs2815822 | Factor XIII deficiency |
| | NP_065109.1:p.P481L | rs1138693 | Myopathy late-onset |