Torsten Schmenger, Gaurav D Diwan, Gurdeep Singh, Gordana Apic, Robert B Russell.
Abstract
The rapid pace with which genetic variants are now being determined means there is a pressing need to understand how they affect biological systems. Variants from healthy individuals have previously been used to study blood groups or HLA diversity and to identify genes that can apparently be nonfunctional in healthy people. These and other studies have observed a lower-than-expected frequency of homozygous individuals for potentially deleterious alleles, which suggests that several of these alleles can lead to recessive disorders. Here we exploited this principle to hunt for potential disease variants in genomes from healthy people. We identified at least 108 exclusively heterozygous variants with evidence for an impact on biological function. We discuss several examples of candidate variants/genes, including CCDC8, PANK3, RHD and NLRP12. Overall, the results suggest that there are many comparatively frequent, potentially lethal or disease-causing variants lurking in healthy human populations.
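The idea of a "lower-than-expected" homozygote frequency can be made concrete with Hardy-Weinberg arithmetic. The sketch below is an illustration only, not the authors' statistical test: it assumes Hardy-Weinberg equilibrium, independent individuals, and that every copy of the allele sits in a heterozygote, so the allele frequency is p = h / (2N) for h heterozygotes among N individuals. The function names are hypothetical, and the sample size of 2504 (1000 Genomes phase 3) is an assumed example value.

```python
def expected_homozygotes(het_count: int, n_individuals: int) -> float:
    """Expected homozygous-alternate count under Hardy-Weinberg equilibrium.

    Assumes all allele copies are carried by heterozygotes, so
    p = het_count / (2 * n_individuals) and the expectation is
    n_individuals * p**2, which simplifies to het_count**2 / (4 * n_individuals).
    """
    p = het_count / (2 * n_individuals)
    return n_individuals * p ** 2


def prob_no_homozygotes(het_count: int, n_individuals: int) -> float:
    """Probability of observing zero homozygotes if each individual is
    independently homozygous with probability p**2 (Hardy-Weinberg assumption)."""
    p = het_count / (2 * n_individuals)
    return (1.0 - p ** 2) ** n_individuals


if __name__ == "__main__":
    n = 2504  # assumed sample size (1000 Genomes phase 3), illustrative only
    for het in (100, 300):
        print(het, expected_homozygotes(het, n), prob_no_homozygotes(het, n))
```

Under these assumptions, 100 heterozygotes imply only about one expected homozygote (roughly a 37% chance of seeing none), whereas 300 heterozygotes imply about nine (a chance of seeing none on the order of 1e-4). Zero homozygotes therefore only becomes surprising once the heterozygote count is fairly high.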
Year: 2022 PMID: 36075934 PMCID: PMC9458638 DOI: 10.1038/s41525-022-00322-z
Source DB: PubMed Journal: NPJ Genom Med ISSN: 2056-7944 Impact factor: 6.083
Fig. 1: Exclusively heterozygous variants in 1 kG (1000 Genomes Project).
a Plots of homozygous vs heterozygous counts for the 1 kG dataset. The preponderance of values on the X axis (i.e. zero homozygous counts) is indicated. b As in a, but with shuffled 1 kG data. c How the distribution of functional impact scores changes as homozygous counts decrease. d How the distribution of functional impact scores changes with increasing heterozygous counts for sites where the homozygous count is zero.
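Panel a rests on per-site counts of heterozygous and homozygous-alternate individuals. A minimal sketch of how such counts could be tallied from VCF genotype (GT) strings such as "0|1" or "1/1" follows; it assumes biallelic, diploid calls, and the function names are hypothetical rather than taken from the authors' pipeline.

```python
from collections import Counter


def count_genotypes(gt_calls):
    """Tally genotype classes for one biallelic site.

    gt_calls: iterable of VCF-style GT strings, phased ("0|1") or unphased ("0/1").
    Calls containing a missing allele (".") and non-diploid calls are counted
    separately and not treated as heterozygous or homozygous.
    """
    counts = Counter()
    for gt in gt_calls:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            counts["missing"] += 1
            continue
        if len(alleles) != 2:
            counts["non_diploid"] += 1  # e.g. haploid chrX calls in males
            continue
        n_alt = sum(a != "0" for a in alleles)  # number of non-reference alleles
        if n_alt == 1:
            counts["het"] += 1
        elif n_alt == 2:
            counts["hom_alt"] += 1
        else:
            counts["hom_ref"] += 1
    return counts


def is_exclusively_heterozygous(counts) -> bool:
    """True when the site has at least one heterozygote and no homozygous-alternate call."""
    return counts["het"] > 0 and counts["hom_alt"] == 0
```

A `Counter` returns 0 for absent keys, so sites with no homozygous-alternate calls need no special casing in `is_exclusively_heterozygous`.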
Fig. 2: Filtering and data processing overview.
Overview showing the processing and filtering of 201k missense variants based on exclusive heterozygosity.
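The filtering step can be sketched as a simple pass over variant records. This is a hedged illustration, not the workflow in the figure: the `Variant` record, its field names, the thresholds `min_het` and `min_impact`, and the final ordering are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Variant:
    # Hypothetical record; field names are illustrative, not from the paper.
    variant_id: str
    consequence: str     # e.g. "missense_variant" (Sequence Ontology term)
    het_count: int
    hom_alt_count: int
    impact_score: float  # any functional-impact predictor scaled to 0..1 (assumed)


def exclusively_heterozygous_missense(
    variants: Iterable[Variant],
    min_het: int = 10,        # hypothetical minimum heterozygote count
    min_impact: float = 0.5,  # hypothetical impact-score cutoff
) -> List[Variant]:
    """Keep missense variants seen only in heterozygotes, often enough to be
    informative, and with a predicted functional impact above a cutoff."""
    kept = []
    for v in variants:
        if v.consequence != "missense_variant":
            continue
        if v.hom_alt_count != 0 or v.het_count < min_het:
            continue
        if v.impact_score < min_impact:
            continue
        kept.append(v)
    # Rank by heterozygote count: the more heterozygotes a site has, the less
    # likely it is that zero homozygotes arose by chance.
    return sorted(kept, key=lambda v: v.het_count, reverse=True)
```

Ranking by heterozygote count is one possible design choice; a real pipeline might instead rank by a formal probability of observing no homozygotes.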
Fig. 3: Examples of exclusively heterozygous variants.
Examples of exclusively heterozygous variants showing hints of a possible structural or functional consequence. a Top: Jalview[78] alignment of selected mammalian orthologs around Gln200 (arrow) in CCDC8. Conserved residues are shown in ClustalX colours. Bottom: domain diagram superimposed on an IUPred plot of protein disorder[15]. The locations of phosphorylated serines and of other disease-associated mutations are labelled in addition to p.Gln200Leu. b Left: alignment as in a. Right: VMD[79] representation of the AlphaFold2[6] model of NLRP12 showing the location of Asn394. The zoomed view highlights, in ball-and-stick representation, the sidechain (Tyr390) and mainchain (Ser427, Arg429) atoms in contact with Asn394 (spheres). c Left: alignment as in a, but with the PANK1-3 paralogs from UniProt Swiss-Prot. Right: VMD representation of the location of PANK3 Ile301 on the crystal structure (RCSB PDB: 6pe6). The zoomed image shows how PANK3 Ile301 (cyan spheres) packs tightly against hydrophobic sidechains (brown). d Left: alignment as in c. Right: VMD representation of the AlphaFold2 model of RHD superimposed[80] onto two copies of RHCG (RCSB PDB: 3hd6). The location of Tyr311 is shown (cyan/red spheres), as are the Cα atoms of residues harbouring weak D mutations (magenta).