Dustin B Miller, Stephen R Piccolo.
Abstract
Compound heterozygous (CH) variants occur when two recessive alleles are inherited and the variants are located at different loci within the same gene in a given individual. CH variants are important contributors to many different types of recessively inherited diseases. However, many studies overlook CH variants because identifying this type of variant requires knowing the parent of origin for each nucleotide. Using computational methods, haplotypes can be inferred through a process called "phasing," which estimates the chromosomal origin of most nucleotides. In this paper, we used germline, phased, whole-genome sequencing (WGS) data to identify CH variants across seven pediatric diseases (adolescent idiopathic scoliosis: n = 16, congenital heart defects: n = 709, disorders of sex development: n = 79, Ewing sarcoma: n = 287, neuroblastoma: n = 259, orofacial cleft: n = 107, and syndromic cranial dysinnervation: n = 172), available as parent-child trios in the Gabriella Miller Kids First Data Resource Center. Relatively little is understood about the genetic underpinnings of these diseases. We classified CH variants as "potentially damaging" based on minor allele frequencies (MAF), Combined Annotation Dependent Depletion scores, variant impact on transcription or translation, and gene-level frequencies in the disease group compared to a healthy population. For comparison, we also identified homozygous alternate (HA) variants, which affect both gene copies at a single locus; HA variants represent an alternative mechanism of recessive disease development and do not require phasing. Across all diseases, 2.6% of the samples had a potentially damaging CH variant and 16.2% had a potentially damaging HA variant. Among the samples with potentially damaging variants, the average number of affected genes per sample was 1 for CH variants and 1.25 for HA variants. Across all samples, 5.1 genes per disease had a CH variant, while 35.6 genes per disease had an HA variant; on average, only 4.3% of these variants affected common genes. Therefore, when seeking to identify potentially damaging variants of a putatively recessive disease, CH variants should be considered as potential contributors to disease development. If CH variants are excluded from analysis, important candidate genes may be overlooked.
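The distinction the abstract draws between CH and HA variants can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Variant` record, the `classify_genes` function, and the gene names are hypothetical, and the sketch assumes biallelic sites with VCF-style genotype strings in which `|` marks a phased call and `/` an unphased one.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal record for one variant call in one sample."""
    gene: str
    pos: int
    genotype: str  # VCF-style GT, e.g. "0|1" (phased) or "0/1" (unphased)


def classify_genes(variants: Iterable[Variant]) -> Tuple[Set[str], Set[str]]:
    """Return (genes with a CH variant, genes with an HA variant) for one sample."""
    first_hap = defaultdict(set)   # gene -> positions with alt allele on haplotype 1 only
    second_hap = defaultdict(set)  # gene -> positions with alt allele on haplotype 2 only
    ha_genes = set()

    for v in variants:
        if v.genotype in ("1|1", "1/1"):
            # Homozygous alternate: both copies affected at one locus, no phasing needed.
            ha_genes.add(v.gene)
        elif v.genotype == "1|0":
            first_hap[v.gene].add(v.pos)
        elif v.genotype == "0|1":
            second_hap[v.gene].add(v.pos)
        # Unphased heterozygous calls ("0/1") are skipped: their haplotype is unknown.

    # CH: at least one heterozygous variant on each haplotype of the same gene.
    ch_genes = {gene for gene in first_hap if gene in second_hap}
    return ch_genes, ha_genes


calls = [
    Variant("GENE_A", 100, "1|0"),
    Variant("GENE_A", 250, "0|1"),  # opposite haplotype, same gene -> CH
    Variant("GENE_B", 300, "1|0"),
    Variant("GENE_B", 410, "1|0"),  # same haplotype twice -> not CH
    Variant("GENE_C", 500, "1|1"),  # homozygous alternate -> HA
    Variant("GENE_D", 600, "0/1"),  # unphased -> cannot be assessed for CH
]
ch, ha = classify_genes(calls)
print(sorted(ch), sorted(ha))
```

The sketch shows why phasing matters: `GENE_A` and `GENE_B` both carry two heterozygous variants, but only `GENE_A` has both gene copies affected.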
Keywords: compound heterozygous variants; genetic analysis of complex diseases; germline variants; pediatric cancer; structural birth defect; trios
Year: 2021 PMID: 33828584 PMCID: PMC8019969 DOI: 10.3389/fgene.2021.640242
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. Flow diagram of the gVCF processing steps. These steps were taken prior to HA and CH variant identification.
FIGURE 2. The median number of variants per sample, across all disease types, after each processing step where variants were excluded. The original gVCF files had a median of 5,509,545 unphased variants across all samples and diseases. Approximately 70.7% of the variants were available for CH and HA identification after processing the original gVCF files. Of the available variants, ∼86.7% were phased on average.
FIGURE 3. Percentage of 1000GP samples with a CH variant in a gene. Frequency represents the number of genes that were observed at a specific percentage.
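The quantity plotted in Figure 3, the percentage of control samples with a CH variant in each gene, can be computed along the following lines. This is a sketch under stated assumptions: the function names, the input layout (sample ID mapped to a set of genes), and the `max_pct` cutoff are hypothetical, and a simple cutoff stands in for the paper's comparison of gene-level frequencies between the disease group and a healthy population.

```python
from collections import Counter
from typing import Dict, Set


def gene_ch_percentages(ch_genes_by_sample: Dict[str, Set[str]]) -> Dict[str, float]:
    """Map each gene to the percentage of samples carrying a CH variant in it."""
    n_samples = len(ch_genes_by_sample)
    counts = Counter(
        gene for genes in ch_genes_by_sample.values() for gene in genes
    )
    return {gene: 100.0 * count / n_samples for gene, count in counts.items()}


def filter_genes(candidates: Set[str], control_pct: Dict[str, float],
                 max_pct: float) -> Set[str]:
    """Keep candidate genes seen in at most `max_pct` percent of control samples.

    `max_pct` is a hypothetical threshold chosen for illustration only.
    """
    return {g for g in candidates if control_pct.get(g, 0.0) <= max_pct}


# Toy control cohort: sample ID -> genes with a CH variant in that sample.
controls = {
    "s1": {"GENE_A", "GENE_B"},
    "s2": {"GENE_A"},
    "s3": set(),
    "s4": {"GENE_A"},
}
control_pct = gene_ch_percentages(controls)
kept = filter_genes({"GENE_A", "GENE_B", "GENE_C"}, control_pct, max_pct=30.0)
print(control_pct, sorted(kept))
```

Genes that frequently carry CH variants in healthy controls (here `GENE_A`) are dropped, while genes that are rare or absent in controls are retained as candidates.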
The number of genes and samples with potentially damaging CH variants before and after filtering with the 1000GP data.
| Disease | Genes before filtering | Genes after filtering | Samples before filtering (%) | Samples after filtering (%) |
| --- | --- | --- | --- | --- |
| Adolescent idiopathic scoliosis | 3 | 1 | 2 (12.5%) | 1 (6.3%) |
| Congenital heart defects | 29 | 15 | 63 (8.9%) | 18 (2.5%) |
| Disorders of sex development | 6 | 2 | 9 (11.4%) | 2 (2.5%) |
| Ewing sarcoma | 12 | 6 | 24 (8.4%) | 7 (2.4%) |
| Neuroblastoma | 14 | 5 | 26 (10%) | 5 (1.9%) |
| Orofacial cleft | 4 | 3 | 7 (6.5%) | 4 (3.7%) |
| Syndromic cranial dysinnervation | 8 | 4 | 15 (8.7%) | 6 (3.5%) |
The number of genes and samples with potentially damaging HA variants before and after filtering with the 1000GP data.
| Disease | Genes before filtering | Genes after filtering | Samples before filtering (%) | Samples after filtering (%) |
| --- | --- | --- | --- | --- |
| Adolescent idiopathic scoliosis | 7 | 5 | 4 (25%) | 4 (25%) |
| Congenital heart defects | 134 | 102 | 193 (27.2%) | 129 (18.2%) |
| Disorders of sex development | 33 | 24 | 20 (25.3%) | 18 (22.8%) |
| Ewing sarcoma | 47 | 39 | 54 (18.8%) | 37 (12.9%) |
| Neuroblastoma | 43 | 35 | 53 (20.5%) | 36 (13.9%) |
| Orofacial cleft | 19 | 16 | 19 (17.8%) | 13 (12.1%) |
| Syndromic cranial dysinnervation | 32 | 28 | 36 (20.9%) | 27 (15.7%) |
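The cohort-wide figures quoted in the abstract can be recovered from the after-filtering columns of the two tables above. The short check below is only a worked-arithmetic sketch: the dictionary layout and variable names are illustrative, the counts are copied from the tables, and the sample sizes are those given in the abstract.

```python
# Per disease: (samples, CH samples, HA samples, CH genes, HA genes),
# all taken after filtering with the 1000GP data.
cohorts = {
    "Adolescent idiopathic scoliosis": (16, 1, 4, 1, 5),
    "Congenital heart defects": (709, 18, 129, 15, 102),
    "Disorders of sex development": (79, 2, 18, 2, 24),
    "Ewing sarcoma": (287, 7, 37, 6, 39),
    "Neuroblastoma": (259, 5, 36, 5, 35),
    "Orofacial cleft": (107, 4, 13, 3, 16),
    "Syndromic cranial dysinnervation": (172, 6, 27, 4, 28),
}

rows = list(cohorts.values())
total_samples = sum(r[0] for r in rows)

# Percentage of all samples with a potentially damaging CH or HA variant.
pct_ch_samples = 100.0 * sum(r[1] for r in rows) / total_samples
pct_ha_samples = 100.0 * sum(r[2] for r in rows) / total_samples

# Mean number of genes per disease with a CH or HA variant.
mean_ch_genes = sum(r[3] for r in rows) / len(rows)
mean_ha_genes = sum(r[4] for r in rows) / len(rows)

print(f"CH samples: {pct_ch_samples:.1f}%  HA samples: {pct_ha_samples:.1f}%")
print(f"CH genes per disease: {mean_ch_genes:.1f}  HA genes per disease: {mean_ha_genes:.1f}")
```

The results (2.6% and 16.2% of samples, 5.1 and 35.6 genes per disease) agree with the values reported in the abstract.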
The number of samples with potentially damaging HA variants and the number of unique samples (not seen with potentially damaging HA variants) with potentially damaging CH variants after filtering with 1000GP data.
| Disease | Total samples | Samples with HA variants | Unique samples with CH variants | Unique CH samples relative to HA samples (%) |
| --- | --- | --- | --- | --- |
| Adolescent idiopathic scoliosis | 16 | 4 | 1 | 25 |
| Congenital heart defects | 709 | 129 | 11 | 8.5 |
| Disorders of sex development | 79 | 18 | 2 | 11.1 |
| Ewing sarcoma | 287 | 37 | 7 | 18.9 |
| Neuroblastoma | 259 | 36 | 5 | 13.9 |
| Orofacial cleft | 105 | 13 | 3 | 23.1 |
| Syndromic cranial dysinnervation | 172 | 27 | 6 | 22.2 |
The number of genes with potentially damaging HA variants and the number of unique genes (potentially damaging HA variants not identified in gene) with potentially damaging CH variants after filtering with 1000GP data.
| Disease | Genes with HA variants | Unique genes with CH variants | Unique CH genes relative to HA genes (%) |
| --- | --- | --- | --- |
| Adolescent idiopathic scoliosis | 5 | 1 | 20 |
| Congenital heart defects | 102 | 11 | 10.8 |
| Disorders of sex development | 24 | 2 | 8.3 |
| Ewing sarcoma | 39 | 5 | 12.8 |
| Neuroblastoma | 35 | 4 | 11.4 |
| Orofacial cleft | 16 | 3 | 18.8 |
| Syndromic cranial dysinnervation | 28 | 3 | 10.7 |
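The final column of the two tables above appears to be the number of CH-only samples (or genes) expressed as a percentage of the HA count, that is, the relative gain from including CH variants. The snippet below reproduces that column under this reading; `relative_gain_pct` is an illustrative name, not one used in the paper.

```python
def relative_gain_pct(unique_ch: int, ha: int) -> float:
    """Unique CH count as a percentage of the HA count, rounded to one decimal."""
    return round(100.0 * unique_ch / ha, 1)


# (unique CH count, HA count) pairs copied from the two tables above.
sample_rows = {
    "Congenital heart defects": (11, 129),
    "Ewing sarcoma": (7, 37),
    "Syndromic cranial dysinnervation": (6, 27),
}
gene_rows = {
    "Congenital heart defects": (11, 102),
    "Ewing sarcoma": (5, 39),
    "Syndromic cranial dysinnervation": (3, 28),
}

for label, rows in (("samples", sample_rows), ("genes", gene_rows)):
    for disease, (unique_ch, ha) in rows.items():
        print(f"{disease} ({label}): {relative_gain_pct(unique_ch, ha)}%")
```

For congenital heart defects, for example, 11 CH-only samples against 129 HA samples gives 8.5%, and 11 CH-only genes against 102 HA genes gives 10.8%, matching the tabulated values.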
FIGURE 4. Number of samples with potentially damaging CH or HA variants in genes involved in developmental biology. No potentially damaging CH or HA variants were identified in tumor suppressor genes.
FIGURE 5. The landscape of potentially damaging CH variants. Each colored box represents a gene type (see legend). The values within each colored box indicate how many samples for that disease had a CH variant in that gene.
Pathways enriched with genes containing potentially damaging CH variants.
| Ewing sarcoma | ABO blood group biosynthesis ( | 0.014 | 0.006 |
| | Lewis blood group biosynthesis ( | 0.028 | 0.013 |
| | Blood group systems biosynthesis ( | 0.028 | 0.013 |
| | Inactivation, recovery and regulation of the phototransduction cascade ( | 0.028 | 0.013 |
| | The phototransduction cascade ( | 0.028 | 0.013 |
| | Activation of Matrix Metalloproteinases ( | 0.028 | 0.013 |
| | Collagen degradation ( | 0.041 | 0.019 |
| Neuroblastoma | Inactivation, recovery and regulation of the phototransduction cascade ( | 0.018 | 0.008 |
| | The phototransduction cascade ( | 0.018 | 0.008 |
| | Visual phototransduction ( | 0.038 | 0.018 |
| | NF-kB activation through FADD/RIP-1 pathway mediated by caspase-8 and -10 ( | 0.018 | 0.008 |
| | TRAF3-dependent IRF activation pathway ( | 0.018 | 0.008 |
| | TRAF6 mediated NF-kB activation ( | 0.018 | 0.008 |
| | TRAF6 mediated IRF7 activation ( | 0.018 | 0.008 |
| | Negative regulators of DDX58/IFIH1 signaling ( | 0.018 | 0.008 |
| | Ovarian tumor domain proteases ( | 0.018 | 0.008 |
| | DDX58/IFIH1-mediated induction of interferon-alpha/beta ( | 0.032 | 0.015 |
| Orofacial cleft | Inactivation, recovery and regulation of the phototransduction cascade ( | 0.012 | 0.005 |
| | The phototransduction cascade ( | 0.012 | 0.005 |
| | Visual phototransduction ( | 0.023 | 0.010 |