
How to proceed after "negative" exome: A review on genetic diagnostics, limitations, challenges, and emerging new multiomics techniques.

Saskia B Wortmann^1,2, Machteld M Oud^3,4, Mariëlle Alders⁵, Karlien L M Coene^3,6, Saskia N van der Crabben⁷, René G Feichtinger², Alejandro Garanto^1,8,9, Alex Hoischen¹⁰, Mirjam Langeveld¹¹, Dirk Lefeber^3,6,12, Johannes A Mayr², Charlotte W Ockeloen⁹, Holger Prokisch¹³, Richard Rodenburg¹⁴, Hans R Waterham^3,15, Ron A Wevers^3,6, Bart P C van de Warrenburg¹², Michel A A P Willemsen¹⁶, Nicole I Wolf¹⁷, Lisenka E L M Vissers⁴, Clara D M van Karnebeek^1,3,5,18.

Abstract

Exome sequencing (ES) in the clinical setting of inborn metabolic diseases (IMDs) has created tremendous improvement in achieving an accurate and timely molecular diagnosis for a greater number of patients, but it still leaves the majority of patients without a diagnosis. In parallel, (personalized) treatment strategies are increasingly available, but this requires the availability of a molecular diagnosis. IMDs comprise an expanding field with the ongoing identification of novel disease genes and the recognition of multiple inheritance patterns, mosaicism, variable penetrance, and expressivity for known disease genes. The analysis of trio ES is preferred over singleton ES as information on the allelic origin (paternal, maternal, "de novo") reduces the number of variants that require interpretation. All ES data and interpretation strategies should be exploited including CNV and mitochondrial DNA analysis. The constant advancements in available techniques and knowledge necessitate the close exchange of clinicians and molecular geneticists about genotypes and phenotypes, as well as knowledge of the challenges and pitfalls of ES to initiate proper further diagnostic steps. Functional analyses (transcriptomics, proteomics, and metabolomics) can be applied to characterize and validate the impact of identified variants, or to guide the genomic search for a diagnosis in unsolved cases. Future diagnostic techniques (genome sequencing [GS], optical genome mapping, long-read sequencing, and epigenetic profiling) will further enhance the diagnostic yield. We provide an overview of the challenges and limitations inherent to ES followed by an outline of solutions and a clinical checklist, focused on establishing a diagnosis to eventually achieve (personalized) treatment.


Keywords: diagnostic yield; exome sequencing; exome-negative; genome sequencing; inborn metabolic disease; treatment


Substances：
DNA, Mitochondrial

Year: 2022 PMID： 35506430 PMCID： PMC9539960 DOI： 10.1002/jimd.12507

Source DB: PubMed Journal: J Inherit Metab Dis ISSN： 0141-8955 Impact factor: 4.750

We provide an overview of the challenges and limitations inherent to exome sequencing, followed by an outline of solutions and a clinical checklist, focused on establishing a diagnosis to eventually achieve (personalized) treatment in exome‐negative patients.

INTRODUCTION

Although each rare genetic disease (RGD) affects a small number of individuals, collectively they affect about 1% of all live births and thus represent an important global disease burden. The total number of RGDs is still to be determined. Currently, OMIM (www.omim.org) lists 6281 disease‐gene associations, of which 1629 are inborn metabolic diseases (IMDs) (www.iembase.org) (accession date 08.01.2022). Over the past 10 years, achieving an accurate and timely molecular diagnosis in the clinical setting of RGD has seen tremendous improvement, with a prominent role for whole‐exome sequencing (ES; Figure 1). This has also influenced the definition of IMDs: the presence of an abnormal metabolite is no longer a prerequisite for a disease to be labeled as an IMD. Classification of a disorder as an IMD requires only that impairment of specific enzymes or biochemical pathways is intrinsic to the pathomechanism.

FIGURE 1

Median diagnostic yield of next‐generation sequencing for different groups of rare genetic neurological disease. The bars display data from multiple studies (“n” is the total number of individuals) showing the range in diagnostic yields in light gray and the mean with a red line.

In parallel, the development of personalized treatment options for RGD, and especially IMDs, has gained significant momentum. Both in vitro and in vivo studies increasingly identify treatment targets for a growing number of disorders, paving the way for (personalized) treatment opportunities either based on the pathomechanism (e.g., nutritional interventions) or aiming to directly target the genetic defect itself. Assuming that treatment early in the course improves outcome, a timely diagnosis is key. Given the absence of known biomarkers in many recently molecularly defined RGD/IMDs, genetic (newborn) screening is the only route to a diagnosis. Digital knowledgebases such as Treatabolome and Treatable ID (www.treatable-id.org) aim to shorten the path to therapy by providing (digital) access to treatment information for specific genes, variants, or (groups of) disorders. Treatment response has even been suggested as an addition to the American College of Medical Genetics (ACMG) criteria for variant interpretation. Here, we provide a diagnostic strategic outline for those patients with (suspected) RGD/IMDs in whom ES did not provide a diagnosis (ES‐negative), to optimize the diagnostic yield and thus the therapeutic potential. We will focus on clinical presentations such as neurodevelopmental disorders (NDDs)/intellectual disability, epilepsy, and movement disorders.

ES SUCCESS: CURRENT CLINICAL PRACTICE

Whereas data are generated for all protein‐coding sequences, interpretation is usually staged. A targeted interpretation strategy comes first, restricted either to (rare) variants in genes with an established association to the disease (virtual panel) or to all known gene‐disease associations listed in OMIM (clinical exome). Subsequently, an exome‐wide analysis allows the identification of variants in candidate disease genes without prior disease association. Accordingly, clinicians are no longer limited to sequencing data of only one gene at a time, but have access to data from the coding regions of virtually all genes, allowing a comprehensive diagnostic strategy and making ES the first‐tier diagnostic tool for RGD/IMDs.

Analysis of copy number variation and mitochondrial DNA

Although the annotation of point mutations (SNVs) and small insertions or deletions (indels) from ES data is standard, the annotation of copy number variation (CNV) is not. CNVs are a prevalent source of genetic variation that has been implicated in many genomic disorders, resulting in the widespread application of genomic microarrays as a first‐tier diagnostic tool for CNV detection. More recently, it has been shown that the majority (88%) of disease‐relevant CNVs (>3 exons) can be detected in short‐read ES data via read‐depth analysis. Moreover, in a cohort unselected for cytogenetic abnormalities, CNV analysis from ES outperformed chromosomal microarrays when applied as a second‐line test. One has to bear in mind that genomic microarrays, single nucleotide polymorphism (SNP) chips, multiplex ligation‐dependent probe amplification (MLPA), and other technologies are superior for detecting certain types of CNVs. The full potential of ES is often not exploited: “off‐target reads” deriving from the mtDNA (or, alternatively, bead‐based enrichment of mtDNA) allow both the nuclear and the mitochondrial genome to be analyzed from the same data. Additional analysis of the mtDNA in existing ES data has been shown to increase the diagnostic yield by nearly 2% in an IMD cohort. This approach showed a high concordance (96.2%) and excellent precision (99.5%) when compared to the gold standard of targeted mtDNA next‐generation sequencing and is thus of sufficient quality for clinical diagnostics. One should, however, be aware of the tissue specificity of mtDNA alterations, and thus consider additional (targeted) next‐generation sequencing, preferably in an affected tissue. Furthermore, mtDNA deletions are usually not picked up by ES using DNA from blood since the proportion of deleted mtDNA is often too low. Again, in specific cases ES cannot replace current techniques, and this should be kept in mind.
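To illustrate the principle of read‐depth CNV analysis mentioned above, the following is a minimal sketch rather than a validated caller. It assumes per‐exon read depths for one sample and for a matched reference (e.g., a pool of controls) are already available; the function names, the log2 cutoffs, and the minimum run length are hypothetical choices made for illustration only.

```python
import math
from statistics import median


def log2_ratios(sample_depth, reference_depth):
    """Per-exon log2 ratio of median-normalised sample depth to reference depth."""
    s_med = median(sample_depth)
    r_med = median(reference_depth)
    ratios = []
    for s, r in zip(sample_depth, reference_depth):
        # Dividing by the median corrects for differences in total read count.
        s_norm = s / s_med
        r_norm = r / r_med
        # The small pseudocount avoids log(0) for exons without any reads.
        ratios.append(math.log2((s_norm + 0.01) / (r_norm + 0.01)))
    return ratios


def call_cnv_runs(ratios, del_cutoff=-0.6, dup_cutoff=0.4, min_exons=3):
    """Return (first_exon, last_exon, type) for runs of consecutive outlier exons.

    The cutoffs and min_exons are illustrative assumptions, not published values.
    """
    calls = []
    run_start, run_type = None, None
    for i, value in enumerate(ratios + [0.0]):  # the sentinel closes a trailing run
        if value <= del_cutoff:
            state = "DEL"
        elif value >= dup_cutoff:
            state = "DUP"
        else:
            state = None
        if state != run_type:
            if run_type is not None and i - run_start >= min_exons:
                calls.append((run_start, i - 1, run_type))
            run_start, run_type = i, state
    return calls
```

A heterozygous deletion is expected to halve the normalised depth (log2 ratio near −1), which is why runs of consecutive low‐ratio exons are reported; single‐exon events are deliberately ignored here because they are hard to separate from noise in this simple model.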

CHALLENGES AND LIMITATIONS OF ES AND MITIGATION STRATEGIES

Data sharing and collaboration

The Matchmaker Exchange network connects the data from six matchmaking initiatives (GeneMatcher, DECIPHER, PhenomeCentral, seqr, MyGene2, IRUD). This creates a collective data set containing over 150 000 cases from more than 11 000 contributors for sharing of molecular and clinical data to establish gene‐disease associations for “newly” defined RGD. It is difficult to quantify the success that can be attributed to these initiatives; however, the number of matches (e.g., in 2019, 56% of gene entries in GeneMatcher had at least one match) and publications (e.g., in 2019, GeneMatcher was cited by 267 publications, of which 77% were novel disease gene discoveries) facilitated by these platforms shows their importance in gene discovery.

The need for continuous data (re)analysis and (reverse) phenotyping

Data interpretation is a continuously evolving discipline. With increased access to sequencing, which requires less clinical preselection, milder presentations of diseases are uncovered, giving rise to the realization that all diseases have a phenotypic spectrum rather than one “classic” phenotype. Variants should not be discarded just because single disease features are absent or unusual features are present. A detailed family history and collection of medical data from other (similarly) affected family members can be of added value. Disease manifestations may emerge or disappear over time and may not be present when an individual is investigated at a single time point during the course of the disease. The term “reverse phenotyping” (re‐evaluating the clinical findings of the individual in light of a potential pathogenic genotype) illustrates the ongoing need for exchange between the diagnostic laboratory and the referring physician. Reanalysis of ES data 1–3 years after the initial analysis may increase the diagnostic yield by 3%–15% and, to be most successful, should make use of both the updated “phenotypical” input and the updated “genotypic” knowledge. At initial analysis, there may have been insufficient evidence for candidate variants or gene causality (Box 1).

BOX 1

The American College of Medical Genetics (ACMG) standards and guidelines for the interpretation of sequence variants define five classes (benign, likely benign, variant of uncertain significance [VUS], likely pathogenic, and pathogenic) and list functional tests as an important factor in variant interpretation. In line with the increasing pathomechanism‐based treatment options, we suggested integrating “response to treatment” into the variant interpretation. An online tool for variant interpretation with individual adjustment can be found at https://wintervar.wglab.org/.

The ClinVar database (https://www.ncbi.nlm.nih.gov/clinvar/) annually curates >10 000 disease variants.

GeneMatcher (www.genematcher.org) is a matchmaking initiative where researchers interested in the same gene can get in touch. Numerous examples of successful matchmaking for identifying rare genetic disease can be found on the website, which lists >400 publications referencing this initiative.

GeneReviews (https://www.ncbi.nlm.nih.gov/books/NBK1116/) comprehensively provides clinically relevant and medically actionable information (diagnosis, management, and genetic counseling) in a standardized format focused on the clinician (>800 chapters available).

The Genome Aggregation Database (gnomAD, www.gnomad.org) contains more than 140 000 genomic data sets of “healthy/unaffected” individuals. This database allows better interpretation of variants, especially when in‐house databases are small.

The Leiden Open Variation Database (LOVD, www.lovd.nl) includes gene‐ and patient‐centered data containing >690 000 disease variants. This database allows better interpretation of variants, especially when in‐house databases are small.

Online Mendelian Inheritance in Man (OMIM, www.omim.org) is the comprehensive, authoritative compendium of human genes and genetic phenotypes.

Varsome (https://varsome.com/) is a variant knowledge community, data aggregator, and variant data discovery tool.

When reanalysis is not enough: when to consider a new ES?

The diagnostic yield is in part a result of the continuous development of bioinformatic tools to analyze and interpret ES data and of constant updates of variant databases. At the same time, it is important to be aware of the data quality and completeness (coverage of the entire coding region of [clinically relevant] genes, as well as sufficient read depth for the covered regions), especially of older ES data. For example, the Agilent V4 exome enrichment platform captured 89% of annotated genes on average, while its follow‐up version V5 captured 95%. As will be discussed later, genome sequencing (GS) data are generated enrichment‐free, which can overcome the limitation of ES in sufficiently covering coding exons, especially GC‐rich regions, as well as in characterizing structural variants. Depending on local resources, either a new ES on an up‐to‐date exome enrichment platform or, ideally, GS should be considered in an ES‐negative case after reanalysis.
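As a minimal sketch of how the completeness of older ES data might be checked before deciding on a new ES or GS, the example below summarises per‐base read depth for the coding bases of each gene. The function names, the input layout (a mapping from gene name to a list of per‐base depths), and the thresholds of 20× and 95% are assumptions chosen for illustration, not recommendations from this review.

```python
def coverage_summary(depths, min_depth=20):
    """Fraction of target bases covered at >= min_depth, plus the mean depth."""
    if not depths:
        raise ValueError("no target bases supplied")
    covered = sum(1 for d in depths if d >= min_depth)
    return {
        "fraction_covered": covered / len(depths),
        "mean_depth": sum(depths) / len(depths),
    }


def poorly_covered_genes(per_gene_depths, min_depth=20, min_fraction=0.95):
    """Names of genes whose coding bases fall below the completeness threshold."""
    flagged = []
    for gene, depths in sorted(per_gene_depths.items()):
        summary = coverage_summary(depths, min_depth)
        if summary["fraction_covered"] < min_fraction:
            flagged.append(gene)
    return flagged
```

Genes flagged in this way that also belong to the relevant virtual panel are candidates for targeted follow‐up, since a “negative” result for them reflects missing data rather than absence of a variant.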

DISCOVERING VARIANTS FROM ES DATA (BOX 2)

Multiple inheritance models

The three most common patterns of Mendelian inheritance are autosomal dominant (AD), autosomal recessive (AR), and X‐linked. The mode of inheritance for human diseases can, however, be more complex (Figure 2). An increasing number of genes are being identified that can be associated with both an AD and an AR disorder, and the co‐occurrence of different modes of inheritance is not restricted to a specific type of protein or clinical phenotype (Table S1). Hence, it is reasonable to expect that this phenomenon will be uncovered for more disease genes. This is another factor complicating variant interpretation, and it emphasizes the importance of reverse phenotyping and the need to continue the search for confirmative biomarkers and functional (biochemical) assays, especially for potentially treatable conditions.

FIGURE 2

Example pedigrees explaining different inheritance patterns. (A) Recessive inheritance models. Compound heterozygosity: the parents each carry a different variant in the same gene. The child receives the variant carrying allele from each parent and is affected. Consanguinity: the parents are related and carry the same heterozygous variant. XL‐recessive inheritance: females are carriers of the pathogenic trait while males are affected. (B) Gonadal mosaicism: variant occurs in gonadal cells of a healthy parent and the pathogenic variant is transmitted to the child. Parent‐post‐zygotic variant (PZM): the PZM occurs during the embryonic stage of the parent resulting in both gametes and somatic cells (soma) to carry the pathogenic variant, which is transmitted to the affected child. Child‐PZM: the variant occurs during embryogenesis of the child and results in multiple mosaic tissues. Somatic mosaicism: the pathogenic variant occurs post‐zygotically at a later stage during development affecting a single or limited number of tissues. (C) Complex inheritance models. Incomplete penetrance: all individuals in the family carry the pathogenic variant, but not all individuals manifest the disease phenotype. Variable expressivity: all individuals in the family carry the pathogenic variant, but the expression of the phenotype is variable. Multiple diseases recessive: both parents are heterozygous carriers of two pathogenic variants that are associated with two distinct disease phenotypes. Both pathogenic variants are transmitted to the child who presents the two distinct disorders. Multiple diseases dominant: both parents manifest a dominant disease and transmit the variant allele to the child who presents two distinct disorders. Pathogenic variant (m). Pathogenic variant on chromosome X (Xm). (D) Uniparental disomy. Complete isodisomy occurs when both copies of the chromosome originate from one parent and none from the other parent. 
Segmental isodisomy is when only a segment of the chromosome originates from one parent and the rest of the chromosome has two origins, one from mother and one from father. Heterodisomy refers to the situation in which both homologs from one parent are inherited by the child.

The availability of ES data from multiple family members, parents (trio‐ES), or (affected) siblings facilitates better interpretation of variants. For example, trio‐ES provides immediate data on the allelic origin of the variant and allows for the detection of de novo variants (variants that are not inherited from either parent). In addition, deep phenotyping data next to ES of multiple family members allow for accurate segregation analysis and aid in figuring out complex modes of inheritance. Standardization of phenotypic terms (e.g., Human Phenotype Ontology [HPO]) is important to utilize phenotyping to its best, for example, when comparing patients from different institutes. However, it should be kept in mind that when parents are only mildly affected and not recognized as affected individuals, inherited variants underlying AD disorders will be filtered out and are therefore lost in the analysis. Similarly, when a variant occurs as a mosaic in one of the parents, the analysis should be adapted accordingly (Box 2).

BOX 2

3.1 Multiple inheritance models: disorders of organelle biogenesis, dynamics, and interactions (OPA1 and OPA3); disorders of complex I subunits and assembly factors (DNAJC30); disorders of mtDNA replication and maintenance (TWNK); other disorders of mitochondrial function (AFG3L2, CLPB, and ATAD3A); disorders of organelle interplay (EMC1); disorders of mitochondrial and peroxisomal dynamics (GDAP1); disorders of the synaptic vesicle cycle (PRRT2 and KIF1A); disorders of non‐mitochondrial tRNA processing and aminoacyl‐tRNA synthetases (NSUN2); and disorders of carbohydrate transmembrane transport and absorption (GLUT1).

3.2 X‐linked IMDs: disorders of pyruvate metabolism (PDHA1); disorders of mitochondrial membrane biogenesis and remodeling (TAZ); disorders of peroxisomal fatty acid oxidation (ABCD1); disorders of sphingolipid degradation (GLA); and urea cycle disorders and inherited hyperammonemias (OTC).

3.3 Mosaicism: disorders of mitochondrial metabolite repair (IDH1, IDH2); disorders of sphingolipid degradation (GLA); urea cycle disorders and inherited hyperammonemias (OTC); disorders of carbohydrate transmembrane transport and absorption (GLUT1); and disorders of niacin and NAD metabolism (NAXD).

3.4 Variable penetrance and expressivity:
‐ general: copper‐transporting ATPase subunit beta deficiency (ATP7B), mtDNA‐related disorders (heteroplasmy, threshold).
‐ gender‐related penetrance: disorders of tetrahydrobiopterin metabolism (GCH1).
‐ age‐related penetrance: disorders of the synaptic vesicle cycle (LRRK2).
‐ ethnicity‐related penetrance: disorders of the synaptic vesicle cycle (LRRK2).

3.5 Multiple diseases in one individual or one family: disorders of phenylalanine and tyrosine metabolism (PAH) and disorders of glycogen metabolism (SLC37A4) (personal experience); disorders of glycolysis (PCK1); and disorders of glutamate (GRIN2B).

4.1 Intronic, regulatory and splice‐site variants:
‐ exon skipping: disorders of mitochondrial fatty acid oxidation (ACADM), disorders of mitochondrial nucleotide pool maintenance (MPV17), gamma‐aminobutyric acid neurotransmitter disorders (SSADH), disorders of sphingolipid degradation (GLA), and disorders of mitochondrial membrane biogenesis and remodeling (SERAC1).
‐ poison exon: disorders of complex I subunits and assembly factors (TIMMDC1).

4.2 Complex DNA rearrangements:
‐ short tandem repeat expansion: disorders of glutamate/glutamine and aspartate/asparagine metabolism (GLS).

6. Epigenetics:
‐ compound epigenetic‐genetic heterozygosity: disorders of cobalamin metabolism (MMACHC/PRDX1, “epi‐CLBc”).
‐ uniparental disomy (UPD): disorders of glycerolipid metabolism (ABHD5)*, disorders of amino acid transport (SLC7A7)*.
(*In both cases, UPD led to homozygosity of a variant found in heterozygous state in the parents, not to an epigenetic effect.)
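As a minimal sketch of how trio‐ES data yield the allelic origin of a variant (paternal, maternal, or de novo) and support a compound‐heterozygous model, the example below classifies a variant from the number of alternative alleles (0, 1, or 2) observed in child, father, and mother. The function names and return labels are hypothetical; the label “putative de novo” is used deliberately, because low‐level parental mosaicism or insufficient parental coverage can make an inherited variant appear de novo.

```python
def classify_origin(child, father, mother):
    """Classify a child's variant from alt-allele counts (0, 1 or 2) in a trio."""
    for name, gt in (("child", child), ("father", father), ("mother", mother)):
        if gt not in (0, 1, 2):
            raise ValueError(f"{name} genotype must be 0, 1 or 2")
    if child == 0:
        return "absent in child"
    if father == 0 and mother == 0:
        return "putative de novo"
    if child == 2:
        if father > 0 and mother > 0:
            return "homozygous, biparental"
        # Only one carrier parent: compatible with UPD or a deletion in trans.
        return "homozygous, one carrier parent"
    if father > 0 and mother > 0:
        return "inherited, origin ambiguous"
    return "paternal" if father > 0 else "maternal"


def is_compound_het(variants):
    """True if one gene carries both a paternal and a maternal heterozygous variant.

    `variants` is a list of (child, father, mother) alt-allele counts for
    rare variants located in the same gene.
    """
    origins = {classify_origin(*v) for v in variants if v[0] == 1}
    return "paternal" in origins and "maternal" in origins
```

Note that a strict de novo filter of this kind is exactly what discards variants inherited from a mildly affected or mosaic parent, so in practice the filter would need to be relaxed when the family history suggests such a scenario.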

X‐linked disorders

X‐inactivation is a well‐established, physiological dosage compensation mechanism ensuring that X‐chromosomal genes are expressed at comparable levels in males and females. The phenotypic range in XL‐disorders is broad. Certain XL‐disorders are incompatible with life in males, so that only affected females are seen, while in others males are more severely affected than females. In addition, the onset of disease in females can be (much) later in life, and hence mothers of severely affected males can be without medical concerns at the time of ES. In some cases, skewed X‐inactivation occurs, meaning that inactivation of one X chromosome is favored over the other (here: the X chromosome with the mutation), which can explain these observations.

Mosaicism

De novo pathogenic variants (DNMs) can occur during gametogenesis as well as in the post‐zygotic phase, the latter resulting in mosaicism (Figure 2B). Post‐zygotic variants (PZMs) leading to disease often occur during early embryogenesis, in the first 15 mitotic divisions. Mosaic variants may be inherited from a parent (gonadal or somatic mosaic) or can arise in the child, which influences the recurrence risk and is therefore important to know for adequate counseling. In an epilepsy disorder cohort, 3.5% of the molecular diagnoses could be attributed to pathogenic mosaic variants, and a study of 4293 patients with NDD showed that mosaicism was present in 40 cases, of which in 16 the variant was of parental origin and in 24 it had occurred de novo in the child. Upon detection or suspicion of a mosaic variant in the affected individual, it is important to perform sensitive testing in the parents to determine their genetic profile, as this will affect counseling. The recurrence risk that parents have in a new pregnancy can roughly be divided into three groups: a moderate to high risk of up to 50%, depending on the level of parental mosaicism, in case of parent‐PZM (~1%–2% of DNM); a low risk of 1% when gonadal mosaicism is at play (~95% of DNM); and a negligible risk when the DNM occurs in the child (~3% of DNM). Post‐zygotic de novo events in the child (child‐PZM) that result in mosaicism may contribute to the variable expressivity of single‐gene disorders through a gene‐dose effect or by acting in combination with other (epi)genetic factors. Somatic pathogenic variants, already known to be involved in cancer development, may also play a role in neurodevelopmental diseases, especially in patients with brain malformations. Pathogenic somatic DNMs affecting the brain may remain undetected in ES when the gene is not expressed in blood and may require sequencing of other tissues, for example, fibroblasts or buccal swabs. Furthermore, the detection of mosaicism in general requires deep sequencing (preferably >100× coverage), which is not always achieved by ES. In addition, the analysis strategy must be adapted to enable the detection of low‐level variants.
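Why deep sequencing matters for mosaic variants can be made concrete with a simple sampling model. The sketch below assumes that each read independently carries the variant with a probability equal to the variant allele fraction (a binomial model that ignores sequencing error and mapping bias) and that a caller requires a minimum number of supporting reads; the function name and the default of three supporting reads are hypothetical.

```python
from math import comb


def detection_probability(depth, vaf, min_alt_reads=3):
    """P(at least min_alt_reads variant reads) under a binomial sampling model."""
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must lie between 0 and 1")
    # Probability of observing fewer supporting reads than the caller requires.
    p_fewer = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# Compare a typical older exome depth with the >100x suggested for mosaicism.
for depth in (30, 100, 300):
    print(depth, round(detection_probability(depth, vaf=0.05), 3))
```

Under these assumptions, a variant present in 5% of reads is far more likely to reach the supporting‐read threshold at 100× than at 30×, which is in line with the recommendation for deep sequencing and adapted analysis settings.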

Variable penetrance and expressivity

“Penetrance” of a monogenic disorder describes the conditional probability that an individual carrying a pathogenic variant manifests the disease phenotype (Figure 2C). If this probability does not equal 100% within a specific age range, the disorder displays incomplete penetrance. Hence, some AD disorders occasionally appear to skip a generation, meaning that individuals carrying a pathogenic variant do not express the disease phenotype (asymptomatic carriers) but can transmit the mutant allele to their offspring (reviewed in Mangrinelli et al.). It should be borne in mind that these variants can be missed in trio‐ES analysis. AR, XL, and mtDNA‐linked disorders can also exhibit incomplete penetrance. Among others, age, gender, and ethnicity are drivers of incomplete penetrance in certain diseases. Although it goes beyond the scope of this article, we would like to point out that for mtDNA‐related disorders, heteroplasmy, threshold effects, and the chosen tissue have to be taken into account as well. “Expressivity” is the extent to which a given genotype is expressed at the phenotypic level (Figure 2C). The same genetic variant can show quantitatively different effects among distinct individuals, even among members of the same family (intrafamilial variability). Variable expressivity is recognized in monogenic disorders with all patterns of inheritance and represents a major contributor to phenotypic heterogeneity. Finally, genetic modifiers may play a role not only in the severity of the disease, but also in whether a carrier will manifest the disease. These modifiers usually segregate independently from the disease‐causing variant and are currently being investigated for several inherited diseases to better understand the disease pathophysiology (and will need to be included in the diagnostics of specific genes in the future).

Multiple diseases in one individual or one family

It may not always be clear whether there is one or multiple (hereditary) diseases running in one family or, even more complicated, in one individual (Figure 2C). The occurrence of multiple genetic diagnoses underlying a rare disease phenotype in an individual is estimated at about 5% and is most frequent in consanguineous families. To maximize the chance of identifying multiple genetic diagnoses and their underlying molecular defects, ES should be extended to more family members beyond trio‐ES. One should not forget that family members may be hesitant to participate, fearing an unexpected genetic diagnosis, especially in the absence of medical complaints. A diagnosis of multiple disorders obviously has implications for genetic counseling and patient management. It is also essential for our insights into rare diseases: blended phenotypes caused by multiple disorders should not be mistaken for an extended phenotype caused by a single disorder. In addition, patients may develop more common disorders during life, such as type II diabetes or cancer, that are not directly related to their genetic disorders.

Digenic disease/synergistic heterozygosity, polygenic risk scores

Different inheritance models, such as digenic disease and synergistic heterozygosity, are being explored in which variants in multiple genes, rather than in a single gene, are causative for a genetic disorder in an individual. Furthermore, the genetic burden (number of genetic variants in disease‐associated genes) that individuals carry can be expressed in a so‐called polygenic risk score. The exact role that these models play in the origin of RGDs/IMDs is still under investigation, and they are not yet ready for diagnostic care.

TECHNICAL LIMITATIONS OF ES

Intronic, regulatory, and splice site variants

Roughly 1%–2% of our DNA is protein‐coding and referred to as the exome. The majority of disease‐causing variants that we currently know are located in this “small” coding section of our DNA. The remainder is called non‐coding DNA, which consists of intergenic (between genes) and intronic (between the exons of a gene) DNA. The exact function of most of the non‐coding DNA remains to be elucidated; however, for certain evolutionarily conserved regions, that is, promoter sites, (non)canonical splice sites, untranslated regions (UTRs), and long non‐coding RNAs (lncRNA), the importance in causing disease has been shown. Variants in the splice site regions (−3 bases and +8 bases of each exon) are known to often cause aberrant transcripts. Moreover, in recent years the essential role of UTRs and (deep‐)intronic sequences in gene regulation, and thereby disease association, has become evident. Deep‐intronic variants, as well as variants in the UTR, can influence gene regulation by affecting the binding site for regulatory proteins or by altering the gene transcript. Such variants can, for example, introduce a new start codon in the 5′ UTR or cause the inclusion of a pseudoexon that can in turn lead to an out‐of‐frame transcript that may be dysfunctional (Figure S1). A recent publication showed the significant contribution of pathogenic variants in the 5′ UTR region of MEF2C, emphasizing the importance of screening the 5′ UTR of disease‐relevant genes. Another element that should be considered during genome analysis is variants surrounding the naturally occurring poison exons, which are exons that contain a termination codon and, when included, cause degradation of the transcript via nonsense‐mediated decay. Several publications have now shown the impact of poison exons in NDD. Furthermore, intergenic DNA does not code for protein but contains sequences important for the transcriptional and translational regulation of protein‐coding sequences (e.g., enhancers) and can encode RNA genes. These RNA genes in turn play a role in transcriptional and translational processes (Figure S1). More and more examples arise showing the pathogenic effect of variants in non‐protein‐coding DNA on human biology. During embryogenesis, precise gene transcription in space and time requires that distal enhancers and promoters communicate by physical proximity within gene regulatory landscapes. To achieve this, regulatory landscapes fold in the nuclear space, creating complex 3D structures that influence enhancer‐promoter communication and gene expression and that, when disrupted, can cause disease. RNA‐sequencing aids in estimating the effect of a variant on transcription, but each non‐coding DNA variant requires adequate functional testing before a final molecular diagnosis can be made.

Complex DNA rearrangements

Disease‐causing variants are not limited to SNVs or CNVs, but include structural variants (SVs) that can result in complex rearrangements of our DNA. SVs are large genomic alterations that include DNA duplications, repeats, deletions, insertions, inversions, and translocations. CNVs are a subtype of SVs that are not addressed in this paragraph. Standard ES analysis is based on short sequence reads that are aligned to the best matching position in the reference genome. In some cases, for example, a translocation or a transposable element, the mapped position does not reflect the true genomic position in the patient. Mobile or transposable elements are DNA sequences that move around or change their number of copies in the genome. Mobile elements can be inserted within a gene, causing a frameshift, or nearby, affecting splicing or gene regulation. A switch from short‐read sequencing (SRS, 150–300 bp reads) to long‐read sequencing (LRS, >10 kb reads) allows for better sequence mapping and therefore better SV detection, including complex rearrangements and repetitive sequences. A recent publication comparing SRS to LRS showed that 80% of LRS SV calls were not identified by SRS. Short‐tandem repeats (STRs) are another type of SV that is missed in standard ES analysis. STR variants and mobile elements can be detected in ES data when specific bioinformatic tools are used, but these tools are not yet standard in diagnostic settings. STR expansions are often associated with neurologic disease, for example, Fragile X syndrome, Huntington's disease, and several hereditary ataxias, and will likely be the underlying genetic pathomechanism in subgroups of unsolved patients (e.g., glutaminase deficiency).

Another suitable new technique is optical genome mapping, which is based on high‐resolution genome imaging instead of sequencing. In short, long linear single DNA molecules are labeled at specific sites, creating a whole‐genome image in which patient patterns can be compared to the expected labeling pattern of the reference genome. This technique is especially suitable for large and complex rearrangements (>500 bp), shows high sensitivity for somatic SVs, and can also detect STR expansions. It is complementary to sequencing and is better regarded as a potential replacement for all cytogenetic technologies, such as karyotyping, FISH, and CNV‐microarrays.

Altogether, the detection of SVs can be improved by moving from ES to GS, since inclusion of the non‐coding parts will improve sequence read mapping. However, it is important to note that GS files are considerably larger and contain more variants, which makes interpretation of the data more challenging. Currently, the analysis pipeline for GS data is ready for clinical interpretation when it comes to the coding region of our DNA, but it will take time, research, and experience to be able to clinically interpret the non‐coding region.
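
The read‐length argument for STR detection can be made concrete with a minimal sketch. It assumes a read is a plain string and that a repeat can only be sized when unique flanking sequence is present on both sides; the function name `call_repeat`, the `min_flank` threshold, and the toy sequences are hypothetical and are not taken from any diagnostic pipeline.

```python
import re
from typing import NamedTuple


class RepeatCall(NamedTuple):
    copies: int     # consecutive motif copies observed in the read
    spanning: bool  # True if flanking sequence is present on both sides


def call_repeat(read: str, motif: str, min_flank: int = 5) -> RepeatCall:
    """Find the longest uninterrupted run of `motif` in a single read.

    If the run is not flanked on both sides, the read does not span the
    repeat and `copies` is only a lower bound on the true repeat length.
    """
    read = read.upper()
    motif = motif.upper()
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    best = max(pattern.finditer(read),
               key=lambda m: m.end() - m.start(),
               default=None)
    if best is None:
        return RepeatCall(0, False)
    copies = (best.end() - best.start()) // len(motif)
    left_flank = best.start()
    right_flank = len(read) - best.end()
    spanning = left_flank >= min_flank and right_flank >= min_flank
    return RepeatCall(copies, spanning)


# A toy short read that lies entirely inside an expanded CAG repeat.
short_read = "CAG" * 20
# A toy long read that covers the same repeat plus both flanks.
long_read = "ACGTTGCA" + "CAG" * 60 + "TTGACCGT"

short_call = call_repeat(short_read, "CAG")
long_call = call_repeat(long_read, "CAG")
```

In this toy example the short read reports 20 copies without flanks, so the true size remains unknown, whereas the long read spans all 60 copies. Dedicated STR callers use considerably more evidence (paired reads, mapping positions, interrupted repeats) than this sketch does.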

THE ADDED VALUE OF FUNCTIONAL DATA

Transcriptomics (RNA‐seq) and proteomics

For the accurate interpretation of variants, DNA sequence data alone may not always suffice. In some cases, transcriptomics is performed before or simultaneously with ES to highlight those genes that show an aberrant transcription profile. It is important to note that the choice of tissue used for transcriptomic studies could impact the results, as not all genes are equally relevant and expressed in every tissue. RNA analysis may also be used to study up‐ or downregulated genes or pathways, just as proteomics can provide insights into protein interaction networks. The effect of missense variants is not always straightforward, as they may lead to aberrant splicing, incorrect folding of the protein, or abnormal expression levels. The latter could be a result of allelic expression imbalance (AEI), where the mutant allele is expressed at a higher level than the wildtype allele, causing a seemingly dominant presentation (heterozygous variant) of a recessive disorder. ES (or preferably GS) to identify intronic and regulatory variants, in combination with quantitative RNA studies, is required to detect AEI. Again, other techniques can aid in confirming pathogenicity. It has been shown that additional RNA‐seq can enhance the diagnostic yield by 10%–16% compared to ES alone and can be integrated at reasonable costs and turnaround times. The diagnostic value of integrating proteomic data with genomic, transcriptomic, and phenotypic data has been shown in a study of 145 individuals suspected of having a mitochondrial disease, which achieved genetic resolution of 21% in ES‐negative cases. Although RNA‐seq still focuses on the genetic level, additional functional analyses can help in variant interpretation, especially for IMD. As for genetic analyses, these ‐omics techniques rely on finding “outliers” and need comparison with a control cohort of sufficient size.
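
The outlier principle can be illustrated with a minimal sketch that compares one patient to a control cohort by z‐score. The function name `expression_outliers`, the cutoff of 3, the minimum cohort size, and the gene labels are assumptions chosen for illustration; published RNA‐seq outlier methods model count data and confounders far more carefully.

```python
from statistics import mean, stdev


def expression_outliers(patient, controls, z_cutoff=3.0, min_controls=3):
    """Return genes whose patient expression deviates from the controls.

    patient:  dict mapping gene -> normalized expression value
    controls: dict mapping gene -> list of control expression values
    Genes with too few controls or no control variance are skipped,
    because no meaningful z-score can be computed for them.
    """
    outliers = {}
    for gene, value in patient.items():
        reference = controls.get(gene, [])
        if len(reference) < min_controls:
            continue
        sd = stdev(reference)
        if sd == 0:
            continue
        z = (value - mean(reference)) / sd
        if abs(z) >= z_cutoff:
            outliers[gene] = round(z, 2)
    return outliers


# Hypothetical normalized expression values for two genes.
controls = {
    "GENE_A": [10.0, 11.0, 9.0, 10.5, 9.5],
    "GENE_B": [5.0, 5.2, 4.8, 5.1, 4.9],
}
patient = {"GENE_A": 2.0, "GENE_B": 5.0}

result = expression_outliers(patient, controls)
```

Here only the hypothetical GENE_A is flagged, as strongly underexpressed. The sketch also shows why cohort size matters: with few controls the standard deviation is poorly estimated and the z‐score becomes unreliable.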

Metabolomics

Targeted metabolic screenings, covering a specific class of metabolites such as amino acids, organic acids, or acylcarnitines, have been used for decades in diagnostics for IMD, providing a variant‐transcending read‐out of the biochemical function of a metabolic pathway. Currently, untargeted metabolomics is replacing targeted metabolic screening, moving from multiple targeted analyses performed in parallel to a “one‐size‐fits‐most” ‐omics approach (Next Generation Metabolic Screening [NGMS]; validated for use in clinical diagnostics under ISO 15189 accreditation). Furthermore, NGMS offers the possibility to find novel biomarkers in (neuro) IMDs that in some cases have been found to play an essential role in the pathophysiology of the disease. Untargeted metabolomics holds the potential to serve as a functional counterpart to ES data, as it provides unbiased functional data that may not otherwise be available from targeted biochemical assays. Besides providing additional functional evidence to determine the pathogenicity of variants of unknown significance (VUS) in known IMD genes, untargeted metabolomics may detect aberrant metabolite profiles that highlight certain pathways and may lead to the discovery of novel disease‐gene associations (Figure 3). This approach specifically helps to elucidate the pathogenicity of variants in genes that encode enzymes or transporters. A recent study with 170 individuals presenting predominantly with neurological symptoms showed that metabolomics data contributed to the variant interpretation in 73 different IMD genes in 43% of investigated cases. Importantly, untargeted metabolomics can increase our mechanistic understanding of the metabolic pathways involved, as well as identify novel biomarkers and treatment targets (Box 3). Until now, untargeted metabolomics has mostly been applied to body fluid analysis and as such only evaluates metabolites that are able to cross the outer cell membrane. The technique is likely to gain momentum once studies start targeting the intracellular metabolome of various cell types.
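
The integration of metabolite profiles with variant data can be sketched as a simple ranking: a gene carrying a VUS gains support when the substrates of its enzymatic step are elevated and its products are decreased. The function `prioritize_vus`, the data layout, and all gene and metabolite labels are hypothetical and serve only to illustrate the reasoning.

```python
def prioritize_vus(pathways, metabolite_changes, vus_genes):
    """Rank VUS-carrying genes by how well the metabolite profile fits.

    pathways:           dict gene -> {"substrates": [...], "products": [...]}
    metabolite_changes: dict metabolite -> "up" or "down"
    vus_genes:          iterable of genes in which a VUS was found
    Each elevated substrate and each decreased product adds one point.
    """
    ranked = []
    for gene in vus_genes:
        step = pathways.get(gene)
        if step is None:
            continue  # no pathway knowledge available for this gene
        support = sum(metabolite_changes.get(m) == "up"
                      for m in step["substrates"])
        support += sum(metabolite_changes.get(m) == "down"
                       for m in step["products"])
        if support:
            ranked.append((gene, support))
    # Highest support first; ties broken alphabetically for stable output.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Hypothetical pathway annotation and patient profile.
pathways = {
    "GENE_A": {"substrates": ["substrate_1"],
               "products": ["product_1", "product_2"]},
    "GENE_B": {"substrates": ["substrate_2"],
               "products": ["product_3"]},
}
profile = {"substrate_1": "up", "product_1": "down", "product_2": "down"}

ranking = prioritize_vus(pathways, profile, {"GENE_A", "GENE_B"})
```

In this toy case only the hypothetical GENE_A is supported by the profile. A real analysis would weigh effect sizes, reference ranges, and alternative explanations rather than count matching metabolites.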

FIGURE 3

Added value of genomics and metabolomics data integration. Exome or genome sequencing (genomics) provides data on variants found in the DNA of a patient, whereas targeted or untargeted metabolomics provides data on aberrant metabolites. The integration of both data sets may aid in the interpretation of variants. For example, a metabolic profile showing an elevated level of substrate 1 and a decrease of products 1 and 2 implies a defect in pathway 1, which in turn points to gene A, in which a VUS was found. VUS, variant of unknown significance.

Transcriptomics: disorders of complex I subunits and assembly factors (TIMMDC1).
Metabolomics: disorders of sialic acid metabolism (NANS), disorders of pyridoxine metabolism (ALDH7A1; refs. 135, 136).
Glycomics: disorders of N‐linked protein glycosylation (GFUS).
Lipidomics: disorders of mitochondrial membrane biogenesis and remodeling (SERAC1); disorders of phosphatidylcholine, phosphatidylserine, and phosphatidylethanolamine metabolism (PCYT2); disorders of fatty acyl synthesis, elongation, and recycling (ALDH3A2); and disorders of peroxisomal fatty acid oxidation (reviewed in ref. ).
Complexomics: other disorders of mitochondrial function (CLPB).

Other ‐omics

Although proteomics and metabolomics are quite universal approaches, the utility and yield of other ‐omics techniques (e.g., lipidomics and glycomics) require further evaluation. Fluxomics, which uses stable isotope labeling to study the flux of metabolites through a metabolic pathway, is a relatively new technique. It has the capacity to analyze entire metabolic pathways in vitro and in vivo in a dynamic fashion, which is especially useful in IMDs to delineate the reprogrammed metabolism due to a specific enzymatic or other biochemical defect. In summary, (targeted) metabolomics, RNA‐seq, and proteomics are the most established ‐omics technologies alongside ES/GS, whereas “other ‐omics” have mainly been used to validate or reject single candidate variants or to study novel gene‐disease phenotypes (Box 3). Evidence for their utility in terms of pathophysiologic delineation, prognostication, and therapeutic effect monitoring is clearly emerging.

EPIGENETICS

Besides changes in the DNA sequence itself, epigenetic changes can cause disease. Modifications of DNA or histones regulate gene expression by remodeling the chromatin structure, making DNA accessible or inaccessible for the transcriptional machinery. One of these so‐called epigenetic modifications is methylation of cytosines at CpG dinucleotides in the DNA molecule. Methylation of the promoter region of a gene is generally associated with gene silencing. Epigenetic changes are thought to be caused by stochastic errors in the establishment or maintenance of the epigenome or induced by underlying variations in DNA sequence. Changes in methylation status of the DNA are not detectable by ES.

Targeted or genome‐wide analysis of promoter methylation

The best known example of promoter hypermethylation associated with gene silencing is Fragile X syndrome. A repeat expansion in the promoter region of FMR1 causes hypermethylation and loss of expression of that allele. Only a small number of other examples are known today in which methylation analysis revealed the disease‐causing mechanism, while the underlying genetic defects (repeat expansions, variants in non‐coding regions, or in neighboring genes) are usually not detected by ES. Methylation analysis, either targeted or genome‐wide, will detect these epigenetic defects regardless of the underlying genetic defect. A study of 489 patients with NDD and congenital anomalies without an identified cause showed an increase in de novo epigenetic aberrations in patients compared to controls. Causality needs to be established, but this indicates that epigenetic variations may play a substantial role in the etiology of NDD. Recently, “epi‐cblC” has been described as a first example of an IMD with compound epigenetic‐genetic heterozygosity. Affected individuals are compound heterozygous for a genetic variant and an epimutation at the MMACHC locus, which is secondary to a splicing variant at the adjacent PRDX1 gene. Both these variants cause aberrant antisense transcription and cis‐hypermethylation of the MMACHC gene promoter with subsequent silencing.

Methylation status of imprinted genes

Genomic imprinting describes the phenomenon that the expression of a gene is dependent on the parental origin of the gene. In humans, 100 genes are already known to be subject to imprinting and are expressed only from the paternal or the maternal allele. Imprinted genes tend to be organized in clusters under the control of imprinting centers. These imprinting centers are differentially methylated, that is, methylation is present on only one of the two alleles. The most well‐known clusters are those on 11p15, involved in Beckwith–Wiedemann/Silver–Russell syndrome, and on 15q11–q13, involved in Prader–Willi/Angelman syndrome. Disturbances of imprinting result in overexpression or loss of expression of imprinted genes. This can be due to loss of one allele, uniparental disomy (UPD), or imprinting defects (often primary epigenetic defects). UPD occurs when both copies of a chromosome originate from one parent (Figure 2D). This can involve the entire chromosome (complete isodisomy or heterodisomy) or only a small segment (segmental disomy). As a consequence, UPD may result in aberrant dosage of genes regulated by genomic imprinting or in homozygosity of a recessive mutation. Targeted methylation analysis of imprinted regions to detect imprinting disorders is usually done by methylation‐specific multiplex ligation‐dependent probe amplification (MLPA) assays, but imprinting disorders can also be detected with genome‐wide methylation analysis.
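
Because UPD changes the parental origin of alleles, trio genotype data can hint at it. The following minimal sketch assumes genotypes are coded as the number of alternate alleles (0, 1, or 2) and uses only sites where the parents are homozygous for different alleles, at which a child with biparental inheritance must be heterozygous. The function name `upd_evidence` and the input layout are hypothetical; a heterozygous deletion or a genotyping error produces the same pattern, so any such signal requires confirmation.

```python
from collections import defaultdict


def upd_evidence(sites):
    """Tally per-chromosome evidence for uniparental inheritance.

    sites: iterable of (chromosome, child, father, mother) tuples, with
           genotypes coded as alternate-allele counts (0, 1, or 2).
    Only sites with opposite homozygous parents (0 and 2) are informative:
    a child matching one parent's homozygous genotype carries no allele
    from the other parent at that site.
    """
    counts = defaultdict(lambda: {"informative": 0,
                                  "paternal_only": 0,
                                  "maternal_only": 0})
    for chrom, child, father, mother in sites:
        if {father, mother} != {0, 2}:
            continue
        tally = counts[chrom]
        tally["informative"] += 1
        if child == father:
            tally["paternal_only"] += 1
        elif child == mother:
            tally["maternal_only"] += 1
    return dict(counts)


# Hypothetical trio genotypes: (chromosome, child, father, mother).
sites = [
    ("chr15", 2, 0, 2),
    ("chr15", 0, 2, 0),
    ("chr15", 2, 0, 2),
    ("chr1", 1, 0, 2),
    ("chr1", 1, 2, 0),
    ("chr1", 1, 1, 2),  # father heterozygous: not informative
]

evidence = upd_evidence(sites)
```

In this toy data set all informative sites on chr15 show maternal alleles only, which would be compatible with maternal UPD of that chromosome, while chr1 shows the expected biparental pattern.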

Methylation signatures

In addition to methylation defects at specific loci, variants in genes encoding proteins involved in epigenetic regulation may cause a broader, genome‐wide aberrant methylation profile. Patients with variants in such genes exhibit DNA methylation “episignatures” that are detectable in peripheral blood and have been shown to be highly sensitive and specific for the associated disorders. Currently, a characteristic methylation signature has been recognized for over 40 syndromes, in association with more than 60 genes. These signatures can be used as biomarkers for ES‐negative patients or for variant interpretation. Epigenetic profiling is available for diagnostic care. From a cohort of 207 affected individuals, ~28% were positive for an episignature that supported the diagnosis of the associated syndrome. The number of identified disorder‐ or gene‐specific epigenetic signatures is growing rapidly, increasing the diagnostic power and yield of this type of test. In addition to genetic variations, the environment may impact the epigenome. Prenatal exposure to, for instance, maternal smoking, alcohol consumption, or famine affects the health of the child and is associated with changes in the genome‐wide methylation profile. Fetal alcohol spectrum disorder in particular, a non‐inherited mimicker of genetic NDD, seems to be associated with specific epigenetic profiles, which is promising for future diagnostic use to discriminate between the two disorders.

The genomic checklist

“The modern world has given us stupendous know‐how. Yet avoidable failures continue to plague us in health care […]—in almost every realm of organized activity. And the reason is simple: the volume and complexity of knowledge today has exceeded our ability as individuals to properly deliver it to people—consistently, correctly, safely. We train longer, specialize more, use ever‐advancing technologies, and still we fail.” Atul Gawande makes this compelling argument that we can do better, using the simplest of methods: the checklist. For surgical safety and on intensive care units, World Health Organization checklists have been adopted worldwide as a standard for care, decreasing errors and improving success. Analogous to these, we propose the genomic checklist (Figure 4), with the goal of standardizing analysis and interpretation methods, to enhance diagnostic success as well as minimize delay and costs.

FIGURE 4

Stepwise genomic checklist to solve exome‐negative neurological cases. This checklist can be used to standardize analysis and interpretation methods to improve diagnostic care.

CONCLUSIONS

We have shown that a “negative” ES should not be considered the end of the road in the quest for a diagnosis in patients suspected of having an RGD. On the contrary, there is still more to explore.

AUTHOR CONTRIBUTIONS

Saskia B. Wortmann, Machteld M. Oud, Lisenka E. L. M. Vissers, and Clara D. M. van Karnebeek conceptualized the study; Saskia B. Wortmann and Machteld M. Oud drafted the manuscript; all other authors revised it critically for important intellectual content.

CONFLICTS OF INTEREST

The authors declare no conflicts of interest.

ETHICS STATEMENT

Not applicable.

Appendix S1: Supporting information.
