Making new genetic diagnoses with old data: iterative reanalysis and reporting from genome-wide data in 1,133 families with developmental disorders.

Caroline F Wright¹,², Jeremy F McRae³, Stephen Clayton³, Giuseppe Gallone³, Stuart Aitken⁴, Tomas W FitzGerald³, Philip Jones³, Elena Prigmore³, Diana Rajan³, Jenny Lord³, Alejandro Sifrim³, Rosemary Kelsell³, Michael J Parker⁵, Jeffrey C Barrett³, Matthew E Hurles³, David R FitzPatrick⁵, Helen V Firth³,⁶.

Abstract

PURPOSE: Given the rapid pace of discovery in rare disease genomics, it is likely that improvements in diagnostic yield can be made by systematically reanalyzing previously generated genomic sequence data in light of new knowledge.
METHODS: We tested this hypothesis in the United Kingdom-wide Deciphering Developmental Disorders study, where in 2014 we reported a diagnostic yield of 27% through whole-exome sequencing of 1,133 children with severe developmental disorders and their parents. We reanalyzed existing data using improved variant calling methodologies, novel variant detection algorithms, updated variant annotation, evidence-based filtering strategies, and newly discovered disease-associated genes.
RESULTS: We are now able to diagnose an additional 182 individuals, taking our overall diagnostic yield to 454/1,133 (40%), and another 43 (4%) have a finding of uncertain clinical significance. The majority of these new diagnoses are due to novel developmental disorder-associated genes discovered since our original publication.
CONCLUSION: This study highlights the importance of coupling large-scale research with clinical practice, and of discussing the possibility of iterative reanalysis and recontact with patients and health professionals at an early stage. We estimate that implementing parent-offspring whole-exome sequencing as a first-line diagnostic test for developmental disorders would diagnose >50% of patients.

Keywords: diagnostic yield; exome sequencing; reanalysis; reclassification; recontact

Year: 2018 PMID: 29323667 PMCID: PMC5912505 DOI: 10.1038/gim.2017.246

Source DB: PubMed Journal: Genet Med ISSN: 1098-3600 Impact factor: 8.822

Introduction

The relative affordability and accessibility of next generation sequencing (NGS) has facilitated the development of family-based genomic analysis, resulting in an explosion of gene discovery and diagnosis for rare diseases.1–3 Diagnosis rates – here defined as the confident causal association of a genotype with the presenting phenotype – vary from 20-60% depending upon numerous factors, including specificity of the clinical presentation, genetic heterogeneity of the disease, patient recruitment criteria, sequencing technology and analytical workflow, evidence of de novo occurrence of causal variants, and date of publication.4–6 The latter in part reflects the accelerated rate of analytical tool development and gene discovery catalysed by NGS.7 Given the pace of change throughout the field, some diagnostic variants must be presumed to be unrecognised during the initial analysis of genomic data, and without intervention, may remain undiscovered. Systematic, retrospective reanalysis of genomic data is therefore likely to improve diagnostic yield.8 However, the logistical challenges of performing regular reanalyses, coupled with re-interpretation of the results and re-contacting of clinicians and patients, are substantial.9 To date, although several small-scale examples of this approach exist,10,11 no large-scale diagnostic reanalyses have been published, so the potential benefits of this methodology when applied systematically across an entire cohort are currently unquantified. Due to the extremely large number of variants in every genome, evidence-based filters are applied to prioritize potentially relevant variants for individual clinical cases. A balance must be struck between sensitivity and specificity in order to find potential diagnoses without being overwhelmed by false positive results. 
As a result, there are numerous reasons why diagnostic variants might not be recognised during the analysis of genomic data: technical failure to detect a variant in the data, incorrect annotation, limited knowledge of the causative loci, or inappropriate exclusion of a variant (Table 1).10 It is therefore incumbent upon researchers involved in large-scale translational research studies to consider re-evaluating their protocols and reanalysing their data, and also upon clinical services to consider how re-interpreting data, reclassifying variants and re-contacting patients can best be managed.

Table 1

Potential analytical sources of missed diagnoses and corresponding improvements made to the DDD workflow since 2014.

Step	Purpose	Potential sources of missed diagnoses	Changes to DDD workflow
Variant detection	Sequence data is mapped to the human genome reference, and variation called relative to that reference	Low depth sequence data; Incorrect reference sequence; Incorrect mapping; Variant detection/genotyping failed; Variant class not considered (e.g. triplet repeats)	Updated versions of BWA, SAMtools, GATK and DeNovoGear; Multi-sample variant calling; Additional variant detection algorithms
Variant annotation and filtering	Stringent filters are applied to exclude low quality, common and non-coding variants that are unlikely to be clinically relevant	Low quality variant discarded; Incorrect annotation of allele frequency; Incorrect annotation of consequence; Variant filtering thresholds too stringent	Updated version of VEP; Updated MAF data; Updated filtering thresholds (lower MAF, exclusion of benign inherited missense variants)
Gene prioritization	Evidence-based, disease-specific ‘virtual’ gene panels are applied to limit variants to those with a relevant genotype (heterozygous/homozygous) and inheritance (dominant/recessive) in proven disease-causing genes	Incorrect disease mechanism; Incorrect inheritance or family history; Incomplete penetrance; Phenotype not recorded; Known gene missing from panel; Causal gene not yet discovered	Updated DDG2P (November 2013 freeze used previously; June 2016 freeze used here, including 286 additional genes); Plausibly pathogenic variants shared via DECIPHER Research Track; Reviewed parental phenotypes
Clinical assessment	Clinical assessment of the pathogenicity and contribution of specific variants to disease in a specific individual/family	Patient phenotype differs from previously published cases; Phenotype not yet developed; Evidence for pathogenicity is unclear	Candidate variants re-reviewed by core DDD clinical team and/or referring clinician; Some patients clinically assessed again

BWA=Burrows-Wheeler Aligner. GATK=Genome Analysis Toolkit. MAF=Minor Allele Frequency. VEP=Variant Effect Predictor. DDG2P=Developmental Disorder Gene-to-Phenotype database.

The Deciphering Developmental Disorders (DDD) Study (www.ddduk.org) provides an ideal cohort for developing and testing how such an iterative model of reanalysis and re-reporting might work at scale. The DDD Study is a UK-wide collaboration, between the National Health Service (NHS) Regional Genetics Services across the UK and Ireland and the Wellcome Trust Sanger Institute, which aims to both delineate the genetic architecture of developmental disorders and improve the diagnosis of these disorders in clinical practice using high-throughput genetic technologies. From April 2011–2015, the DDD Study recruited ~13 500 families with severe, undiagnosed developmental disorders, including ~10 000 complete parent-offspring trios, all of whom have had all known coding genes sequenced (exome sequencing). In addition to conducting large-scale, statistical research into novel genetic causes of developmental disorders,12,13 the DDD Study also returns plausible diagnostic results to individual families via ~200 referring consultant clinical geneticists, who are responsible for their ongoing care.14 The identification and communication of plausible diagnostic variants from the DDD Study was initially designed to be conservative, to maximise positive predictive value while avoiding incorrect diagnosis, with the expectation that the methodology would be largely automated and improved iteratively throughout the study in light of new data and knowledge. An important question is therefore how much of an improvement in diagnostic yield is achievable in a clinically ascertained cohort over time. Here, we reanalyse the data from the first 1133 family trios recruited into the study, describe improvements in the analysis and interpretation workflow, and compare the findings with our initial analysis of this cohort from three years earlier.14

Materials and Methods

Patient recruitment and assays

Children with severe undiagnosed neurodevelopmental disorders, and/or congenital anomalies, abnormal growth parameters, dysmorphic features and unusual behavioural phenotypes, were recruited with their parents from 24 regional genetics services across the UK and Ireland.12,14 Specific clinical data (growth, development, family and pregnancy history, previous investigations, clinical photographs) and Human Phenotype Ontology (HPO) terms15 were recorded by the regional clinical teams for the child and parents via a secure online portal within the DECIPHER database.16 Saliva and/or blood-extracted DNA samples were analysed at the Wellcome Trust Sanger Institute using whole exome sequencing of the family trio (Agilent SureSelect 55MB Exome Plus with Illumina HiSeq) and exon-resolution microarray analysis of the proband (Agilent 2x1M array-CGH [Santa Clara, CA, USA]).12 A selection of candidate variants with low quality metrics were subsequently validated using targeted Sanger sequencing.

Variant detection and annotation

Mapping of short-read sequences was carried out using the Burrows-Wheeler Aligner (BWA; version 0.59)17 algorithm with the GRCh37 1000 Genomes Project phase 2 reference. The Genome Analysis Toolkit (GATK; version 3.1.1)18 and SAMtools (version 0.1.19)19 were used for sample-level BAM improvement and multi-sample variant calling across all samples. Ensembl Variant Effect Predictor (VEP)20 based on Ensembl gene build 76 was used to annotate variants. The population prevalence (minor allele frequency) of each variant was annotated using the Exome Aggregation Consortium (ExAC),21 1000 Genomes Project,22 and internal data from all unaffected (developmentally normal) DDD parents in the cohort. Numerous bespoke algorithms were also developed to detect specific types of additional variation: DeNovoGear23 was used to predict likely de novo single nucleotide variants (SNVs) and small insertions/deletions (indels) in the child, augmented with candidate de novo indels called by GATK and present in the child but not their parents; CNsolidate, CoNVex and CIFER were used respectively to detect copy number variants (CNVs) in the array-CGH and exome data, and to predict their inheritance (unpublished); UPDio24 was used to detect uniparental disomy (UPD); triPOD25 was used to detect structural mosaicism; a chromosome read-depth counter was used to detect chromosomal aneuploidy (unpublished); and Indelible was used to detect soft-clipped reads caused by mid-sized indels (unpublished). All annotated SNVs, indels and CNVs for an individual were combined into a single Variant Call Format file.

Variant filtering

An automated variant filtering pipeline was used to narrow down the number of candidate diagnostic SNVs, indels and CNVs (Figure 1),14 using the following rules for family trios:

Figure 1

Outline of DDD variant filtering and reporting workflow.

Details of thresholds are outlined in the Methods section. The entire workflow is automated until the final stage, which requires detailed clinical review of any candidate variants in light of the child’s specific developmental phenotype.

Allele frequency – variants must be below a series of minor allele frequency (MAF) cut-offs, using the maximum MAF of the internal and external data combined: MAF <0.0005 (0.05%) and ExAC heterozygous allele count <5 in dominant genes; MAF <0.0005 (0.05%) and ExAC hemizygote allele count =0 in hemizygous genes; MAF <0.005 (0.5%) in recessive genes.

Predicted consequence – variants must be predicted to have a functional or loss-of-function consequence within a coding gene, based on the transcript with the most severe predicted consequence (longest or canonical selected where there are multiple with the same consequence), including: transcript ablation, transcript amplification, splice donor, splice acceptor, stop gained, frameshift, stop lost, start lost, inframe insertion, inframe deletion, and missense variants.

Gene and genotype – to target the analysis towards making a primary diagnosis, variants must overlap a Confirmed or Probable gene in our curated Developmental Disorder Gene-to-Phenotype (DDG2P) database (http://www.ebi.ac.uk/gene2phenotype),14 and the genotypes must match the allelic requirement of the gene. A version of DDG2P from June 2016 was used in this analysis. For SNVs/indels, this includes: single heterozygotes in dominant genes; homozygotes and compound heterozygotes in recessive genes; and X-chromosome hemizygotes in boys in hemizygous genes. For CNVs, this includes: deletions and disruptive intragenic duplications in DDG2P genes with a loss-of-function or dominant negative mechanism; whole gene/exon duplications in genes with an increased gene dosage mechanism; and any large (>1MB) genic deletions/duplications. SNV/CNV compound heterozygotes were also evaluated in biallelic genes.

Inheritance – variants in the proband must be inherited in a manner that is consistent with both the family history of disease (assuming full penetrance) and the inheritance pattern of the gene (dominant/recessive/X-linked), including: de novo mutations in dominant and X-linked genes (Sanger validation required if posterior probability from DeNovoGear <0.1); inherited homozygous and compound heterozygous variants in recessive genes; heterozygotes in dominant genes inherited from a developmentally affected parent; maternally inherited X-chromosome variants in boys (which are heterozygous in the mother and hemizygous in her son).

Inherited missense variants predicted to be benign by PolyPhen2 (ref. 26) were excluded. Candidate variants identified through additional variant detection algorithms (including UPD, aneuploidy, structural mosaics, de novo non-essential splice sites, soft-clipped read indels, and mosaic variants inherited from unaffected parents) were analysed and evaluated outside of this workflow.
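
The allele-frequency and predicted-consequence rules above can be sketched in a few lines of Python. This is an illustrative sketch only and is not the DDD clinical-filter code: the `Variant` class, its field names and the function names are hypothetical, the consequence strings assume VEP's Sequence Ontology terms, and the gene, genotype and inheritance rules are deliberately omitted.

```python
from dataclasses import dataclass

# Consequences named in the text, written as VEP Sequence Ontology terms
# (the exact strings are an assumption about how annotations are stored).
FUNCTIONAL_CONSEQUENCES = {
    "transcript_ablation", "transcript_amplification",
    "splice_donor_variant", "splice_acceptor_variant",
    "stop_gained", "frameshift_variant", "stop_lost", "start_lost",
    "inframe_insertion", "inframe_deletion", "missense_variant",
}


@dataclass
class Variant:
    """Hypothetical minimal record for one annotated variant."""
    consequence: str          # most severe predicted consequence
    max_maf: float            # maximum MAF over internal and external data
    exac_het_count: int = 0   # ExAC heterozygous allele count
    exac_hemi_count: int = 0  # ExAC hemizygote allele count


def passes_frequency(v: Variant, mode: str) -> bool:
    """Apply the MAF cut-off for the gene's allelic requirement."""
    if mode == "dominant":
        return v.max_maf < 0.0005 and v.exac_het_count < 5
    if mode == "hemizygous":
        return v.max_maf < 0.0005 and v.exac_hemi_count == 0
    if mode == "recessive":
        return v.max_maf < 0.005
    raise ValueError(f"unknown allelic requirement: {mode}")


def passes_consequence(v: Variant) -> bool:
    """Keep only functional or loss-of-function consequences."""
    return v.consequence in FUNCTIONAL_CONSEQUENCES


def is_candidate(v: Variant, mode: str) -> bool:
    """Frequency and consequence steps only; later steps not modelled."""
    return passes_frequency(v, mode) and passes_consequence(v)


if __name__ == "__main__":
    rare_missense = Variant("missense_variant", max_maf=0.0001, exac_het_count=2)
    print(is_candidate(rare_missense, "dominant"))
```

The same variant can pass under a recessive requirement and fail under a dominant one, because the recessive MAF cut-off is ten times more permissive.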

Code availability

An updated version of the variant filtering code used by the DDD study is available online at: https://github.com/jeremymcrae/clinical-filter

Variant sharing and genetic diagnosis

Candidate diagnostic variants passing the variant filtering pipeline described above were evaluated by the DDD Study’s internal clinical review team (including two consultant clinical geneticists) and communicated to the regional genetics services via deposition in the DECIPHER database.16 Both the DDD clinical team and the family’s local referring NHS consultant clinical geneticist assessed the diagnostic contribution of the variant(s) to the child’s presenting condition in each individual patient, based on the strength of the genetic evidence (assessment of the variant and inheritance) together with the phenotypic fit with previously reported cases. (UK NHS Consultant clinical geneticists have undertaken a minimum of eight years training post clinical qualification including a minimum of four years specialist training in clinical genetics and rare disease diagnosis.) Likely diagnostic variant(s) were subsequently confirmed in an accredited diagnostic laboratory. Systematic functional studies were not performed, though all reported variants are in published developmental disorder genes with sufficient evidence to merit inclusion in our curated gene-to-phenotype database (https://www.ebi.ac.uk/gene2phenotype/).14 Variant interpretation was informed by guidelines from both the American College of Medical Genetics and Genomics (ACMG)27 and the UK Association for Clinical Genetic Science (ACGS), but with the overall assessment of pathogenicity focussed on an integrated clinical genetic diagnosis including a composite of patient assessment, variant evaluation, inheritance and clinical fit. Clinical teams were asked to record the results of these evaluations in the patient’s variant DECIPHER record, and anonymised variants were made publicly accessible after a short holding period. 
In addition, plausibly pathogenic variants in genes not yet associated with developmental disorders, detected in children who remain undiagnosed after variant filtering, were anonymized and shared via a research track in DECIPHER, unlinked to the patient record, to facilitate variant match-making.28,29 These included functional de novo variants and rare loss-of-function homozygous, compound heterozygous and hemizygous variants in genes that are neither DDG2P nor OMIM-morbid genes. Full genomic datasets were deposited in the European Genome-Phenome Archive (EGA)30 in accordance with the Regional Ethics Committee approval for the study.

Results

Using the variant detection and filtering workflow described, we have achieved a full or partial diagnosis for 454 probands in the first 1133 family trios in the DDD study, corresponding to a 40% diagnostic yield. Of these, 78% were de novo mutations and 22% were inherited variants (12% recessively inherited from both parents, 4% dominantly inherited from an affected parent, 4% hemizygously inherited from mother to son, and 2% inherited from a mosaic unaffected parent). Thirty-three diagnoses are currently considered by the local clinical team to be a partial explanation for the child’s developmental disorder (i.e. the variant explains some but not all of the child’s phenotypes), while at least six probands have a dual diagnosis resulting in a compound or blended phenotype (i.e. variants in two distinct genes/loci together provide a full diagnosis for the child’s condition).11 An additional 43 probands (4%) have variants of uncertain clinical significance (VUS) in known disease-associated genes, some of which may become diagnostic in future as further evidence accumulates. The diagnostic yield increased by 13 percentage points (from 27% to 40%) as a result of improvements made to the workflow (Table 1). Overall, 182 additional probands received a new diagnosis, 272 previously diagnosed probands remained diagnosed, and 39 probands had their previous diagnoses clinically reclassified as uncertain or likely benign; a further six probands received a diagnosis from an independent diagnostic test that was missed by the DDD workflow due to low depth sequencing data in at least one member of the trio. Of the new diagnoses, 35% were in 30 new disease-associated genes discovered by the DDD Study itself,12,13,31 34% were in additional published disease genes found through literature searches, 23% resulted from improved analyses (such as updated annotations and variant filtering thresholds) and 8% resulted from additional analytical methods (Table 2).
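
The headline figures can be cross-checked with simple arithmetic. The sketch below uses only counts quoted in this section; treating the 2014 total as the retained plus the reclassified diagnoses is an assumption made here for illustration, and the variable names are hypothetical.

```python
TRIOS = 1133

retained = 272       # diagnosed in 2014 and still diagnosed
new = 182            # newly diagnosed on reanalysis
reclassified = 39    # 2014 diagnoses later reclassified

diagnosed_2017 = retained + new
# Assumption: the 2014 total is the retained plus reclassified diagnoses.
diagnosed_2014 = retained + reclassified

yield_2014 = diagnosed_2014 / TRIOS
yield_2017 = diagnosed_2017 / TRIOS

print(f"2014: {diagnosed_2014}/{TRIOS} = {yield_2014:.1%}")
print(f"2017: {diagnosed_2017}/{TRIOS} = {yield_2017:.1%}")
print(f"gain: {100 * (yield_2017 - yield_2014):.1f} percentage points")
```

Under that assumption the two totals round to the 27% and 40% yields reported in the text, and the gain is an absolute difference in percentage points, not a relative increase.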

Table 2

Summary of diagnoses and detection methods in the 454 diagnosed probands.

Reported variants that were considered by our clinical teams to explain all or part of a patient’s phenotype are summarised here; the variants themselves are available with associated phenotypes through DECIPHER (https://decipher.sanger.ac.uk). All variants are in published developmental disorder genes with sufficient evidence to merit inclusion in our clinician-curated gene-to-phenotype database (https://www.ebi.ac.uk/gene2phenotype/). Note that although most variants have been analytically validated in an accredited diagnostic laboratory, functional studies have not been systematically performed to confirm clinical pathogenicity.

Variant type	Analysis Method	#Diagnoses
Chromosomal aneuploidy	Chromosome read depth counter	2
Copy Number Variants	CNsolidate/CoNVex/CIFER	50
De novo SNVs/indels in known genes	DeNovoGear	232
De novo SNVs/indels in new DDD genes	DeNovoGear/Discovery	58
De novo SNVs/indels in new external genes	DeNovoGear/DDD Research Variant Track	5
De novo indels in known genes	GATK candidate de novo variant	4
Inherited SNVs/indels in known genes	GATK Mendelian filter	82
Inherited SNVs/indels in new DDD genes	GATK Mendelian filter/Discovery	4
Large insertions/deletions	Soft-clipped reads	4
Mosaic structural variants	triPOD	5
Mosaic inherited SNVs/indels	Parental mosaicism	4
Non-essential splice variants	Splicing analysis	4
Uniparental disomy	UPDio	6
TOTAL*	All	460

*Includes 6 dual diagnoses

Discovery indicates that a new developmental gene was found and published by the DDD Study.12,13,31

SNV=single nucleotide variant; Indel=insertion/deletion; GATK=Genome Analysis Toolkit
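
As a consistency check, the rows of Table 2 can be summed and reconciled with the number of diagnosed probands. This is a minimal sketch using the counts from the table; the dictionary and variable names are illustrative.

```python
# Diagnoses per variant type, copied from Table 2.
table2 = {
    "Chromosomal aneuploidy": 2,
    "Copy number variants": 50,
    "De novo SNVs/indels in known genes": 232,
    "De novo SNVs/indels in new DDD genes": 58,
    "De novo SNVs/indels in new external genes": 5,
    "De novo indels in known genes": 4,
    "Inherited SNVs/indels in known genes": 82,
    "Inherited SNVs/indels in new DDD genes": 4,
    "Large insertions/deletions": 4,
    "Mosaic structural variants": 5,
    "Mosaic inherited SNVs/indels": 4,
    "Non-essential splice variants": 4,
    "Uniparental disomy": 6,
}

total_diagnostic_variants = sum(table2.values())

# Each dual diagnosis contributes two variants to one proband.
DUAL_DIAGNOSES = 6
diagnosed_probands = total_diagnostic_variants - DUAL_DIAGNOSES

print(total_diagnostic_variants, diagnosed_probands)
```

The difference between the table total and the number of diagnosed probands is accounted for by the dual diagnoses noted in the footnote.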

A total of 838 variants were prioritized by our variant analysis and filtering workflows in this cohort, an average of ~0.7 variants per proband (Figure 2). Following review by two or more consultant clinical geneticists, 460 variants were classified as likely or definitely pathogenic (either fully or partially explaining the patient’s phenotype, Table 2), versus 328 in 2014; a further 378 were classified as uncertain, likely benign or benign for various reasons (lack of relevance of gene to phenotype, minor allele frequency too high, alternative genetic diagnosis in the proband, likely non-coding variant in the relevant transcript, analytical false positive, unrelated parental phenotype or variant absent in affected sibling). The scale of our dataset allows us to estimate the diagnostic yield of different classes of prioritised variants, which varies markedly among different inheritance modes (Figure 3). Over 80% of reported de novo mutations in dominant developmental disease genes, but only 10% of inherited variants in the same group of genes, were classed as likely or definitely pathogenic by our clinical teams. Of the 39 diagnoses that were reported in 2014 and have since been retracted following clinical assessment, 23 no longer meet our criteria for reporting.

Figure 2

Summary of reported and diagnostic variants in 1133 trios.

The total number of candidate variants per proband using the 2017 analysis pipeline is indicated (black bars), along with the number of full or partially diagnostic variants per proband in 2017 (striped dark grey bars) and 2014 (light grey bars).

Figure 3

Pathogenicity assessments of reported variants by inheritance class.

All variants (including SNVs, indels, CNVs, SVs, UPD and aneuploidies) that were classified by clinical teams as definitely/likely pathogenic were considered diagnostic, while those considered uncertain/likely benign/benign were not. The likelihood that a rare, functional de novo mutation in a dominant DDG2P gene is considered pathogenic is >80%, while the diagnostic yield from reported inherited variants is substantially less (10-30%). Note that variants of unknown and mosaic inheritance are excluded from the diagram due to low numbers (n<10).

The DDD Study cohort excludes children who were diagnosed using standard clinical genetic testing within the NHS. Based on previous estimates of the diagnostic yield of clinical microarrays of around 10%,32 plus a small additional diagnostic yield from single gene testing, we estimate that the diagnostic yield of trio whole exome sequencing would be >50% if implemented currently as a first-line test for developmental disorders.
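
The text does not spell out how this estimate is derived. One plausible reconstruction, offered only as an assumption, treats children already diagnosed by prior testing as a separate fraction of all presenting patients whose diagnoses exome sequencing would also capture, and applies the 40% cohort yield to the remainder. The function name and the single-gene testing values are hypothetical; the text gives no figure for the latter.

```python
def first_line_yield(prior_test_yield: float, cohort_yield: float) -> float:
    """Estimated yield if trio exome sequencing were the first-line test.

    prior_test_yield: fraction of all patients diagnosed by earlier tests
    cohort_yield: yield among the remaining, previously undiagnosed patients
    """
    return prior_test_yield + (1.0 - prior_test_yield) * cohort_yield


MICROARRAY_YIELD = 0.10   # approximate figure quoted in the text
DDD_YIELD = 0.40          # yield in the reanalysed DDD cohort

print(f"microarray only: {first_line_yield(MICROARRAY_YIELD, DDD_YIELD):.0%}")

# Hypothetical single-gene testing yields; not taken from the text.
for single_gene in (0.02, 0.05, 0.08):
    prior = MICROARRAY_YIELD + single_gene
    print(f"single-gene {single_gene:.0%}: {first_line_yield(prior, DDD_YIELD):.1%}")
```

Under these assumptions the microarray contribution alone gives about 46%, and the total exceeds 50% only once prior testing accounts for more than roughly one in six patients, so the estimate is sensitive to the size of the single-gene testing contribution.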

Discussion

We have developed and implemented a scalable, automated and iterative method for reanalysing, re-filtering, re-reporting and re-evaluating candidate diagnostic variants for severe developmental disorders from genome-wide sequence data, which in principle should be readily applicable to a wide range of rare diseases. There are numerous reasons why re-assessing genomic data is necessary, and will continue to bear fruit into the future. Given the extraordinary period of rapid development and discovery in genomics, both analytical methods and variant databases become outdated very quickly. For example, considerably more background population variation data became available between our initial analysis in 2014 and this analysis in 2017 (both internally from unaffected parents within DDD, and externally from resources such as ExAC),21 which is crucial to excluding ‘normal’ benign variation. Furthermore, around 200-300 additional disease-causing genes are published across all rare diseases every year,7 which are vital for finding evidence-based diagnoses within existing sequence data. We have made a large number of evidence-based changes and upgrades to our initial variant analysis and filtering workflow within the DDD Study (Table 1), including: improved and augmented variant calling and QC; updated variant annotation of predicted consequence and allele frequency; improved variant filtering thresholds; and additional disease-associated genes (286 additional genes were added to DDG2P between the November 2013 and June 2016 freezes). Moreover, in addition to statistically well-powered gene discovery within the DDD study itself, made possible through pooling sequence data from families with developmental disorders from across the UK, we have also catalysed gene discovery by the wider community by sharing plausibly pathogenic variants openly through the DECIPHER database. These changes have yielded substantial benefits.
We are now able to diagnose an additional 182 probands in our first 1133 trios, taking our total diagnostic yield from 27% in 2014 to 40% in 2017, highlighting the value of ongoing curation, iterative reanalysis and re-reporting. In addition, by using an expert network of regional consultant clinical geneticists and diagnostic laboratories, we have been able to revise a small number of prior diagnoses through detailed clinical assessment. Although a variety of genetic mechanisms and inheritance patterns contribute to our diagnostic yield, ~80% of our diagnoses are de novo mutations that arose spontaneously during reproduction and are not present in either parent. Moreover, ~80% of reported de novo mutations in a known dominant developmental disorder were classed as pathogenic by our clinical teams, emphasising the utility of trio sequencing as a first-line strategy in sporadic cases. Many challenges remain for continuing to improve the sensitivity and specificity of genomic sequencing. First, achieving the right balance between identifying diagnostic variants and over-reporting is problematic; the many detailed decisions required are obscured by automated workflows and hard-wired filtering thresholds. A rules-based approach will always result in reporting some false positive variants and missing some true positives. Clinical teams are usually quite unaware of which parts of the genome they are not seeing, or why, making unbiased evaluation of candidate variants extremely difficult. Moreover, variant filtering is substantially less effective for some patients and families. For family trios where both parents are unaffected and there is no family history, the majority of potentially diagnostic variants reported from exome sequencing are novel de novo mutations and are very likely to be causal; however, the converse is also true, and where both parents share a similar phenotype, the majority of reported variants are inherited and are unlikely to be causal (Figure 3). 
The situation is even more challenging for non-trios where the parents are unavailable for testing.14 Ever larger datasets of normal, benign variants will improve this situation, as will improved tools for predicting the pathogenicity of missense variants, but given that every family has rare/private variants, individuals and families with rare inherited dominant conditions may be better served by using more tightly focused analyses that are specific to their condition. Second, diseases vary substantially in their genomic footprint, and those that are highly genetically heterogeneous will always be difficult to diagnose. The more genes that are causally associated with similar or overlapping phenotypes, the harder it is to be certain that any given variant is actually the cause. Although our top diagnostic genes (ARID1B, SATB2, SCN2A, ANKRD11, MED13L and SYNGAP1) together accounted for 55 diagnoses (5% of the cohort), the substantial locus heterogeneity of developmental disorders means that most genes only contribute a single diagnosis in this cohort (Supplementary Figure 1), and we have yet to find a diagnosis in the majority of the 1400 genes on our diagnostic gene list. Although more disease-associated genes will be discovered, it is likely that these will be increasingly rare in prevalence. Substantial allelic heterogeneity also makes variant interpretation challenging even in known disease-causing genes. Third, managing the expectations of clinicians and families is extremely challenging in such a fast-moving field, as is achieving clarity about the nature and scope of the obligations of researchers and health professionals. Diagnoses can appear at almost any time, even following a ‘negative report’, or can be retracted as new evidence comes to light, or augmented by additional variants that may – or may not – contribute to the phenotype. 
Dual diagnoses resulting in blended phenotypes, which may be overlapping or distinct, are particularly challenging to untangle, as are ‘coincidental’ findings in phenotypically heterogeneous genes where variants can cause both the disorder in question and another unrelated disorder. Although determining whether a particular variant or combination of variants explains the child’s phenotype – or part of it, or none of it – is sometimes simple, other times it is not and may require further clinical evaluation and investigation. This uncertainty is the nature of a field where research and clinical practice are so entwined. By requiring peer-reviewed publication of disease-associated genes prior to addition to our diagnostic gene list and diagnostic reporting of causal variants, the DDD study has maintained a clear demarcation between research analyses and clinical practice to reduce some of this uncertainty. Through the DECIPHER platform, we also provided clinical teams with the systems and information necessary to help evaluate candidate variants. However, decisions about when and how to contact (or recontact) individual families with potential diagnoses are ultimately for local clinical teams to judge, based on their greater knowledge of the family. Finally, a question remains as to how we should best counsel the 673 families who still have no diagnoses after several rounds of reanalysing their data. How many more diagnoses can we expect from this same cohort in another 3 years, or another 10, and what might be reasonable for a family to expect in terms of follow-up? 
Large-scale sequencing studies allow us to estimate what proportion of currently undiagnosed patients are likely to be explained by a given class of variation, such as dominant de novo mutations.13 However, amongst any cohort, there is likely to be a grey area between definitively genetic conditions, where a single genetic variant is the sole cause of disease, and those where multiple variants and environmental factors play a role. We don’t yet know what proportion of the DDD cohort have a monogenic cause for their condition, and what fraction may have an oligogenic or polygenic component. Nonetheless, in our initial 1133 trios, we were unable to find any statistically significant phenotypic differences between the diagnosed and undiagnosed groups (Supplementary Figures 2 and 3). Currently, two-thirds of our novel diagnoses resulted from additional new disease-associated genes over the last 3 years, and it is therefore likely that the number of diagnoses will continue to increase as more causal genes are discovered through collaboration, data sharing and meta-analyses. Although this growth in disease-associated genes is likely to slow at some point in the near future, at least for dominant diseases for which trio whole exome study designs are very powerful, it is likely that very rare and recessive diseases will continue to be discovered for many years to come. Some diagnoses will also be missing from our data, due to low coverage in particular coding regions, long repeats or structural variants not detectable with short-read sequencing, or non-coding variants not assayed by exome sequencing. Although this suggests that whole genome sequencing should increase our diagnostic yield further, the additional yield from genome sequencing is unlikely to be substantial given that we know of just six ‘missed’ diagnoses in our cohort. 
Future reanalysis and diagnostic reporting ought therefore to focus on better curation of gene-disease relationships and the continued coupling of research and clinical practice to enable robust gene discovery. This work has significant implications for diagnostic laboratory reports. We suggest that iterative reinterpretation of already reported clinical sequencing data should become routine. This would require a major cultural change in reporting that would have implications for the development of appropriate informatics systems, the prioritisation of clinical expertise, and the emotional burden on affected individuals and their families, all of whom may have to deal with the uncertainty of diagnoses emerging subsequently even following an initial negative report. Further work is needed to investigate the logistical and communication challenges, resource implications, and informatics infrastructure required to implement systematic reinterpretation and recontact in clinical practice.
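
As a purely illustrative sketch of what routine reinterpretation might involve at its simplest, the following Python fragment re-screens already stored candidate variants against an updated diagnostic gene list and flags only those falling in newly added genes. It is not the DDD analysis pipeline; the `Variant` record, the `new_candidates` function, and the gene and family identifiers are all hypothetical placeholders introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal record for a stored candidate variant."""
    family_id: str
    gene: str
    consequence: str


def new_candidates(variants, old_genes, new_genes):
    """Return stored variants in genes added to the list since the last analysis."""
    added = set(new_genes) - set(old_genes)
    return [v for v in variants if v.gene in added]


# Placeholder data: gene names and family identifiers are invented.
stored = [
    Variant("fam1", "GENE_A", "de novo missense"),
    Variant("fam2", "GENE_B", "de novo nonsense"),
    Variant("fam3", "GENE_C", "inherited missense"),
]
previous_list = {"GENE_A"}            # gene list used at the original report
updated_list = {"GENE_A", "GENE_B"}   # gene list after new publications

flagged = new_candidates(stored, previous_list, updated_list)
for v in flagged:
    print(v.family_id, v.gene, v.consequence)
```

Under these assumptions, only variants in newly listed genes are surfaced for clinical review; whether and how to recontact a family would remain a decision for the local clinical team.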
