Leslie Matalonga, Carles Hernández-Ferrer, Davide Piscia, Rebecca Schüle, Matthis Synofzik, Ana Töpf, Lisenka E L M Vissers, Richarda de Voer, Raul Tonda, Steven Laurie, Marcos Fernandez-Callejo, Daniel Picó, Carles Garcia-Linares, Anastasios Papakonstantinou, Alberto Corvó, Ricky Joshi, Hector Diez, Ivo Gut, Alexander Hoischen, Holm Graessner, Sergi Beltran.
Abstract
Reanalysis of inconclusive exome/genome sequencing data increases the diagnostic yield for patients with rare diseases. However, the cost and effort required for reanalysis prevent its routine implementation in research and clinical environments. The Solve-RD project aims to reveal the molecular causes underlying undiagnosed rare diseases. One of its goals is to implement innovative approaches to reanalyse the exomes and genomes from thousands of well-studied undiagnosed cases. The raw genomic data is submitted to Solve-RD through the RD-Connect Genome-Phenome Analysis Platform (GPAP) together with standardised phenotypic and pedigree data. We have developed a programmatic workflow to reanalyse genome-phenome data. It uses the RD-Connect GPAP's Application Programming Interface (API) and relies on the big-data technologies upon which the system is built. We have applied the workflow to prioritise rare known pathogenic variants from 4411 undiagnosed cases. The queries returned an average of 1.45 variants per case, which were evaluated first in bulk by a panel of disease experts and then individually by the submitter of each case. A total of 120 index cases (21.2% of prioritised cases, 2.7% of all exome/genome-negative samples) have already been solved, and others are still under investigation. Implementing solutions such as the one described here provides the technical framework to enable periodic case-level data re-evaluation in clinical settings, as recommended by the American College of Medical Genetics.
Year: 2021 PMID: 34075210 PMCID: PMC8440686 DOI: 10.1038/s41431-021-00852-7
Source DB: PubMed Journal: Eur J Hum Genet ISSN: 1018-4813 Impact factor: 4.246
Fig. 1 Programmatic reanalysis data workflow.
Unsolved cases (RD-REAL datasets = phenotypic and genomic data) are submitted by Solve-RD members from the 4 core ERNs and the 2 UDPs participating in the project. Genomic data is processed through a standard analysis pipeline [15] and integrated with the phenotypic information in the RD-Connect GPAP. The data is analysed with the programmatic approach described in this study by the SNV-indel working group, one of the seven working groups established by the Solve-RD Data Analysis Task Force (DATF) to reanalyse data at scale with different analytical approaches (e.g. CNV, somatic, meta-analysis, etc.) (http://solve-rd.eu/the-group/data-analysis-task-force/). The DATF involves data scientists and genomics experts from the project. The resulting candidate variants are submitted to the Data Interpretation Task Force (DITF), which involves expert clinicians and geneticists, for prioritisation and final interpretation. One DITF has been established for each of the core ERNs participating in the project (http://solve-rd.eu/the-group/data-interpretation-task-force-ditf/). The DITFs include or are in contact with case submitters so that a final decision on a new patient diagnosis can be reached. Diagnosed cases are automatically updated in the system, and the remaining unsolved cases are eligible to enter a new round of analysis.
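The round-based loop in the caption above (query unsolved cases, pass candidates to expert review, update solved cases, let the rest re-enter) can be illustrated with a minimal Python sketch. Everything named here (`Case`, `reanalysis_round`, `find_candidates`, `interpret`) is hypothetical and stands in for steps the caption describes; it is not the RD-Connect GPAP API, whose calls are not given in this text.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Case:
    """Hypothetical stand-in for one index case; not a GPAP data model."""
    case_id: str
    solved: bool = False
    candidates: List[str] = field(default_factory=list)


def reanalysis_round(
    cases: List[Case],
    find_candidates: Callable[[Case], List[str]],
    interpret: Callable[[Case], bool],
) -> List[Case]:
    """Run one reanalysis round and return the cases that remain unsolved.

    find_candidates: placeholder for the programmatic query step (DATF side).
    interpret: placeholder for expert prioritisation and interpretation
               (DITF and case submitter); returns True if the case is solved.
    """
    for case in cases:
        if case.solved:
            continue  # diagnosed cases do not re-enter the analysis
        case.candidates = find_candidates(case)
        if case.candidates and interpret(case):
            case.solved = True
    return [c for c in cases if not c.solved]


# Toy usage with stub callables (illustrative values only).
cohort = [Case("A"), Case("B")]
remaining = reanalysis_round(
    cohort,
    find_candidates=lambda c: ["variant-1"] if c.case_id == "A" else [],
    interpret=lambda c: True,
)
remaining_ids = [c.case_id for c in remaining]
```

The unsolved cases returned by one call can be passed to the next call, which mirrors the periodic re-evaluation the workflow is meant to enable.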
Number of cases, family structures and identified variants by European Reference Networks participating in the study.
| Type of disorder | Number of families/index cases | Trio | Singleton | Other family structure | Number of genes in the corresponding gene list | Number of variants identified | Number of cases with identified variants | Number of variants prioritised | Number of cases with prioritised variants | Number of solved cases | Number of cases under evaluation | Number of cases with a heterozygous variant identified for an AR disorder | Number of unsolved cases |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Intellectual disability | 1472 | 1008 (68.4%) | 436 (29.6%) | 28 (2%) | 1740 | 1618 | 980 | 193 | 158 | 62 | 5 | 15 | 76 |
| Neuromuscular disorders | 616 | 124 (20.1%) | 433 (70.2%) | 59 (9.5%) | 594 | 278 | 223 | 278 | 228 | 22 | 13 | 21 | 172 |
| Neurological disorders | 2048 | 130 (6.3%) | 1847 (90.1%) | 71 (3.4%) | 358 | 667 | 552 | 177 | 150 | 38 | 2 | 48 | 62 |
| Tumor risk syndromes | 275 | 0 | 273 (99.3%) | 2 (0.7%) | 229 | 30 | 30 | 30 | 30 | 3 | 0 | 3 | 24 |
| TOTAL | 4411 | 1262 (28%) | 2989 (68%) | 160 (4%) | NA | 2593 | 1785 | 678 | 566 | 120 | 25 | 87 | 334 |
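The summary figures quoted in the abstract can be recomputed from the TOTAL row of the table above. The short Python check below is a sketch under one assumption: that the reported average of 1.45 variants per case uses the number of cases with identified variants as its denominator. The variable names are illustrative.

```python
# Values copied from the TOTAL row of the table above.
index_cases = 4411
variants_identified = 2593
cases_with_variants = 1785
cases_prioritised = 566
cases_solved = 120


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Assumption: the average is taken over cases with at least one identified variant.
variants_per_case = round(variants_identified / cases_with_variants, 2)
solved_of_prioritised = pct(cases_solved, cases_prioritised)
solved_of_all = pct(cases_solved, index_cases)

print(variants_per_case, solved_of_prioritised, solved_of_all)
# prints: 1.45 21.2 2.7
```

Under that assumption the three values match those reported in the abstract.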
Fig. 2 Results of reanalysis of undiagnosed RD cases to identify known disease-causing variants.
A Filtration, prioritisation and interpretation workflow (numbers refer to index cases). B Number of variants per case submitted to DITFs for prioritisation and resulting number of variants submitted for interpretation. C Variant interpretation results from prioritised cases per type of disorder (numbers refer to variants). D Number of causative variants identified according to the year in which the corresponding gene (grey) or variant (yellow) was first described in the literature as disease-causing (according to OMIM) or pathogenic (according to ClinVar).