Solve-RD: systematic pan-European data sharing and collaborative analysis to solve rare diseases.

Birte Zurek^1, Kornelia Ellwanger^1, Lisenka E L M Vissers^2,3, Rebecca Schüle^4,5, Matthis Synofzik^4,5, Ana Töpf^6, Richarda M de Voer^2,7, Steven Laurie^8, Leslie Matalonga^8, Christian Gilissen^2,7, Stephan Ossowski^1, Peter A C 't Hoen^7,9, Antonio Vitobello^10, Julia M Schulze-Hentrich^1, Olaf Riess^1,11, Han G Brunner^2,3,12, Anthony J Brookes^13, Ana Rath^14, Gisèle Bonne^15, Gulcin Gumus^16, Alain Verloes^17, Nicoline Hoogerbrugge^2,7, Teresinha Evangelista^15, Tina Harmuth^1, Morris Swertz^18, Dylan Spalding^19, Alexander Hoischen^2,7,20, Sergi Beltran^8,21,22, Holm Graessner^23,24.

Abstract

For the first time in Europe, hundreds of rare disease (RD) experts have teamed up to actively share and jointly analyse existing patients' data. Solve-RD is a Horizon 2020-supported EU flagship project bringing together >300 clinicians, scientists, and patient representatives from 51 sites in 15 countries. Solve-RD is built upon a core group of four European Reference Networks (ERNs; ERN-ITHACA, ERN-RND, ERN-Euro NMD, ERN-GENTURIS), which annually see more than 270,000 RD patients with respective pathologies. The main ambition is to solve unsolved rare diseases for which a molecular cause is not yet known. This is achieved through an innovative clinical research environment that introduces novel ways to organise expertise and data. Two major approaches are being pursued: (i) massive re-analysis of data from >19,000 unsolved rare disease patients and (ii) novel combined -omics approaches. The minimum requirement to be eligible for the analysis activities is an inconclusive exome that can be shared with controlled access. The first preliminary data re-analysis has already diagnosed 255 cases from 8393 exome/genome datasets. This unprecedented degree of collaboration, focused on sharing data and expertise, is expected to identify many new disease genes and enable the diagnosis of many so far undiagnosed patients from all over Europe.

Year: 2021 PMID: 34075208 PMCID: PMC8440542 DOI: 10.1038/s41431-021-00859-0

Source DB: PubMed Journal: Eur J Hum Genet ISSN: 1018-4813 Impact factor: 4.246

Rare Diseases (RD) are individually rare but collectively a common health issue. Around 80% of RD are estimated to have a genetic cause [1]. Reaching a genetic diagnosis, however, often takes several years, and initial clinical diagnoses are incorrect in up to 40% of families [2]. Around 50% of patients with an RD remain undiagnosed even in advanced expert clinical settings where whole exome sequencing (WES) is applied routinely as a diagnostic approach. Depending on the exact diagnostic setting, the inclusion criteria and the type of RD, the diagnostic yield from WES ranges between 15 and 51% of cases [3, 4]. At least two scenarios allow boosting the current yield of WES. Firstly, there is value in re-analysing WES data regularly [5] and on a massive scale [6], but not every RD expert has access to tools enabling this systematically. Secondly, it is clear that moving beyond the exome can provide additional benefits [7, 8]. Solve-RD aims to solve a large number of unsolved RD, for which a molecular cause is not yet known, by implementing both strategies mentioned above. To this end, Solve-RD applies innovative ways to effectively organise expertise and data.

Cohorts

To structure its work, Solve-RD has defined four types of cohorts. Cohort 1, “Unsolved Cases”, comprises cases with an inconclusive WES or whole genome sequencing (WGS) from any partnering or associated ERN centre. These data undergo a comprehensive re-analysis effort. Cohort 2, “Specific ERN Cohorts”, comprises disease-group-specific ERN cohorts that are analysed with newly applied, tailored -omics approaches. Cohort 3, “Ultra-Rare Rare Diseases”, includes (groups of) patients with unique phenotypes identified (and matched) by RD experts from all ERN participants. For the diseases included in Cohort 4, “The Unsolvables”, all relevant -omics methodologies will be used to solve highly recognisable, clinically well-defined disease entities for which the disease cause has not yet been found despite considerable previous research investigations including WES and WGS (Table 1).

Table 1

Examples for the specific ERN cohorts and the unsolvables.

Cohort	Rationale
Cohort 2: Long-read whole genome sequencing (LR-WGS)
X-linked spinal and bulbar muscular atrophy (SBMA)	Suspected repeat expansions or other hidden structural variants (SVs)
Hereditary ataxia	Suspected repeat expansions or other hidden SVs
Cohort 2: Genomics and Epigenomics
Unexplained Intellectual Disability (ID): patient-parent trios	De novo mutation prioritisation is a very powerful filter; here it is applied to de novo methylation changes
Diffuse gastric cancer	Hypermethylation of cancer gene promoters is a known disease mechanism
Rare pheochromocytomas and paragangliomas	Hypermethylation of cancer gene promoters is a known disease mechanism
Cohort 4
Unsolved syndromes available via ERN ITHACA	Aicardi syndrome, Gomez-Lopez-Hernandez syndrome and Hallermann–Streiff syndrome are clinically well-defined entities that have been studied by WES and WGS globally yet remain unsolved

In total, Solve-RD aims to re-analyse >19,000 datasets for cohort 1, to sequence ~3500 short- and long-read whole genomes for cohorts 2, 3, and 4, and to add >3500 additional -omics experiments including RNA sequencing, epigenomics, metabolomics, Deep-WES, and deep molecular phenotyping. Data collected and produced in Solve-RD will be shared via the European Genome-Phenome Archive (EGA) and the RD-Connect Genome-Phenome Analysis Platform (GPAP) to allow controlled access by other RD initiatives and scientists.

Organisation of data

The Solve-RD strategy relies on the availability of large amounts of good-quality, standardised genomic and phenotypic data and metadata from undiagnosed RD patients and their relatives. Solve-RD follows a centralised approach to enable all envisioned analyses. Data sharing in Solve-RD is regulated by policy documents, available on the project’s website. To overcome the technical challenge of centralising large amounts of data, Solve-RD leverages existing infrastructures such as EGA, GPAP, and computing clusters from project partners (Fig. 1). In addition, Solve-RD is developing a cloud-based computing cluster for collaborative analysis and methods testing (the Solve-RD Sandbox) and a central database to control and view all the project’s data and metadata (RD3; rare disease data about data) using the MOLGENIS open source data platform [9]. Clinical data and pedigree structure for all participating individuals are collated through standard terms and ontologies such as HPO, ORDO, and OMIM using GPAP-PhenoStore. To share data within the project and beyond, Solve-RD is an early adopter of the PhenoPackets standard, recently approved by GA4GH (Global Alliance for Genomics and Health, https://www.ga4gh.org), to enable exchange of phenotypic and family information [10].
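To make the idea of exchanging phenotypic information in a structured form more concrete, the following is a minimal sketch of a PhenoPacket-style record built in Python. It is not the Solve-RD or GPAP-PhenoStore implementation: the helper `make_phenopacket`, the identifiers `example-packet-1` and `proband-1`, and the `createdBy` value are hypothetical, and only a small, simplified subset of fields is shown (the field names follow the camelCase JSON style of the GA4GH schema as an assumption of this sketch). The two HPO terms are used purely as illustrative phenotypes.

```python
import json


def make_phenopacket(packet_id, subject_id, sex, hpo_terms):
    """Build a minimal PhenoPacket-style dict (illustrative subset of fields only)."""
    return {
        "id": packet_id,
        "subject": {"id": subject_id, "sex": sex},
        # Each phenotypic feature is an ontology term: an HPO identifier plus its label.
        "phenotypicFeatures": [
            {"type": {"id": term_id, "label": label}}
            for term_id, label in hpo_terms
        ],
        "metaData": {
            "createdBy": "example-submitter",  # hypothetical value
            "resources": [
                {"id": "hp", "name": "Human Phenotype Ontology", "namespacePrefix": "HP"}
            ],
        },
    }


# Hypothetical proband described with two HPO terms.
packet = make_phenopacket(
    "example-packet-1",
    "proband-1",
    "FEMALE",
    [("HP:0001249", "Intellectual disability"), ("HP:0001251", "Ataxia")],
)

# Serialise to JSON, the kind of text form that could be exchanged between systems.
serialised = json.dumps(packet, indent=2)
print(serialised)
```

Describing phenotypes with ontology identifiers rather than free text is what allows records from different centres to be compared and matched automatically.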

Fig. 1

Solve-RD data infrastructure.

Key components of the Solve-RD infrastructure for multi-omics data analysis, illustrating main use and data available.

For each individual, WES and/or WGS data are submitted to GPAP in FASTQ, BAM, or CRAM format. The sequencing data are processed through a standard pipeline based on GATK (Genome Analysis Toolkit variant calling software) best practices [11, 12]. After that, PhenoPackets, PED files (for pedigrees), raw data (FASTQ), alignments (BAM) and genetic variants (gVCF) are transferred to the EGA, where they are archived and made available to the consortium (and later on to the broader RD community) for further analysis. Furthermore, Solve-RD data are connected to MatchMaker Exchange via GPAP. To reach the ambitious goal of collecting 19,000 unsolved WES/WGS datasets, Solve-RD has defined several deadlines for submitting data to the project. After each deadline, all data are processed and released as a data freeze, which is amenable to corrections via patches. The first data freeze, released in early 2020, includes data from 8393 individuals. In parallel to the collection of existing data for cohort 1, new omics data are being generated for cohorts 2, 3, and 4. A common data workflow has been established for all these data types (Fig. 1). The data collated and generated by Solve-RD constitute a unique collection that will be valuable beyond the project, and the consortium is committed to making it FAIR under controlled access, through the EGA and GPAP.
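Pedigree information travels through this workflow as PED files. As a minimal sketch, assuming the conventional six-column PED layout (family ID, individual ID, father ID, mother ID, sex, phenotype, with `0` marking an unknown parent), the snippet below parses a small pedigree and lists affected individuals for whom both parents are present in the same family. The function names `parse_ped` and `complete_trios` and the example families are hypothetical and are not taken from the Solve-RD pipeline.

```python
import io
from dataclasses import dataclass


@dataclass
class PedRecord:
    family_id: str
    individual_id: str
    father_id: str   # "0" means the father is not in the file
    mother_id: str   # "0" means the mother is not in the file
    sex: int         # 1 = male, 2 = female, other = unknown
    phenotype: int   # 1 = unaffected, 2 = affected, 0 or -9 = missing


def parse_ped(handle):
    """Parse whitespace-delimited PED lines, skipping blank lines and comments."""
    records = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fam, ind, father, mother, sex, pheno = line.split()[:6]
        records.append(PedRecord(fam, ind, father, mother, int(sex), int(pheno)))
    return records


def complete_trios(records):
    """Return affected individuals whose parents are both listed in the same family."""
    known = {(r.family_id, r.individual_id) for r in records}
    return [
        r for r in records
        if r.phenotype == 2
        and (r.family_id, r.father_id) in known
        and (r.family_id, r.mother_id) in known
    ]


# Hypothetical example: FAM1 is a complete trio, FAM2 is a singleton proband.
EXAMPLE_PED = """\
FAM1 P1 F1 M1 2 2
FAM1 F1 0 0 1 1
FAM1 M1 0 0 2 1
FAM2 P2 0 0 1 2
"""

records = parse_ped(io.StringIO(EXAMPLE_PED))
trios = complete_trios(records)
print([r.individual_id for r in trios])  # prints ['P1']
```

A check of this kind is useful before trio-based analyses such as de novo variant filtering, which only make sense when data from both parents are available.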

Organisation of expertise

Solve-RD works at the interface of many disciplines relevant to solving unsolved RD. Central to the RD field are clinical geneticists and clinical scientists organised in the respective ERNs. Solve-RD provides expertise in genomics and other -omics data analysis through data scientists, molecular geneticists, and bioinformaticians. To ensure the best exchange of expertise we have implemented two structures: (i) data scientists and genomics experts are organised in a Data Analysis Task Force (DATF), and (ii) expert clinicians and geneticists from each ERN are organised in a Data Interpretation Task Force (DITF) (Fig. 2). The tasks for these structures are in brief:
►DITF: define needs of the ERN for (a) data re-analysis and (b) novel -omics data; define use cases for re-analysis and novel analysis; discuss/test suitable data output formats for clinical scientists; coordinate collaborative data interpretation; discuss within the respective ERN network and feed back to the DATF.
►DATF: map expertise in Solve-RD and all (ERN-)partners; create Analysis Projects (Supplementary Table S1) based on ERNs' needs; develop state-of-the-art analysis tools; analyse data: (a) data re-analysis and (b) novel omics data; optimise data sharing and output formats for DITF/ERNs.

Fig. 2

The Solve-RD data analysis structure ‘in action’.

The structure consists of the Data Analysis Task Force (DATF) and four Data Interpretation Task Forces (DITFs), one per core ERN involved. The DATF established working groups (WGs) for specific analyses. Working groups and DITFs jointly work on analysis projects based on use cases described by the DITF members. The structure implemented for data re-analysis has proven efficient and versatile [13], and will therefore be applied for novel omics data analysis, with additional working groups for specific -omics technologies (Fig. 3).

Fig. 3

Organisation of new result flow in Solve-RD.

Working groups (WG) 1–5 will re-analyse existing sequencing data. Novel omics data will be analysed by all working groups (as appropriate). RD-REAL refers to Rare Disease - REAnalysis Logistics.

To integrate expertise not available within the Solve-RD consortium, particularly with regard to molecular and functional validation of newly found genes, Solve-RD is implementing an innovative brokerage system, the Rare Disease Models and Mechanisms Network Europe (RDMM-Europe), modelled on a system that has already been used successfully in Canada [14]. As of 4 December 2020, 14 “brokering” Seeding Grants have been awarded to external model investigators.

Achievements and challenges

The work of the first 3 years of Solve-RD resulted in a practical solution to share and jointly analyse 8393 datasets from all over Europe: Solve-RD organised RD expertise via the DITFs and the DATF with the respective working group structure described above. The first re-analysis approaches resulted in 255 newly diagnosed cases, mainly by leveraging the latest ClinVar entries. As examples we refer to adjacent articles, published jointly in this issue [13, 15–18]. Many more candidate variants and new analysis results are under evaluation. To achieve its current status, Solve-RD has successfully addressed several critical challenges: (a) European data sharing in accordance with GDPR, (b) heterogeneity in existing WES data (e.g. 26 WES kits so far; multiple sequencing platforms), (c) implementing a centralised analysis approach and (d) addressing the rarity of events. It is the vision of Solve-RD that, by the end of the project, the Solve-RD dataset will be the largest well-annotated, standardised, multi-omics RD dataset on the diseases covered by the four core ERNs. In this sense, we hope that the Solve-RD dataset will be as useful to the RD community as the gnomAD resource is for the genomics community [19], by making -omics data of RD populations available to the community.
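The figures quoted above can be put in proportion with a short calculation. The sketch below uses only numbers stated in the text (255 newly diagnosed cases, 8393 datasets in the first freeze, and the target of 19,000 datasets, which the text elsewhere gives as a lower bound); the function name `proportion` is a hypothetical helper. Note that the first ratio is the additional yield of a preliminary re-analysis of previously inconclusive data, so it is not comparable with the 15 to 51% yield reported for primary WES.

```python
def proportion(part, whole):
    """Return part / whole as a fraction, rejecting a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


# Numbers as stated in the text.
newly_diagnosed = 255
first_freeze = 8393
target_datasets = 19000  # the stated goal; elsewhere given as ">19,000"

reanalysis_yield = proportion(newly_diagnosed, first_freeze)
freeze_progress = proportion(first_freeze, target_datasets)

print(f"Additional diagnoses from first re-analysis: {reanalysis_yield:.1%}")
print(f"Share of target collected in first freeze: {freeze_progress:.1%}")
```

Even a yield of a few percent corresponds to hundreds of families receiving a diagnosis, which is why systematic re-analysis at this scale matters.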

15 in total

1. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Authors: Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark Daly; Mark A DePristo
Journal: Genome Res Date: 2010-07-19 Impact factor: 9.043

2. The Canadian Rare Diseases Models and Mechanisms (RDMM) Network: Connecting Understudied Genes to Model Organisms.

Authors: Kym M Boycott; Philippe M Campeau; Heather E Howley; Paul Pavlidis; Sanja Rogic; Christine Oriel; Jason N Berman; Robert M Hamilton; Geoffrey G Hicks; Howard D Lipshitz; Jean-Yves Masson; Eric A Shoubridge; Anne Junker; Michel R Leroux; Christopher R McMaster; Jaques L Michaud; Stuart E Turvey; David Dyment; A Micheil Innes; Clara D van Karnebeek; Anna Lehman; Ronald D Cohn; Ian M MacDonald; Richard A Rachubinski; Patrick Frosk; Anthony Vandersteen; Richard W Wozniak; Izabella A Pena; Xiao-Yan Wen; Thierry Lacaze-Masmonteil; Catharine Rankin; Philip Hieter
Journal: Am J Hum Genet Date: 2020-02-06 Impact factor: 11.025

Review 3. Genomic medicine for undiagnosed diseases.

Authors: Anastasia L Wise; Teri A Manolio; George A Mensah; Josh F Peterson; Dan M Roden; Cecelia Tamburro; Marc S Williams; Eric D Green
Journal: Lancet Date: 2019-08-05 Impact factor: 79.321

4. Reanalysis of Clinical Exome Sequencing Data.

Authors: Pengfei Liu; Linyan Meng; Elizabeth A Normand; Fan Xia; Xiaofei Song; Andrew Ghazi; Jill Rosenfeld; Pilar L Magoulas; Alicia Braxton; Patricia Ward; Hongzheng Dai; Bo Yuan; Weimin Bi; Rui Xiao; Xia Wang; Theodore Chiang; Francesco Vetrini; Weimin He; Hanyin Cheng; Jie Dong; Charul Gijavanekar; Paul J Benke; Jonathan A Bernstein; Tanya Eble; Yasemen Eroglu; Deanna Erwin; Luis Escobar; James B Gibson; Karen Gripp; Soledad Kleppe; Mary K Koenig; Andrea M Lewis; Marvin Natowicz; Pedro Mancias; LaKeesha Minor; Fernando Scaglia; Christian P Schaaf; Haley Streff; Hilary Vernon; Crescenda L Uhles; Elaine H Zackai; Nan Wu; V Reid Sutton; Arthur L Beaudet; Donna Muzny; Richard A Gibbs; Jennifer E Posey; Seema Lalani; Chad Shaw; Christine M Eng; James R Lupski; Yaping Yang
Journal: N Engl J Med Date: 2019-06-20 Impact factor: 91.245

Review 5. New Diagnostic Approaches for Undiagnosed Rare Genetic Diseases.

Authors: Taila Hartley; Gabrielle Lemire; Kristin D Kernohan; Heather E Howley; David R Adams; Kym M Boycott
Journal: Annu Rev Genomics Hum Genet Date: 2020-04-13 Impact factor: 8.929

6. From Wet-Lab to Variations: Concordance and Speed of Bioinformatics Pipelines for Whole Genome and Whole Exome Sequencing.

Authors: Steve Laurie; Marcos Fernandez-Callejo; Santiago Marco-Sola; Jean-Remi Trotta; Jordi Camps; Alejandro Chacón; Antonio Espinosa; Marta Gut; Ivo Gut; Simon Heath; Sergi Beltran
Journal: Hum Mutat Date: 2016-09-26 Impact factor: 4.878

Review 7. Clinical Application of Genome and Exome Sequencing as a Diagnostic Tool for Pediatric Patients: a Scoping Review of the Literature.

Authors: Hadley Stevens Smith; J Michael Swint; Seema R Lalani; Jose-Miguel Yamal; Marcia C de Oliveira Otto; Stephan Castellanos; Amy Taylor; Brendan H Lee; Heidi V Russell
Journal: Genet Med Date: 2018-05-14 Impact factor: 8.822

8. MOLGENIS research: advanced bioinformatics data software for non-bioinformaticians.

Authors: K Joeri van der Velde; Floris Imhann; Bart Charbon; Chao Pang; David van Enckevort; Mariska Slofstra; Ruggero Barbieri; Rudi Alberts; Dennis Hendriksen; Fleur Kelpin; Mark de Haan; Tommy de Boer; Sido Haakma; Connor Stroomberg; Salome Scholtens; Gert-Jan van de Geijn; Eleonora A M Festen; Rinse K Weersma; Morris A Swertz
Journal: Bioinformatics Date: 2019-03-15 Impact factor: 6.937

9. A MT-TL1 variant identified by whole exome sequencing in an individual with intellectual disability, epilepsy, and spastic tetraparesis.

Authors: Alain Verloes; Lisenka E L M Vissers; Elke de Boer; Charlotte W Ockeloen; Leslie Matalonga; Rita Horvath; Richard J Rodenburg; Marieke J H Coenen; Mirian Janssen; Dylan Henssen; Christian Gilissen; Wouter Steyaert; Ida Paramonov; Aurélien Trimouille; Tjitske Kleefstra
Journal: Eur J Hum Genet Date: 2021-06-01 Impact factor: 4.246

10. Exome reanalysis and proteomic profiling identified TRIP4 as a novel cause of cerebellar hypoplasia and spinal muscular atrophy (PCH1).

Authors: Andreas Roos; Rita Horvath; Ana Töpf; Angela Pyle; Helen Griffin; Leslie Matalonga; Katherine Schon; Albert Sickmann; Ulrike Schara-Schmidt; Andreas Hentschel; Patrick F Chinnery; Heike Kölbel
Journal: Eur J Hum Genet Date: 2021-06-01 Impact factor: 4.246
