Rajaram Kaliyaperumal, Mark D Wilkinson, Pablo Alarcón Moreno, Nirupama Benis, Ronald Cornet, Bruna Dos Santos Vieira, Michel Dumontier, César Henrique Bernabé, Annika Jacobsen, Clémence M A Le Cornec, Mario Prieto Godoy, Núria Queralt-Rosinach, Leo J Schultze Kool, Morris A Swertz, Philip van Damme, K Joeri van der Velde, Nawel Lalout, Shuxin Zhang, Marco Roos.
Abstract
BACKGROUND: The European Platform on Rare Disease Registration (EU RD Platform) aims to address the fragmentation of European rare disease (RD) patient data, scattered among hundreds of independent and non-coordinating registries, by establishing standards for integration and interoperability. The first practical output of this effort was a set of 16 Common Data Elements (CDEs) that should be implemented by all RD registries. Interoperability, however, requires decisions beyond data elements, including data models, formats, and semantics. Within the European Joint Programme on Rare Diseases (EJP RD), we aim to further the goals of the EU RD Platform by generating reusable RD semantic model templates that follow the FAIR Data Principles.
Keywords: Common data elements; Data transformation; Disease registries; FAIR data; Interoperability; Linked data; Ontologies; Rare disease; Semantic web
Year: 2022 PMID: 35292119 PMCID: PMC8922780 DOI: 10.1186/s13326-022-00264-6
Source DB: PubMed Journal: J Biomed Semantics
The European Platform for Rare Disease Registration set of Common Data Elements that should be made available by all rare disease registries
| Element ID | Name | Values |
|---|---|---|
| 1.1 | Pseudonym | String |
| 2.1 | Date of birth | dd/mm/yyyy |
| 2.2 | Sex | Female, Male, Undetermined, Foetus (Unknown) |
| 3.1 | Patient Status | Alive, Dead, Lost in Follow-up, Opted-out |
| 3.2 | Date of Death | dd/mm/yyyy |
| 4.1 | First contact with specialized centre | dd/mm/yyyy |
| 5.1 | Age at onset | Antenatal, At Birth, Date (dd/mm/yyyy), Undetermined |
| 5.2 | Age at diagnosis | Antenatal, At Birth, Date (dd/mm/yyyy), Undetermined |
| 6.1 | Diagnosis of the rare disease | ORPHA Code, Alpha Code, ICD9/10 Code, ICD9-CM Code |
| 6.2 | Genetic Diagnosis | Human Genome Variation Society (HGVS), HUGO Gene Nomenclature Committee (HGNC), Online Mendelian Inheritance in Man (OMIM) Codes |
| 6.3 | Undiagnosed case | Human Phenotype Ontology code and/or HGVS Code related to the inability to diagnose. |
| 7.1 | Agreement to be contacted for research purposes | Yes/No |
| 7.2 | Consent to reuse data | Yes/No |
| 7.3 | Biological Sample? | Yes/No |
| 7.4 | Biobank? | URL/No |
| 8.1 | Disability Classification via International Classification of Functioning and Disability (ICF) | Score |
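
The value constraints in the table lend themselves to a simple check before data are exported. The following is a minimal Python sketch, not part of the EU RD Platform specification or of the EJP RD tooling: the field names (`date_of_birth`, `sex`, and so on) and the function names are hypothetical, and only the allowed values and the dd/mm/yyyy format are taken from the table. Elements that mix dates with keywords (5.1 and 5.2) are deliberately left out.

```python
import re
from datetime import datetime

# Enumerations copied from the table above; the keys are hypothetical field names.
ALLOWED_VALUES = {
    "sex": {"Female", "Male", "Undetermined", "Foetus (Unknown)"},
    "patient_status": {"Alive", "Dead", "Lost in Follow-up", "Opted-out"},
}

# Elements 2.1, 3.2 and 4.1, which the table lists as plain dd/mm/yyyy dates.
DATE_FIELDS = {"date_of_birth", "date_of_death", "first_contact"}

_DATE_SHAPE = re.compile(r"\d{2}/\d{2}/\d{4}")


def is_cde_date(value):
    """Return True if value is a real calendar date written as dd/mm/yyyy."""
    if not _DATE_SHAPE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%d/%m/%Y")
    except ValueError:
        # Right shape, but not a real date (for example 31/02/2001).
        return False
    return True


def check_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field_name, value in record.items():
        if field_name in DATE_FIELDS and not is_cde_date(value):
            problems.append(f"{field_name}: expected dd/mm/yyyy, got {value!r}")
        elif field_name in ALLOWED_VALUES and value not in ALLOWED_VALUES[field_name]:
            problems.append(f"{field_name}: {value!r} is not an allowed value")
    return problems


if __name__ == "__main__":
    for problem in check_record({"date_of_birth": "1990-11-05", "sex": "F"}):
        print(problem)
```

Fields that the sketch does not know about are ignored rather than rejected, so it can be run on a registry export that contains additional columns.
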
Fig. 1 Conceptual diagram of the overall SIO model to be applied to the CDEs. It is centred around five primary elements: identifiers, entities (physical and information-content), roles, processes, and attributes. In the diagram, we provide hypothetical examples of the specific ontological types that might be associated with each element
Models created to represent the CDEs. Models are created in YARRRML and made available on the CDE Project GitHub, accompanied by markdown documentation explaining the structure of an appropriate CSV file. Note that the EU RD CDEs do not all map one-to-one onto a CDE model. For example, the consent model can be reused for diverse types of consent (e.g., consent for contact, consent for data reuse), while the Pseudonym CDE is part of every other model and has therefore not been modelled as an independent element
| CDE Model Name | Purpose |
|---|---|
| Disease Progression | A “container” node to group together all other CDEs that refer to the same diagnosis. For example, the “age of diagnosis” CDE is related to a specific rare disease via traversal into the “disease progression” container, and then traversal into the “diagnosis” CDE that is also connected to “disease progression” |
| Care Pathway | Captures the date of first contact with the specialist healthcare system; is connected to “disease progression” |
| Diagnosis | Captures the final disease diagnosis using ORPHA codes; is connected to “disease progression” |
| Disease History | Captures age at first symptoms and age at diagnosis; is connected to “disease progression” |
| Genetic Diagnosis | Captures the sequence variant(s) found in this patient, using a variety of different coding systems; is connected to “disease progression” |
| Patient Consent | Captures the consent of the patient over several axes (e.g., consent for contact, consent for data reuse). Provides a reference to the signed consent form, as well as an input reference to the (blank) consent template. |
| Patient Status | Captures the current status of the patient, and their date of death if the patient is deceased |
| Personal Information | Captures (superficial) personal information such as birth date and sex (there are ongoing debates in the EJP modelling group as to whether this should be converted to an age, or an age-range, for improved privacy) |
| Phenotyping | Captures the phenotypes of the patient, using Human Phenotype Ontology terms |
| Disability | Captures the score for a disability test. The specific test administered is indicated as one of the child nodes of obo:NCIT_C20993 (Clinical or Research Assessment Tool), and thus this CDE model is broadly useful for many disorders. |
| Undiagnosed | Captures the case where a patient has phenotypic anomalies, and an identified sequence variant, but for some reason has not been definitively diagnosed. |
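
The role of the “disease progression” container can be illustrated with a small in-memory sketch. This is only an analogy in Python, not the YARRRML or RDF models themselves: the class names, the `diagnosis_for` helper, and the placeholder diagnosis strings are all hypothetical. It shows the two-step traversal described in the table, from a CDE node into its container and from there to the diagnosis that shares the same container.

```python
from dataclasses import dataclass, field


@dataclass
class CdeNode:
    """One CDE instance, e.g. a diagnosis or an age at diagnosis (hypothetical class)."""
    kind: str
    value: str


@dataclass
class DiseaseProgression:
    """Container grouping the CDE nodes that refer to the same diagnosis."""
    members: list = field(default_factory=list)

    def add(self, node):
        self.members.append(node)
        return node


def diagnosis_for(node, containers):
    """Step 1: find the container holding `node`.
    Step 2: return the diagnosis values connected to that same container."""
    for container in containers:
        # Compare by identity so that two equal-looking nodes are not confused.
        if any(member is node for member in container.members):
            return [m.value for m in container.members if m.kind == "diagnosis"]
    return []


if __name__ == "__main__":
    progression = DiseaseProgression()
    progression.add(CdeNode("diagnosis", "ORPHA code (placeholder)"))
    age = progression.add(CdeNode("age at diagnosis", "At Birth"))
    print(diagnosis_for(age, [progression]))
```

Because the link runs through the container, a patient with two diagnoses gets two containers, and each “age at diagnosis” stays attached to the right disease.
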
Fig. 2 The Markdown documentation explaining how to prepare a CSV file for the “Patient Status” CDE. Documentation includes, where appropriate, the restrictions on the possible values in a given column, such as ‘status uri’ in this example
Fig. 3 Visualization of an exemplar RDF instance for the “Patient Status” CDE (CDE 3.1 & 3.2)
Fig. 4 Visualization of the ShEx validation shape for the Patient Status CDE data
Fig. 5 The components of the workflow annotated with the responsibilities of the parties. The left side of the diagram, outlined in green, shows the responsibilities of the data custodian in collaboration with the Data Steward. These include exporting their registry data into CSV format, and possibly some additional modification of that exported data to conform to the template. On the right is the fully automated CDE-in-a-Box, which is constructed by the FAIR Expert team and provided as a docker-compose installation. The arrow labelled “trigger” is the Web page call that the data custodian makes when they are ready to execute their transformation
Fig. 6 The model for Laboratory Measurements. Of note are the three new connections on the “Quantitation” (Process) node: one representing the input (blood), one representing the target molecule (haemoglobin), and the third representing the link to the protocol. The remainder of the model is (structurally) identical to the core model shown in Fig. 1
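
The three additional connections on the process node can be pictured as subject-predicate-object statements. The sketch below is a plain Python illustration under stated assumptions: the predicate labels (`has input`, `has target`, `conforms to protocol`), the function name, and the example protocol URL are hypothetical stand-ins, not the ontology terms used in the published model.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A single subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


def laboratory_measurement_edges(process, sample, target, protocol_url):
    """Return the three connections that the laboratory model adds to the process node.

    The predicate labels are hypothetical placeholders for illustration only.
    """
    return [
        Triple(process, "has input", sample),                    # what was measured on
        Triple(process, "has target", target),                   # what was measured
        Triple(process, "conforms to protocol", protocol_url),   # how it was measured
    ]


if __name__ == "__main__":
    edges = laboratory_measurement_edges(
        "Quantitation", "blood", "haemoglobin", "https://example.org/protocol"
    )
    for edge in edges:
        print(edge.subject, "|", edge.predicate, "|", edge.obj)
```

Everything else about a measurement (identifier, role, attribute, and so on) would reuse the core structure unchanged, which is why only the three extra edges are modelled here.
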