Shira Rockowitz1,2,3, Nicholas LeCompte1,2,3, Mary Carmack1,2,3, Andrew Quitadamo1,2,3, Lily Wang1,2,3, Meredith Park4,5, Devon Knight4,5, Emma Sexton4,5, Lacey Smith4,5, Beth Sheidley4,5, Michael Field6, Ingrid A Holm2,3,7, Catherine A Brownstein2,3,7, Pankaj B Agrawal2,3,7,8, Susan Kornetsky9, Annapurna Poduri3,4,5, Scott B Snapper3,6, Alan H Beggs2,3,7, Timothy W Yu2,3,7, David A Williams3,10, Piotr Sliz1,2,3.
Abstract
While genomic data is frequently collected under distinct research protocols and disparate clinical and research regimes, there is a benefit in streamlining sequencing strategies to create harmonized databases, particularly in the area of pediatric rare disease. Research hospitals seeking to implement unified genomics workflows for research and clinical practice face numerous challenges, as they need to address the unique requirements and goals of the distinct environments and many stakeholders, including clinicians, researchers and sequencing providers. Here, we present outcomes of the first phase of the Children's Rare Disease Cohorts initiative (CRDC) that was completed at Boston Children's Hospital (BCH). We have developed a broadly sharable database of 2441 exomes from 15 pediatric rare disease cohorts, with major contributions from early onset epilepsy and early onset inflammatory bowel disease. All sequencing data is integrated and combined with phenotypic and research data in a genomics learning system (GLS). Phenotypes were both manually annotated and pulled automatically from patient medical records. Deployment of a genomically-ordered relational database allowed us to provide a modular and robust platform for centralized storage and analysis of research and clinical data, currently totaling 8516 exomes and 112 genomes. The GLS integrates analytical systems, including machine learning algorithms for automated variant classification and prioritization, as well as phenotype extraction via natural language processing (NLP) of clinical notes. This GLS is extensible to additional analytic systems and growing research and clinical collections of genomic and other types of data.
Keywords: Data processing; Genetic databases; Medical genomics; Paediatrics; Personalized medicine
Year: 2020 PMID: 32655885 PMCID: PMC7338382 DOI: 10.1038/s41525-020-0137-0
Source DB: PubMed Journal: NPJ Genom Med ISSN: 2056-7944 Impact factor: 8.617
Table 1 Summary of protocol changes implemented by the CRDC to support patient rights and operations.
| Aspects | Consenting principles implemented under CRDC | Use in consent forms prior to CRDC on-boarding |
|---|---|---|
| Rights and interests of the patient | Identified variants are clinically confirmed | Sometimes |
| Rights and interests of the patient | Patient opts for return of primary results, primary and secondary/incidental, or neither | Very rarely |
| Rights and interests of the patient | Patient consents to re-contact to request additional data/samples and being offered enrollment in other research studies | Sometimes |
| Rights and interests of the patient | Patient data protected by NIH certificate of confidentiality | Rarely |
| Sample and data flow | Research consents contain language regarding the use of previously collected clinical data | Very rarely |
| Sample and data flow | Remote consenting and e-consenting are available | Very rarely |
| Sample and data flow | Supports Biobanking at the BCH Biobank | Rarely |
| Sample and data flow | Consent allows the identification of genetic factors | Sometimes |
| Sample and data flow | Consent enables identified CLIA sequencing upfront for streamlined confirmation | Never |
| BCH data use (secondary use, control sample across population) | Samples and data (genomic sequences, medical record information, and registry data) may be used for many types of non-restricted research, including biological and genetic research related and unrelated to the reason for participation in study | Rarely |
| BCH data use (secondary use, control sample across population) | Identified data can be shared with collaborators on IRB protocol and others at BCH | Rarely |
| Broad data use | Language of consent allows engagement with other academic networks and industry sponsors to accelerate discovery and therapeutics development | Sometimes |
Use of the consenting principles in Table 1 prior to the CRDC was evaluated across 26 research protocols: 10 have now incorporated the CRDC consenting principles, 8 are in the process of incorporating them, and for 8 incorporation was either not preferred or not possible. "Very rarely" incorporated principles were present in <20% of consent forms, "rarely" incorporated principles in <40% of consent forms, and "sometimes" incorporated principles in <60% of consent forms.
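The frequency labels used in the table can be read as bins over the fraction of consent forms that contained a given principle. The following is a minimal sketch of that binning, not the authors' procedure. The function name `frequency_label` is hypothetical. Treating each threshold as the upper bound of a half-open bin and mapping exactly 0% to "Never" are assumptions, and the source does not name a label for principles present in 60% or more of forms.

```python
def frequency_label(fraction):
    """Map the fraction of consent forms containing a principle to a label.

    Assumptions (not stated in the source): bins are half-open, a fraction of
    exactly 0 is labeled "Never", and fractions >= 0.60 have no named label.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if fraction == 0.0:
        return "Never"
    if fraction < 0.20:
        return "Very rarely"
    if fraction < 0.40:
        return "Rarely"
    if fraction < 0.60:
        return "Sometimes"
    return None  # no label is defined in the source for this range


# Illustrative counts out of the 26 protocols evaluated (counts are made up).
for forms_with_principle in (0, 5, 10, 15):
    print(forms_with_principle, frequency_label(forms_with_principle / 26))
```

The counts in the usage loop are illustrative only; the source reports the labels, not the underlying per-principle counts.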
Fig. 1 Sample collection.
Samples from patients enrolled in disease cohorts. The graphs contain weekly enrollment counts, normalized to average enrollment over the duration of their inclusion in the CRDC; the total number of pediatric patients who have been seen at BCH in the last year with the same ICD10 code; the number of individuals whose samples were submitted for sequencing with the CRDC at GeneDx; and the number of sequenced participants who were affected by the cohort disease.
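The normalization described in the caption can be illustrated with a short sketch. It assumes that "normalized to average enrollment" means dividing each weekly count by the cohort's mean weekly count over its time in the CRDC; the source does not give the exact formula. The function name and the sample counts are hypothetical.

```python
def normalize_weekly_enrollment(weekly_counts):
    """Divide each weekly enrollment count by the mean weekly count.

    Assumption: the caption's normalization is count / mean(count) over the
    weeks a cohort was included. A value of 1.0 marks an average week.
    """
    if not weekly_counts:
        return []
    mean_count = sum(weekly_counts) / len(weekly_counts)
    if mean_count == 0:
        # No enrollment at all: return zeros rather than dividing by zero.
        return [0.0] * len(weekly_counts)
    return [count / mean_count for count in weekly_counts]


# Made-up weekly counts for one cohort.
print(normalize_weekly_enrollment([2, 4, 6]))  # [0.5, 1.0, 1.5]
```

Normalizing this way puts cohorts of very different sizes on a common scale, so enrollment trends can be compared across panels.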
Fig. 2 Research to clinical workflow.
Patients with or without previous clinical testing were consented to harmonized research protocols. Patients were offered standardized sample collection mechanisms and most patients were dual consented to the Precision Link Biobank to support the collection of additional leftover clinical samples. Patient samples were CLIA sequenced by our sequencing provider (GeneDx) and data was returned to AWS where it was loaded into CRDC infrastructure for analysis. Once research teams identified a candidate variant, analysts worked with clinicians to order the clinical confirmation from the sequencing provider. Clinical confirmations were returned to BCH, added to the patient’s medical record, and communicated to the patient.
Fig. 3 Data flow diagram of genomics learning system.
Raw data is processed by secondary pipelines into harmonized data, which is ingested into GORdb by the data import API. Phenotypic data from the EDC and EHR are also incorporated. Built-in GORdb queries, as well as institutionally developed queries, operate on the merged data and can be executed by calling the GORdb API or through the WuXi NextCODE user interface. Raw and harmonized data are also made available to other analytic systems and BCH researchers. Information from these systems is fed back into GORdb. Aspects of the GLS are connected by a Python web server, which executes data transfer to and from the GLS components, sends automated alerts to researchers about new data availability, and sends warnings to bioinformaticians about potential metadata errors (for instance, duplicate subject enrollment).
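The duplicate subject enrollment warning mentioned in the caption could be backed by a check along the following lines. This is only a sketch under stated assumptions: the source does not describe how the web server detects metadata errors, and the record fields `patient_key` and `subject_id`, the function name, and the sample records are all hypothetical. Here a duplicate is assumed to mean one patient appearing under more than one subject identifier.

```python
from collections import defaultdict


def find_duplicate_enrollments(records):
    """Return patients that appear under more than one subject ID.

    Each record is a dict with hypothetical keys "patient_key" (a stable
    patient identifier) and "subject_id" (the study enrollment identifier).
    The result maps patient_key to a sorted list of its subject IDs.
    """
    subjects_by_patient = defaultdict(set)
    for record in records:
        subjects_by_patient[record["patient_key"]].add(record["subject_id"])
    return {
        patient: sorted(subject_ids)
        for patient, subject_ids in subjects_by_patient.items()
        if len(subject_ids) > 1
    }


# Made-up metadata rows: patient P1 was enrolled twice under different IDs.
records = [
    {"patient_key": "P1", "subject_id": "S001"},
    {"patient_key": "P2", "subject_id": "S002"},
    {"patient_key": "P1", "subject_id": "S003"},
]
print(find_duplicate_enrollments(records))  # {'P1': ['S001', 'S003']}
```

In a setup like the one described, a non-empty result would be the trigger for a warning to bioinformaticians rather than an automatic correction, since a repeated patient may reflect legitimate enrollment in two cohorts.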
Fig. 4 RC variant annotation workflow using Emedgene.
Flowchart of the workflow for evaluating variants prioritized by Emedgene using manually curated HPO terms, as well as CLiX Focus-derived HPO terms.
Table 2 Identification of variants by different methods.
| Analyst group | Emedgene: variants analyzed | Emedgene: average days between upload and variant identification | WuXi NextCODE: variants analyzed | WuXi NextCODE: average days between upload and variant identification |
|---|---|---|---|---|
| RC analysts | 239 | 26 | N/A | N/A |
| Research team analysts | 13 | 73 | 36 | 87 |