Iain Horton, Yaxiong Lin, Gay Reed, Mathieu Wiepert, Steven Hart.
Abstract
Individualized medicine enables better diagnoses and treatment decisions for patients and promotes research into the molecular underpinnings of disease. Linking individual patients' genomic and molecular information with their clinical phenotypes is crucial to these efforts. To address this need, the Center for Individualized Medicine at Mayo Clinic has implemented a genomic data warehouse and a workflow management system that bring data from institutional electronic health records, together with genomic sequencing data from both clinical and research bioinformatics sources, into the warehouse. The system is the foundation on which Mayo Clinic builds a suite of tools and interfaces to support various clinical and research use cases. The genomic data warehouse is positioned to play a key role in enhancing research capabilities and advancing individualized patient care at Mayo Clinic.
Keywords: NGS; gVCF; genomic variant call format; next-generation sequencing; personalized medicine; pharmacogenomics; precision medicine; translational research
Year: 2017; PMID: 28829408; PMCID: PMC5618153; DOI: 10.3390/jpm7030007
Source DB: PubMed; Journal: J Pers Med; ISSN: 2075-4426
Figure 1. Mayo Clinic genomic data warehouse architecture. EHR: electronic health records; VCF: Variant Call Format.
2.2. Project Timeline
Figure 2. Mayo Clinic genomic data warehouse implementation timeline. TRC: Translational Research Center.
Mayo Oracle Translational Research Center (TRC) implementation resources.
| Area | Role | Number of Members |
|---|---|---|
| IT | Database Administrator | 2 |
| IT | Data Pipeline Architect | 2 |
| IT | Architect | 2 |
| IT | Programmer | 6 |
| IT | Support Analyst | 2 |
| Bioinformatics | Bioinformatician | 2 |
| Biostatistics | Data Scientist | 2 |
| Project Management | Project Manager | 2 |
| Executive | IT Executive | 2 |
| Executive | Clinician | 1 |
IT: Information Technology.
Mayo Oracle TRC production hardware.
| Component | Quantity | CPU | Memory | Disk Space | Manufacturer |
|---|---|---|---|---|---|
| Oracle Exadata Database | 2 | Intel Xeon X5675 24 Core | 192 GB | 19 TB | Oracle, Redwood City, CA, USA |
| Application Server | 2 | Intel Xeon X5687 16 Core | 24 GB | 500 GB | Hewlett-Packard, Palo Alto, CA, USA |
| Oracle ZFS Storage Appliance | 1 | N/A | N/A | 2.5 TB | Oracle, Redwood City, CA, USA |
Mayo Oracle TRC reference data.
| Data Set | Version |
|---|---|
| Cosmic | V75 |
| Ensembl | ENS.76.GRCH38, ENS.73.GRCH37 |
| HUGO Gene Nomenclature Committee (HGNC) | 2015_10 |
| Human Genome Mutation Database (HGMD) | 2015.3 |
| PathwayCommons | 2013_09 |
| PolyPhen | GRCH38, GRCH37.P12 |
| Sorting Intolerant From Tolerant (SIFT) | GRCH38, GRCH37.P12 |
| SwissProt | 2015_11, 2013_09 |
Mayo Oracle TRC post-implementation resources.
| Area | Role | Number of Members |
|---|---|---|
| IT | Database Administrator | 1 |
| IT | Architect | 1 |
| IT | Programmer | 2 |
| IT | Support Analyst | 2 |
| Bioinformatics | Bioinformatician | As-needed |
| Project Management | Project Manager | 1 |
Figure 3. Automated result pipeline and process. The fully automated genomic data validation and load process is managed by Workflow Manager (WFM). Data files in Variant Call Format (VCF) and genomic VCF (gVCF) are dropped into a specific folder on the high-performance computing (HPC) cluster, where WFM automatically picks them up, processes them, and loads them into TRC.
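
The drop-folder pattern described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the WFM implementation: the file suffixes, the directory layout, the function names `find_new_files`, `looks_like_vcf`, and `process_drop_folder`, and the `load` callback are all hypothetical. Validation here only checks for the `##fileformat=VCF` line that the VCF specification requires as the first line of a file; a production pipeline would presumably validate far more.

```python
import gzip
import shutil
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical suffixes; the source only says VCF and gVCF files are dropped off.
VCF_SUFFIXES = (".vcf", ".vcf.gz", ".gvcf", ".gvcf.gz")


def find_new_files(drop_dir: Path) -> List[Path]:
    """Return VCF/gVCF files waiting in the drop folder, sorted by path."""
    return sorted(
        p for p in drop_dir.iterdir()
        if p.is_file() and p.name.lower().endswith(VCF_SUFFIXES)
    )


def looks_like_vcf(path: Path) -> bool:
    """Check that the first line is the mandatory ##fileformat=VCF header."""
    opener = gzip.open if path.name.lower().endswith(".gz") else open
    try:
        with opener(path, "rt") as handle:
            return handle.readline().startswith("##fileformat=VCF")
    except (OSError, EOFError, UnicodeDecodeError):
        # Unreadable, truncated, or non-text files are treated as invalid.
        return False


def process_drop_folder(
    drop_dir: Path,
    done_dir: Path,
    rejected_dir: Path,
    load: Callable[[Path], None],
) -> Dict[str, List[str]]:
    """Validate each dropped file, load the valid ones, and move every
    handled file out of the drop folder so it is not picked up twice."""
    done_dir.mkdir(parents=True, exist_ok=True)
    rejected_dir.mkdir(parents=True, exist_ok=True)
    summary: Dict[str, List[str]] = {"loaded": [], "rejected": []}
    for path in find_new_files(drop_dir):
        if looks_like_vcf(path):
            load(path)  # stand-in for the warehouse load step
            shutil.move(str(path), str(done_dir / path.name))
            summary["loaded"].append(path.name)
        else:
            shutil.move(str(path), str(rejected_dir / path.name))
            summary["rejected"].append(path.name)
    return summary
```

Moving each handled file out of the drop folder is one simple way to keep the scan idempotent, so that rerunning it on a schedule does not load the same file twice.
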
Figure 4. Mayo variant summary (MVS).
Figure 5. VCF extractor.
Mayo Clinic genomic data warehouse data statistics.
| Data Type | Total |
|---|---|
| Samples with Genomic Results | 11,734 |
| Research Samples | 9,712 |
| Clinical Samples | 2,022 |
| Research Studies with Genomic Results | 71 |
| Total Variant Count | 8,612,759,579 |
| Total Omics Results (Rows) | 68,431,547,534 |
| Total Patient Count | 9,283,510 |
| Total Subject Count | 149,714 |
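
As a quick consistency check on the statistics table, the research and clinical sample counts should add up to the number of samples with genomic results. The short sketch below verifies that and also derives a mean number of variant records per sample. That mean is a derived illustration only, not a figure reported in the source, and it assumes the total variant count is a sum of per-sample variant records.

```python
# Values copied from the data statistics table.
samples_with_results = 11_734
research_samples = 9_712
clinical_samples = 2_022
total_variants = 8_612_759_579

# Research and clinical samples should partition the samples with genomic results.
samples_match = research_samples + clinical_samples == samples_with_results

# Derived illustration only: mean variant records per sample.
mean_variants_per_sample = total_variants / samples_with_results

print(f"sample counts consistent: {samples_match}")
print(f"mean variant records per sample: {mean_variants_per_sample:,.0f}")
```
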