Karen Y He, Dongliang Ge, Max M He.
Abstract
Genomic medicine attempts to build individualized strategies for diagnostic or therapeutic decision-making by utilizing patients' genomic information. Big Data analytics uncovers hidden patterns, unknown correlations, and other insights by examining large, diverse data sets. Although integrating and manipulating diverse genomic data and comprehensive electronic health records (EHRs) on a Big Data infrastructure poses challenges, it also offers a feasible opportunity to develop an efficient and effective approach for identifying clinically actionable genetic variants for individualized diagnosis and therapy. In this paper, we review the challenges of manipulating large-scale next-generation sequencing (NGS) data and diverse clinical data derived from EHRs for genomic medicine. We introduce possible solutions to the different challenges of manipulating, managing, and analyzing genomic and clinical data to implement genomic medicine. Additionally, we present a practical Big Data toolset for identifying clinically actionable genetic variants using high-throughput NGS data and EHRs.
Keywords: Big Data analytics; clinically actionable genetic variants; electronic health records; healthcare; next-generation sequencing
Year: 2017 PMID: 28212287 PMCID: PMC5343946 DOI: 10.3390/ijms18020412
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Studies and efforts leveraging genomic data and EHRs for genomic research and medicine.
| Project | Start Year | Aims | Country |
|---|---|---|---|
| deCODE genetics | 1996 | To utilize population-based genomic data and EHRs to investigate inherited causes of common diseases | USA |
| PMRP | 2002 | To enroll >20,000 participants to form a resource enabling researchers to study which genes cause diseases, which genes predict reactions to drugs, and how environment and genes work together to cause diseases | USA |
| I2B2 | 2004 | To enable clinical researchers to use existing clinical data and genomic data for discovery research; to facilitate the design of targeted therapies for individual patients with diseases having genetic origins | USA |
| CKB | 2004 | To identify the complex interplay between genes and environmental factors on the risks of common chronic diseases | China |
| eMERGE | 2007 | To develop methods and best strategies for utilizing EHRs for genomic research in support of implementing genomic medicine | USA |
| UK Biobank | 2007 | To improve the prevention, diagnosis, and treatment of a wide range of serious and life-threatening illnesses through a collection of 500,000 volunteers' biosamples and medical records | UK |
| GANI_MED | 2009 | To develop targeted strategies for the prevention, diagnosis, and therapy of diseases, tailored to the specific characteristics of an individual patient or a well-defined patient group. Specifically, these strategies should improve prediction models for health and disease outcomes and also avoid inefficient therapy strategies and adverse side effects | Germany |
| KP RPGEH | 2009 | To explore the genetic and environmental factors that influence common disease | USA |
| SCAN-B Initiative | 2010 | To improve survival and quality of life for breast cancer patients through the introduction of gene expression and genomic tumor profiling into the clinical routine for breast cancer | Sweden |
| PGPop | 2010 | To understand how a person’s genetic make-up affects his or her response to medications | USA |
| MVP | 2011 | To enroll one million volunteers and use their clinical and genetic data to improve health care for veterans | USA |
| Cancer 2015 Study | 2015 | To classify cancers molecularly using MPS to promote more targeted treatment of cancer patients and improve patient survival and outcomes | Australia |
| Precision Medicine Initiative | 2016 | To gain better insights into the biological, environmental, and behavioral influences for disease treatment and prevention that takes into account individual variability in genes, environment, and lifestyle by using the genomic and clinical data of a million Americans | USA |
Figure 1. Growth of the Sequence Read Archive (SRA) database over the past eight years.
Figure 2. Approximate file sizes of different NGS data formats and the running times for generating files in each format. BWA: Burrows-Wheeler Aligner; GATK: Genome Analysis Toolkit; BAM: the binary version of the Sequence Alignment/Map (SAM) format; FASTQ: a text-based format that stores nucleotide sequences together with their per-base quality scores; VCF: Variant Call Format.
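To make the text formats named in Figure 2 concrete, the sketch below parses one FASTQ record and one VCF data line. It assumes Phred+33 quality encoding (the common Sanger/Illumina 1.8+ convention) and reads only the eight fixed VCF columns, ignoring per-sample genotype columns. The class and function names are hypothetical; a production pipeline would rely on an established parsing library rather than this minimal version.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FastqRecord:
    read_id: str
    sequence: str
    qualities: List[int]  # Phred scores decoded from the ASCII quality string


def parse_fastq_record(lines: List[str], offset: int = 33) -> FastqRecord:
    """Parse one 4-line FASTQ record, assuming Phred+33 quality encoding."""
    header, seq, plus, qual = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("malformed FASTQ record")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return FastqRecord(header[1:], seq, [ord(c) - offset for c in qual])


@dataclass
class VcfVariant:
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alts: List[str]
    qual: Optional[float]  # None when QUAL is "."
    filter_status: str
    info: Dict[str, str]


def parse_vcf_line(line: str) -> VcfVariant:
    """Parse the eight fixed columns of a VCF data line; sample columns are ignored."""
    chrom, pos, _id, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    info_dict: Dict[str, str] = {}
    if info != ".":
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            info_dict[key] = value  # flag fields without "=" map to ""
    return VcfVariant(chrom, int(pos), ref, alt.split(","),
                      None if qual == "." else float(qual), flt, info_dict)


if __name__ == "__main__":
    rec = parse_fastq_record(["@read1\n", "ACGT\n", "+\n", "II#5\n"])
    print(rec.read_id, rec.qualities)  # read1 [40, 40, 2, 20]
    var = parse_vcf_line("1\t12345\t.\tA\tG,T\t50\tPASS\tDP=20;DB\n")
    print(var.chrom, var.pos, var.alts, var.info)  # 1 12345 ['G', 'T'] {'DP': '20', 'DB': ''}
```

The per-base quality string is why FASTQ files are roughly twice the size of the bare sequence, whereas a VCF line stores only sites that differ from the reference.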
Figure 3. The basic framework of SeqHBase for identifying clinically actionable genetic variants.
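The caption does not describe SeqHBase's storage schema, so the following is only a hypothetical illustration of one query such a framework supports: retrieving a patient's variants that fall in clinically actionable genes from a sorted key-value table. The composite row-key layout, the `ACTIONABLE_GENES` set, the sample data, and all function names are assumptions made for this example, not SeqHBase's actual design or API; a Python dict plus sorting stands in for a sorted store such as HBase.

```python
from typing import Dict, List, Tuple

# Hypothetical example set of genes treated as clinically actionable.
ACTIONABLE_GENES = {"BRCA1", "BRCA2", "MLH1"}

_SPECIAL_CHROMS = {"X": 23, "Y": 24, "M": 25, "MT": 25}


def chrom_index(chrom: str) -> int:
    """Map '1'..'22', 'X', 'Y', 'MT' (with or without a 'chr' prefix) to 1..25."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return _SPECIAL_CHROMS[name] if name in _SPECIAL_CHROMS else int(name)


def row_key(sample_id: str, chrom: str, pos: int) -> str:
    """Build a row key whose lexicographic order follows genomic order.

    Zero-padding the chromosome index and position means a byte-wise sorted
    store (HBase orders row keys this way) returns one sample's variants by
    chromosome, then position.
    """
    return f"{sample_id}|{chrom_index(chrom):02d}|{pos:010d}"


def actionable_variants(
    table: Dict[str, Dict[str, str]], sample_id: str
) -> List[Tuple[str, Dict[str, str]]]:
    """Return a sample's variants in actionable genes, in row-key order.

    Sorting the dict keys and filtering by prefix stands in for a prefix
    scan; sample IDs are assumed not to contain '|'.
    """
    prefix = f"{sample_id}|"
    return [
        (key, table[key])
        for key in sorted(table)
        if key.startswith(prefix) and table[key]["gene"] in ACTIONABLE_GENES
    ]


if __name__ == "__main__":
    # Made-up positions and alleles, for illustration only.
    table = {
        row_key("P1", "17", 500): {"gene": "BRCA1", "ref": "G", "alt": "A"},
        row_key("P1", "2", 900): {"gene": "GENE_A", "ref": "C", "alt": "T"},
        row_key("P1", "13", 700): {"gene": "BRCA2", "ref": "T", "alt": "C"},
        row_key("P2", "3", 100): {"gene": "MLH1", "ref": "A", "alt": "G"},
    }
    for key, row in actionable_variants(table, "P1"):
        print(key, row["gene"])
    # P1|13|0000000700 BRCA2
    # P1|17|0000000500 BRCA1
```

Zero-padding is the design choice that matters here: without it, a key for chromosome 10 would sort before one for chromosome 2, and a range scan would return variants out of genomic order.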