
An informatics research agenda to support precision medicine: seven key areas.

Jessica D Tenenbaum¹, Paul Avillach², Marge Benham-Hutchins³, Matthew K Breitenstein⁴, Erin L Crowgey⁵, Mark A Hoffman⁶, Xia Jiang⁷, Subha Madhavan⁸, John E Mattison⁹, Radhakrishnan Nagarajan¹⁰, Bisakha Ray¹¹, Dmitriy Shin¹², Shyam Visweswaran¹³, Zhongming Zhao¹⁴, Robert R Freimuth⁴.

Abstract

The recent announcement of the Precision Medicine Initiative by President Obama has brought precision medicine (PM) to the forefront for healthcare providers, researchers, regulators, innovators, and funders alike. As technologies continue to evolve and datasets grow in magnitude, a strong computational infrastructure will be essential to realize PM's vision of improved healthcare derived from personal data. In addition, informatics research and innovation afford a tremendous opportunity to drive the science underlying PM. The informatics community must lead the development of technologies and methodologies that will increase the discovery and application of biomedical knowledge through close collaboration between researchers, clinicians, and patients. This perspective highlights seven key areas that are in need of further informatics research and innovation to support the realization of PM.


Keywords: biomarkers; data sharing; informatics; precision medicine


Year: 2016 PMID: 27107452 PMCID: PMC4926738 DOI: 10.1093/jamia/ocv213

Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497

The recent announcement of the Precision Medicine (PM) Initiative by President Obama has brought PM to the forefront for healthcare providers, researchers, regulators, and funders alike. In order for PM to be fully realized, we must move toward a Learning Healthcare System model that extends evidence-based practice to practice-based evidence by using data generated through clinical care to inform research (Figure 1). The leadership and members of the American Medical Informatics Association Genomics and Translational Bioinformatics Working Group have identified seven key areas that informatics research should explore to enable PM’s vision.

Figure 1:

Informatics methodology enables precision medicine (PM) throughout the Learning Healthcare System cycle. Patients – past, present, and future – are at the beginning and end of the cycle. Both healthcare and research participation result in the generation of data. Informatics methods and tools help turn data into information, and information into knowledge. That knowledge, in turn, influences individuals’ behavior and informs patient care. Informatics plays a key role in enabling each stage and transition of this cycle.

Patients: Past, Present, and Future

Stakeholders in the biomedical enterprise include researchers, providers, payers, and patients. But nearly everyone has been or will be a patient at some point. Patients thus are, and must remain, at the heart of the biomedical enterprise.

Key Area One: Facilitate Electronic Consent and Specimen Tracking

In the era of PM, research studies produce more data than they can possibly use and, paradoxically, would benefit from more data than they can possibly generate. As genomic sequencing becomes increasingly available, using de-identified biospecimens for research becomes more nuanced. Research participants may be asked to give broad consent to the future use of their data and biospecimens, and to acknowledge the possible, though unlikely, prospect of sequence-based re-identification. To maximize data and biospecimen reuse while protecting study participants’ privacy and adhering to their wishes, it is essential to develop machine-readable consent forms that enable electronic queries. As large biorepositories linked to electronic health records (EHRs) become more common, informatics will enable researchers to identify cohorts – both intra- and interinstitutionally – that meet their study criteria and have given the requisite consent. Proper local management of specimens and derived samples enables accurate tracking of chain of custody, sample derivations, processing/handling, and quality control – all of which are key elements of rigorous and reproducible research. Structured and electronically available consent forms can empower study participants by allowing them to access, review, and modify their preferences. A number of large-scale initiatives, including Sage Bionetworks, the Genetic Alliance, and the Global Alliance for Genomics and Health, are making progress in this area. Areas of informatics that can facilitate study participant consent and sample tracking include the development of structured consent forms and the adoption of relevant ontologies, user interface design, and infrastructure to enable participant engagement after the point of enrollment.
Developing an infrastructure to perform role-based distributed queries over cohorts and sample collections, such as those provided by OpenSpecimen, the Shared Health Research Information Network (SHRINE), and PopMedNet, will also be important.
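
As a rough illustration of what a machine-readable consent record that supports electronic queries might look like, the following Python sketch filters participants by the uses they have consented to. The `Consent` class, its field names, and the use labels are hypothetical and are not drawn from any of the initiatives named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consent:
    """Hypothetical structured consent record for one participant."""
    participant_id: str
    permitted_uses: frozenset  # e.g. {"genomic_research", "data_sharing"}
    withdrawn: bool = False


def eligible(consents, required_uses):
    """Return IDs of participants whose active consent covers every required use."""
    required = frozenset(required_uses)
    return [
        c.participant_id
        for c in consents
        if not c.withdrawn and required <= c.permitted_uses
    ]


consents = [
    Consent("P001", frozenset({"genomic_research", "data_sharing"})),
    Consent("P002", frozenset({"genomic_research"})),
    Consent("P003", frozenset({"genomic_research", "data_sharing"}), withdrawn=True),
]

# Only participants who consented to both uses and have not withdrawn qualify.
cohort = eligible(consents, {"genomic_research", "data_sharing"})
print(cohort)
```

Because preferences are stored as structured data rather than free text, a participant changing or withdrawing consent would be reflected the next time the query is run.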

Data to Knowledge

The promise of PM can only be realized by aggregating (virtually or otherwise) and analyzing data from multiple sources. A recent report by the National Academy of Sciences calls for the development of an information commons (IC) that amasses medical, molecular, social, environmental, and health outcomes data for large numbers of individual patients. The IC would be continuously updated, enable data analyses, and serve as the foundation for a knowledge base (KB) (see Key Area Five). Creating an IC would require informatics expertise to develop data standards, ensure data security, standardize processing pipelines, and establish data provenance.

Key Area Two: Develop, Deploy, and Adopt Data Standards to Ensure Data Privacy, Security, and Integrity, and to Facilitate Data Integration and Exchange

Transparency, reciprocity, respecting study participant preferences, data quality/integrity, and security are key to obtaining and maintaining the massive data stores needed for the advancement of PM. Data security does not mean data lock-down. Data-sharing can allow a study to proceed despite low numbers of eligible participants at any single institution, and can enable data reuse or meta-analyses. Data and metadata standards are required for data integration and exchange to be successful, but the lack of such standards or inconsistent use of existing standards are frequent barriers to this goal, especially in emergent “omics” disciplines. Data gaps are often discovered when existing standards are adopted for other purposes. Rather than creating yet another standard, those seeking to adopt an existing standard should work with its owners to help extend its scope. Conversely, funders and standards owners should place more emphasis on outreach and education/training for potential adopters of existing data standards. A number of initiatives are working to tackle different aspects of this challenge, including BioSharing, the Center for Expanded Data Annotation and Retrieval (CEDAR), the Biomedical and Healthcare Data Discovery Index Ecosystem (bioCADDIE), and Integrating Data for Analysis, Anonymization, and Sharing (iDASH). Although there have been significant efforts to share molecular datasets publicly, less progress has been made on sharing healthcare data. An emerging strategy is the development of clinical research networks in which EHR-derived data is stored locally, mapped to a common data model, and queried by proxy for members of a consortium or collaboration. Sharing queries rather than data resolves many of the issues involved in data standardization and harmonization and in data governance, as well as the legal and privacy concerns surrounding other federated or aggregation models.
This strategy has been adopted by initiatives such as Mini-Sentinel, Observational Health Data Sciences and Informatics (OHDSI), and the National Patient-Centered Clinical Research Network (PCORnet). Building on these networks to include genomic and other “omics” data, environmental data, and social data is one way forward in the development of ICs for PM. Work on data and metadata standards should be recognized and incentivized by the organizations that use and benefit from them, including academia, industry, government regulators, and funding agencies. New methods of encrypting and sharing genomic data in a way that enables collaborative research without compromising patient privacy are needed.
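
The idea of sharing queries rather than data can be sketched in a few lines of Python. This is a toy model, not the mechanism used by any of the networks named above: the record fields standing in for a common data model, the condition code `C1`, and the small-count suppression policy (`min_cell_size`) are all assumptions made for illustration.

```python
def make_site(records):
    """Wrap one site's local records; only aggregate counts leave the site."""
    def run_query(predicate, min_cell_size=5):
        count = sum(1 for r in records if predicate(r))
        # Assumed policy: suppress small counts to limit re-identification risk.
        return count if count >= min_cell_size else None
    return run_query


# Each site holds its own rows, mapped to the same (hypothetical) field names.
sites = {
    "site_a": make_site([{"age": 60 + i % 20, "condition": "C1"} for i in range(12)]),
    "site_b": make_site([{"age": 40 + i, "condition": "C1"} for i in range(3)]),
}


def distributed_count(sites, predicate):
    """Send the same query to every site and combine the returned counts."""
    results = {name: run(predicate) for name, run in sites.items()}
    total = sum(c for c in results.values() if c is not None)
    return results, total


results, total = distributed_count(
    sites, lambda r: r["condition"] == "C1" and r["age"] >= 40
)
print(results, total)
```

The coordinating party sees only per-site counts (or a suppressed value), never row-level records, which is the property that eases governance and privacy concerns.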

Key Area Three: Advance Methods for Biomarker Discovery and Translation

A primary goal of PM is to uncover subphenotypes defined by the distinct molecular mechanisms that underlie variations in disease manifestations and outcomes. One step toward defining subphenotypes is to establish agreed-upon phenotype definitions for existing disease classifications, a surprisingly complex task. A number of different initiatives (eg, the Electronic Medical Records and Genomics [eMERGE] Network and the National Institutes of Health [NIH] Collaboratory) are working to make phenotype definitions computationally tractable and reproducible between sites. Although some progress in subphenotyping has been made, new methods, including analyses of high-dimensional data, integration of different types of data (eg, “omics,” imaging, clinical, environmental), and simulating disease behaviors across multiple biological scales in space and time, are needed to address a number of challenges. Although molecular biomarkers can help elucidate underlying physiological mechanisms of disease, only a minority of currently known biomarkers are clinically actionable. Moreover, critical disease subtype distinctions may be impacted by nonmolecular factors, such as socioeconomic status. Many questions must be answered before a potentially actionable biomarker can become part of a clinical guideline and translated into practice. Information that is necessary for bridging this gap might include the functional characterization of genes and pathways related to the biomarker, the level of evidence, and data about economic feasibility. Clinical decision making regarding actionable biomarkers would be facilitated by a framework for presenting different levels of evidence regarding whether and how a molecular abnormality, genomic or otherwise, might represent a therapeutically relevant biomarker. Variant annotations with actionable clinical information will enable decision support systems to provide interpretable and actionable patient-specific reports.
Immediate areas for informatics research to focus on include computational phenotyping, biomarker discovery based on heterogeneous data sources, and frameworks for evaluating clinical actionability and utility.
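
To make the notion of a computationally tractable phenotype definition concrete, here is a minimal Python sketch of a rule-based case definition. It is not a validated algorithm from eMERGE or the NIH Collaboratory; the record fields, the rule itself, and the value of `LAB_THRESHOLD` are hypothetical and carry no clinical meaning.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    patient_id: str
    dx_code_count: int           # encounters carrying the target diagnosis code
    on_target_medication: bool   # any order for a relevant medication
    max_lab_value: float         # highest recorded value of the relevant lab


# Hypothetical threshold, chosen only for illustration; not a clinical criterion.
LAB_THRESHOLD = 6.5


def is_case(p: PatientRecord) -> bool:
    """Toy rule: repeated coding alone, or one code plus supporting evidence."""
    if p.dx_code_count >= 2:
        return True
    supporting = p.on_target_medication or p.max_lab_value >= LAB_THRESHOLD
    return p.dx_code_count == 1 and supporting


patients = [
    PatientRecord("A", 2, False, 5.0),
    PatientRecord("B", 1, True, 5.0),
    PatientRecord("C", 1, False, 5.0),
    PatientRecord("D", 0, True, 7.0),
]

cases = [p.patient_id for p in patients if is_case(p)]
print(cases)
```

Writing the definition as code means two sites can run exactly the same rule, so any remaining disagreement in case counts points to differences in the underlying data rather than in interpretation.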

Key Area Four: Implement and Enforce Protocols and Provenance

Scaling up PM requires complex processing and analytic steps applied to large, heterogeneous datasets. With so many “moving parts,” there are many opportunities for errors in the analysis, interpretation, or exchange of information. It is important that both final results and intermediate steps be well documented and fully reproducible. Protocols, and deviations from them, must also be documented. Software versions, analytical parameters, and reference database builds must all be captured as readily available metadata. Although spreadsheets and documents can be useful for informal data exploration, they do not constitute an adequate data management system. Large projects often share data between groups and may last several years, during which time key personnel may change institutions. All data processing and analysis for final results should be automated and documented so that another researcher can reproduce the work without making assumptions about what was done. There are various tools that enable this approach, including Taverna, preconfigured virtual machines, and Sage Bionetworks’s Synapse Platform. Though new challenges will always require novel and innovative solutions, the adoption of standard operating procedures when appropriate will facilitate consistency and improve interoperability. In addition, policies must be enacted and enforced to ensure responsible, reproducible, and reusable science. Processes and protocols for capturing and exchanging metadata and data provenance must be established, standardized, and widely adopted. Furthermore, this information must be considered to be as important as the primary data it describes, and funding agencies and publishers should insist that it be included with any dataset that is produced and released publicly.
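
One lightweight way to capture parameters, software versions, and reference builds alongside a result is to emit a provenance record for every processing step. The Python sketch below uses only the standard library; the record layout, the `filter_scores` step, and the `example-build-1` label are assumptions for illustration and do not reflect how Taverna or Synapse store provenance.

```python
import hashlib
import json
import platform


def sha256_of(data: bytes) -> str:
    """Content hash used to tie a record to exact input and output bytes."""
    return hashlib.sha256(data).hexdigest()


def filter_scores(data: bytes, min_score: float) -> bytes:
    """Example processing step: keep tab-separated rows with score >= min_score."""
    kept = [
        line for line in data.decode().splitlines()
        if float(line.split("\t")[1]) >= min_score
    ]
    return ("\n".join(kept) + "\n").encode() if kept else b""


def provenance_record(step, parameters, reference_build, input_bytes, output_bytes):
    """Bundle what was run, with which settings, on which exact data."""
    return {
        "step": step,
        "parameters": parameters,
        "reference_build": reference_build,
        "python_version": platform.python_version(),
        "input_sha256": sha256_of(input_bytes),
        "output_sha256": sha256_of(output_bytes),
    }


raw = b"sample_1\t0.91\nsample_2\t0.42\n"
params = {"min_score": 0.5}
filtered = filter_scores(raw, **params)

record = provenance_record("filter_scores", params, "example-build-1", raw, filtered)
print(json.dumps(record, indent=2, sort_keys=True))
```

Storing such a record next to each output lets another researcher rerun the step and confirm, by comparing hashes, that they reproduced it exactly.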

Knowledge to Action

Clinical decision making requires the consolidation of PM knowledge and the development of clinical decision support (CDS) tools, which, together with individual patient data, will provide actionable information at the point of care.

Key Area Five: Build a Precision Medicine Knowledge Base

A comprehensive KB will contain information about disease subtypes, disease risk, diagnosis, therapy, and prognosis that emerges from the ongoing analysis of data in an IC. Such a KB must be flexible, scalable, and extensible. Current KBs (eg, on genomic variants) are isolated from one another and do not support federated querying. Informatics solutions are needed for data-sharing and building a consensus on clinical interpretations of disparate, multiscale data. This KB must be machine-readable, as well as human-readable. Knowledge management technologies must enable effective ontological modeling, knowledge provenance, and new methodologies for updating and maintaining the integrated KB. Novel computational reasoning approaches must be utilized to allow efficient federated queries to be run across billions of knowledge units, enabling causal inference and decision support. New methods and processes must be developed to organize biomedical knowledge into integrated and interconnected KBs that will enable precision diagnostics and therapeutics based on the latest genomic discoveries and clinical evidence. Such KBs must provide federated queries and flexible computational analytics capabilities tailored for use by physicians and researchers.

Key Area Six: Enhance EHRs to Promote Precision Medicine

Commercial EHRs enable CDS for PM that is focused on information about a single gene variant. Informatics challenges for CDS include integrating next generation CDS with PM KBs to provide genome-based risk predictions, prognoses, and drug dosing at the point of care, as well as representing discrete genomic findings and interpretations in a machine-readable format (vs a free-text pathologist or geneticist report). Masys et al. proposed a framework for integrating genome-level data (stored external to the EHR) in which decision support systems are implemented through the EHR. EHRs will need to better aggregate and display patient information in order to allow users to view the heterogeneous data available for each patient, and EHRs will also need to structure and visually display the aggregated knowledge about each patient. Open interfaces that facilitate modular development of genomic CDSs outside of monolithic EHR vendor systems, enabling unencumbered parallel innovation/evolution of each element, should be provided. EHR systems must provide standards-based programming interfaces that enable the integration of external data and knowledge sources as well as the development of tools that support custom workflows, novel analytics, data visualization, and data aggregation. The informatics community must partner with EHR vendors to author use cases and develop interfaces, such that both parties benefit from the collaboration.
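
The contrast between a free-text report and a discrete, machine-readable genomic finding can be illustrated with a short Python sketch in which a CDS module outside the EHR looks up a structured finding in a knowledge base. Everything here is a placeholder: `GENE_X`, `drug_y`, the interpretation labels, and the advice text are invented for illustration and are not clinical guidance or any vendor's interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GenomicFinding:
    """Discrete representation of one interpreted result."""
    gene: str
    variant: str
    interpretation: str  # e.g. "reduced_function"


# Hypothetical knowledge base keyed by (gene, interpretation, drug).
KNOWLEDGE_BASE = {
    ("GENE_X", "reduced_function", "drug_y"): "Review dosing (placeholder advice).",
}


def cds_alert(finding: GenomicFinding, ordered_drug: str) -> Optional[str]:
    """Return advice if the knowledge base has a matching entry, else None."""
    key = (finding.gene, finding.interpretation, ordered_drug)
    return KNOWLEDGE_BASE.get(key)


finding = GenomicFinding("GENE_X", "c.100A>G", "reduced_function")
print(cds_alert(finding, "drug_y"))
print(cds_alert(finding, "drug_z"))
```

Because the finding is structured, the same lookup works for any drug order; a narrative report would first need text extraction before any rule could fire.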

Key Area Seven: Facilitate Consumer Engagement

PM includes more than the medical care administered in a provider’s office. Most of the population spends far more time outside of the doctor’s office than in it. PM will require explicit acknowledgement of this fact as well as deeper consumer participation, which will involve making consumers aware of their own ongoing health status and engaging them in healthcare decision making. It will also involve collecting more information about a person’s environment and lifestyle choices between visits to the doctor – eg, activity level, nutrition information, exposure, and sleep patterns – and incorporating that information into targeted therapeutic and preventive treatments. Consumer access to genetic testing will increase as provider-ordered and direct-to-consumer genetic tests become more comprehensive and less expensive. The recent announcement from 23andMe that the company will once again offer health-related information, along with Ancestry’s launch of AncestryHealth, makes it increasingly important that consumers understand basic genetic principles and the implications of genetic testing, can trust the accuracy of genetic tests, and understand how these results, together with family history, will influence treatment decisions. User-friendly interfaces for the collection, visualization, and integration of consumer data with healthcare information will be key to realizing the potential value of nontraditional data sources. Standards for new consumer data types, as well as patient engagement around ethical, legal, and social issues, will also be important.

Conclusions

The emergence of PM as a priority in biomedical research and healthcare emphasizes the importance of informatics’ contributions to PM. This brief overview highlights essential research directions for both informatics researchers and funding organizations.

References (34 in total; 10 listed)

1. A comparison of phenotype definitions for diabetes mellitus.

Authors: Rachel L Richesson; Shelley A Rusincovitch; Douglas Wixted; Bryan C Batch; Mark N Feinglos; Marie Lynn Miranda; W Ed Hammond; Robert M Califf; Susan E Spratt
Journal: J Am Med Inform Assoc Date: 2013-09-11 Impact factor: 4.497

2. Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network.

Authors: Katherine M Newton; Peggy L Peissig; Abel Ngo Kho; Suzette J Bielinski; Richard L Berg; Vidhu Choudhary; Melissa Basford; Christopher G Chute; Iftikhar J Kullo; Rongling Li; Jennifer A Pacheco; Luke V Rasmussen; Leslie Spangler; Joshua C Denny
Journal: J Am Med Inform Assoc Date: 2013-03-26 Impact factor: 4.497

3. In silico research in the era of cloud computing.

Authors: Joel T Dudley; Atul J Butte
Journal: Nat Biotechnol Date: 2010-11 Impact factor: 54.908

4. Technical desiderata for the integration of genomic data into Electronic Health Records.

Authors: Daniel R Masys; Gail P Jarvik; Neil F Abernethy; Nicholas R Anderson; George J Papanicolaou; Dina N Paltoo; Mark A Hoffman; Isaac S Kohane; Howard P Levy
Journal: J Biomed Inform Date: 2011-12-27 Impact factor: 6.317

5. Multiscale cancer modeling. (Review)

Authors: Thomas S Deisboeck; Zhihui Wang; Paul Macklin; Vittorio Cristini
Journal: Annu Rev Biomed Eng Date: 2011-08-15 Impact factor: 9.590

6. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud.

Authors: Katherine Wolstencroft; Robert Haines; Donal Fellows; Alan Williams; David Withers; Stuart Owen; Stian Soiland-Reyes; Ian Dunlop; Aleksandra Nenadic; Paul Fisher; Jiten Bhagat; Khalid Belhajjame; Finn Bacall; Alex Hardisty; Abraham Nieva de la Hidalga; Maria P Balcazar Vargas; Shoaib Sufi; Carole Goble
Journal: Nucleic Acids Res Date: 2013-05-02 Impact factor: 16.971

7. Meeting Report: BioSharing at ISMB 2010.

Authors: Dawn Field; Susanna Sansone; Edward F Delong; Peter Sterk; Iddo Friedberg; Pascale Gaudet; Susanna Lewis; Renzo Kottmann; Lynette Hirschman; George Garrity; Guy Cochrane; John Wooley; Folker Meyer; Sarah Hunter; Owen White; Brian Bramlett; Susan Gregurick; Hilmar Lapp; Sandra Orchard; Philippe Rocca-Serra; Alan Ruttenberg; Nigam Shah; Chris Taylor; Anne Thessen
Journal: Stand Genomic Sci Date: 2010-12-04

8. Enabling transparent and collaborative computational analysis of 12 tumor types within The Cancer Genome Atlas.

Authors: Larsson Omberg; Kyle Ellrott; Yuan Yuan; Cyriac Kandoth; Chris Wong; Michael R Kellen; Stephen H Friend; Josh Stuart; Han Liang; Adam A Margolin
Journal: Nat Genet Date: 2013-10 Impact factor: 38.330

9. Developing a data infrastructure for a learning health system: the PORTAL network.

Authors: Elizabeth A McGlynn; Tracy A Lieu; Mary L Durham; Alan Bauck; Reesa Laws; Alan S Go; Jersey Chen; Heather Spencer Feigelson; Douglas A Corley; Deborah Rohm Young; Andrew F Nelson; Arthur J Davidson; Leo S Morales; Michael G Kahn
Journal: J Am Med Inform Assoc Date: 2014-05-12 Impact factor: 4.497

10. PCORnet: turning a dream into reality.

Authors: Francis S Collins; Kathy L Hudson; Josephine P Briggs; Michael S Lauer
Journal: J Am Med Inform Assoc Date: 2014-05-12 Impact factor: 4.497
