
Are Synthetic Data Derivatives the Future of Translational Medicine?

Randi Foraker, Douglas L Mann, Philip R O Payne.

Year:  2018        PMID: 30456342      PMCID: PMC6234614          DOI: 10.1016/j.jacbts.2018.08.007

Source DB:  PubMed          Journal:  JACC Basic Transl Sci        ISSN: 2452-302X


As noted in this Editor’s Page previously, the rising cost of developing new cardiovascular therapies cannot be sustained in the long term (1). Accordingly, there is a critical need for new methodologies that can improve the speed, efficiency, and success rate of efforts to develop new therapeutic strategies for cardiovascular disease (2). Although randomized clinical trials remain the gold standard for evaluating drug responsiveness, phase III clinical trials are costly, due in part to the large numbers of patients who must be enrolled and the long follow-up period needed to detect meaningful differences in survival or clinical outcomes (3). As an alternative to randomized clinical trials, clinical effectiveness studies can be conducted to evaluate drug responsiveness in diverse patient populations (4, 5). Such pragmatic approaches can be randomized or nonrandomized. If validly conducted, such studies can provide decision-makers with evidence from patients who are representative of those presenting to a clinic for a particular problem, thus accelerating translation into general clinical practice.
Clinicians must make several treatment decisions: Of the existing treatments, which is best for an individual patient? What is the best treatment approach for patients with certain medical conditions? How does one treatment compare with the existing alternatives? Ideally, clinicians would be able to query the electronic health record or another patient database for treatment efficacy in a population of similar patients to guide treatment decision-making for an individual patient (6). In the absence of such data or of clinical trial data, the quality of evidence available to answer these critical questions is frequently insufficient.
Studies are rarely conducted to assess treatment effectiveness or patient outcomes in real-world practice settings, and trials are often neither designed nor powered to evaluate the comparative effectiveness of treatments (7). To fill this gap, data are needed not only to determine how best to treat individual patients, but also to develop and refine evidence-based treatment guidelines. Decision-makers in need of this information include policymakers, payers, health care organizations, clinicians, and patients.
To estimate effect sizes precisely, researchers must have access to sufficiently large and representative datasets. Although data sharing can increase the sample size of an eligible study population, many institutions lack the infrastructure and support to do so (8). As a result, few networks of investigators are willing and able to share data at the scale necessary to study drug responsiveness. This is a critical obstacle to progress, as data re-use and data sharing are essential for multisite, generalizable insights.
Synthetic data derivatives offer one potential solution to these problems (9). Synthetic datasets are generated from existing datasets and maintain the statistical properties of the original dataset. Importantly, rows of observations in synthetic datasets do not correspond to identifiable individuals (rows of data) in the original dataset. Thus, synthetic data derivatives are quantitatively identical to patient-derived datasets, yet cannot be linked to the individuals from whom the data were derived (9). Because synthetic data contain no protected health information, the datasets can be shared freely among investigators or those in industry without raising patient privacy concerns. In addition, research conducted using synthetic derivatives does not require institutional review board approval.
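To make the idea of a dataset that keeps the statistical properties of its source without reproducing any source row more concrete, the following is a minimal sketch only. It assumes a toy cohort with two numeric columns (age and systolic blood pressure, both invented here) and uses a deliberately simple generator, a multivariate normal fitted to the original data. The function name `synthesize` and all values are hypothetical; this is not the method of any particular synthesis platform, and a Gaussian fit of this kind carries no formal privacy guarantee.

```python
import numpy as np


def synthesize(original: np.ndarray, n_rows: int, seed: int = 0) -> np.ndarray:
    """Draw synthetic rows from a multivariate normal fitted to `original`.

    Illustrative assumption: all columns are numeric and roughly Gaussian.
    """
    rng = np.random.default_rng(seed)
    mean = original.mean(axis=0)           # per-column means
    cov = np.cov(original, rowvar=False)   # column-by-column covariance
    return rng.multivariate_normal(mean, cov, size=n_rows)


# Hypothetical "original" cohort: age (years) and systolic BP (mmHg).
rng = np.random.default_rng(42)
age = rng.normal(60, 10, size=500)
sbp = 90 + 0.6 * age + rng.normal(0, 8, size=500)
original = np.column_stack([age, sbp])

synthetic = synthesize(original, n_rows=500)

# Compare summary statistics of the two datasets.
print("means (original): ", np.round(original.mean(axis=0), 1))
print("means (synthetic):", np.round(synthetic.mean(axis=0), 1))
print("age-SBP correlation (original): ", np.corrcoef(original, rowvar=False)[0, 1])
print("age-SBP correlation (synthetic):", np.corrcoef(synthetic, rowvar=False)[0, 1])
```

In this sketch the synthetic rows are fresh draws rather than copies of patient rows, while the column means and the age-blood pressure correlation stay close to those of the source. Real clinical data include categorical, skewed, and longitudinal variables that would require a more capable generator.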
Notably, data synthesis differs from the anonymization or de-identification of protected health information, which relies on removing or obfuscating identifiable data elements (10). Alternative approaches to synthetic derivatives include establishing a data enclave with restricted access and data-sharing requirements, or limiting access to only those data that are relevant to a specific research question (11). None of these alternatives ensures data privacy, because de-identified data can be re-identified through linkage to another data source, and security and confidentiality breaches can occur even with limited access to protected systems.
Using a data synthesis platform allows multiple sources of data to be linked before a synthetic derivative is produced, and reduces data ownership concerns when combining data across organizational boundaries. Combining datasets before synthesis yields a data product that provides a more comprehensive view of the patient and facilitates the evaluation of factors related to drug responsiveness, including those of health care quality and patient safety. For researchers, the ability to produce and share synthetic datasets can shorten the idea-to-insight time from years (as with expensive, lengthy clinical trials) to hours, and lessens legal and ethical barriers to data sharing. Access to synthetic data not only makes research more efficient, but also has great potential to save time and money in drug development and in the evaluation of drug responsiveness.
Can synthetic data be used to evaluate drug responsiveness?
One of the major difficulties in developing new therapies relates to the inherent fragility of phase II trials. Because of cost constraints, the sample size of patients enrolled in early-phase trials is relatively small, and the number of drug doses that one can feasibly study is often limited.
The size of phase II trials also restricts the range of endpoints that one can measure to gauge clinical effectiveness. Further, phase II trials are often performed in large academic medical centers that serve as tertiary and quaternary referral centers, where the patient population may differ significantly from those studied in larger phase III trials. Although speculative, one immediate application of synthetic datasets in phase II studies could be to generate groups of control patients that faithfully mimic the patients who are receiving active therapy in early-phase clinical trials. If properly designed, these studies could be performed in a randomized, double-blind manner. Bayesian statistical methods could then be used to compare the response of patients receiving active therapy with that of patients in a synthetic control group. This would allow investigators to prioritize their precious resources to enroll more patients in the active therapy arms, which would also mitigate some of the statistical problems that occur when using small control groups that do not reflect the demographics of the disease being studied. Another way in which synthetic data could be used is in the context of large-scale, pragmatic trials that evaluate novel targeted therapies involving genomic targets, insofar as conventional randomized clinical trials are often impracticable because of the large sample sizes required to demonstrate clinical effectiveness in this setting (12, 13). Lastly, one can imagine using synthetic datasets to predict trends in rare diseases, which in turn could be used to design appropriately powered clinical trials that target clinically meaningful endpoints.
What are some of the limitations of using synthetic data to evaluate drug responsiveness?
One potentially important limitation is that whereas synthetic models derived from existing datasets may replicate certain general trends of the dataset, they may not necessarily be able to predict specific trends within a dataset (e.g., all-cause death vs. cardiovascular death). Although this limitation remains theoretical at present, it may be problematic with respect to using synthetic datasets to evaluate novel therapeutics. Whether this issue can be addressed satisfactorily by creating a larger derivative dataset, one that contains enough outcomes of interest to estimate drug effects accurately, remains an important question that will require further study. Second, there is no consensus about how best to create synthetic datasets. Fully synthetic datasets do not contain any original data, whereas partially synthetic datasets may only de-identify or anonymize sensitive values. There are theoretical advantages and disadvantages to both approaches; however, there is no information about which approach is better for predicting drug responsiveness. Lastly, at the time of this writing, the Food and Drug Administration has not yet approved the use of synthetic datasets for registration studies: it is simply too soon.
Reducing the cost of developing new cardiovascular therapies will require fundamental changes to the way in which we conduct preclinical and clinical trials in order to make them faster, cheaper, and more adaptable. Here, we suggest that the use of synthetic data derivatives may help with the development of novel cardiovascular drugs. As always, we welcome comments and suggestions from investigators in academia and industry, patients, societies, and governmental regulatory agencies on the potential role of synthetic data in translational medicine, either through social media or by e-mail (JACC@acc.org).
References (12 in total)

1.  Practical clinical trials: increasing the value of clinical research for decision making in clinical and health policy.

Authors:  Sean R Tunis; Daniel B Stryer; Carolyn M Clancy
Journal:  JAMA       Date:  2003-09-24       Impact factor: 56.272

2.  Open science and data sharing in clinical research: basing informed decisions on the totality of the evidence.

Authors:  Harlan M Krumholz
Journal:  Circ Cardiovasc Qual Outcomes       Date:  2012-03-01

Review 3.  The limitations of using randomised controlled trials as a basis for developing treatment guidelines.

Authors:  Roger Mulder; Ajeet B Singh; Amber Hamilton; Pritha Das; Tim Outhred; Grace Morris; Darryl Bassett; Bernhard T Baune; Michael Berk; Philip Boyce; Bill Lyndon; Gordon Parker; Gin S Malhi
Journal:  Evid Based Ment Health       Date:  2017-07-14

4.  Rationale and design for the Antihypertensive and Lipid Lowering Treatment to Prevent Heart Attack Trial (ALLHAT). ALLHAT Research Group.

Authors:  B R Davis; J A Cutler; D J Gordon; C D Furberg; J T Wright; W C Cushman; R H Grimm; J LaRosa; P K Whelton; H M Perry; M H Alderman; C E Ford; S Oparil; C Francis; M Proschan; S Pressel; H R Black; C M Hawkins
Journal:  Am J Hypertens       Date:  1996-04       Impact factor: 2.689

5.  An interview with Robert S. Lane, Ph.D. Interviewed by Vicki Glaser.

Authors:  Robert S Lane
Journal:  Vector Borne Zoonotic Dis       Date:  2010-03       Impact factor: 2.133

Review 6.  Biomedical informatics and outcomes research: enabling knowledge-driven health care.

Authors:  Peter J Embi; Stanley E Kaufman; Philip R O Payne
Journal:  Circulation       Date:  2009-12-08       Impact factor: 29.690

7.  Considerations in the evaluation of surrogate endpoints in clinical trials. summary of a National Institutes of Health workshop.

Authors:  V G De Gruttola; P Clax; D L DeMets; G J Downing; S S Ellenberg; L Friedman; M H Gail; R Prentice; J Wittes; S L Zeger
Journal:  Control Clin Trials       Date:  2001-10

8.  The n-of-1 clinical trial: the ultimate strategy for individualizing medicine?

Authors:  Elizabeth O Lillie; Bradley Patay; Joel Diamant; Brian Issell; Eric J Topol; Nicholas J Schork
Journal:  Per Med       Date:  2011-03       Impact factor: 2.512

Review 9.  A pragmatic view on pragmatic trials.

Authors:  Nikolaos A Patsopoulos
Journal:  Dialogues Clin Neurosci       Date:  2011       Impact factor: 5.986

10.  The Rising Cost of Developing Cardiovascular Therapies and Reproducibility in Translational Research: Do Not Blame It (All) on the Bench.

Authors:  Douglas L Mann
Journal:  JACC Basic Transl Sci       Date:  2017-10-30
