Randi E Foraker 1,2, Sean C Yu 2, Aditi Gupta 2, Andrew P Michelson 3, Jose A Pineda Soto 4, Ryan Colvin 2,4, Francis Loh 5, Marin H Kollef 3, Thomas Maddox 6, Bradley Evanoff 1, Hovav Dror 7, Noa Zamstein 7, Albert M Lai 1,2, Philip R O Payne 1,2.
1. Division of General Medical Sciences, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA.
2. Department of Medicine, Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA.
3. Division of Pulmonary and Critical Care Medicine, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA.
4. Division of Critical Care Medicine, Department of Anesthesiology and Critical Care Medicine, Children's Hospital of Los Angeles, Los Angeles, California, USA.
5. School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA.
6. Healthcare Innovation Lab, BJC Healthcare, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA.
7. MDClone Ltd, Beer Sheva, Israel.
Abstract
BACKGROUND: Synthetic data may provide a solution for researchers who wish to generate and share data in support of precision healthcare. Recent advances in data synthesis enable the creation and analysis of synthetic derivatives as if they were the original data; this process has significant advantages over data deidentification.
OBJECTIVES: To assess a big-data platform with data-synthesizing capabilities (MDClone Ltd., Beer Sheva, Israel) for its ability to produce data that can be used for research purposes while obviating privacy and confidentiality concerns.
METHODS: We explored three use cases and tested the robustness of synthetic data by comparing the results of analyses of synthetic derivatives with the results of the same analyses of the original data, using traditional statistics, machine learning approaches, and spatial representations of the data. We designed these use cases to cover analyses at the level of individual observations (Use Case 1), patient cohorts (Use Case 2), and population-level data (Use Case 3).
RESULTS: For each use case, the results of the analyses were sufficiently statistically similar (P > 0.05) between the synthetic derivative and the real data to draw the same conclusions.
DISCUSSION AND CONCLUSION: This article presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in clinical research for faster insights and improved data sharing in support of precision healthcare.