Dukyong Yoon (1,2), Martijn J Schuemie (2,3), Ju Han Kim (4), Dong Ki Kim (5), Man Young Park (6), Eun Kyoung Ahn (1), Eun-Young Jung (7), Dong Kyun Park (7), Soo Yeon Cho (1), Dahye Shin (1), Yeonsoo Hwang (8), Rae Woong Park (1,2).
1. Department of Biomedical Informatics, Ajou University School of Medicine, Ajou University, Suwon, Korea.
2. Observational Health Data Sciences and Informatics, New York, NY, USA.
3. Janssen Research and Development LLC, Titusville, FL, USA.
4. Seoul National University Biomedical Informatics, Seoul National University College of Medicine, Seoul, Korea.
5. Department of Internal Medicine, Seoul National University College of Medicine, Seoul, Korea.
6. Mibyeong Research Center, Korea Institute of Oriental Medicine, Daejeon, Korea.
7. Centre for u-Healthcare, Gachon University Gil Hospital, Korea.
8. Center for Medical Informatics, Seoul National University Bundang Hospital, Seongnam, Korea.
Abstract
PURPOSE: Distributed research networks (DRNs) increase statistical power by integrating observational data from multiple partners for retrospective studies. However, laboratory test results across care sites are derived using different assays from varying patient populations, so the data cannot simply be combined for analysis. Additionally, existing normalization methods are not suitable for retrospective studies. We normalized laboratory results from different data sources by adjusting for heterogeneous clinico-epidemiologic characteristics of the data, and we call this approach the subgroup-adjusted normalization (SAN) method.
METHODS: Subgroup-adjusted normalization renders the means and standard deviations of distributions identical under population structure-adjusted conditions. To evaluate its performance, we compared SAN with existing methods on simulated and real datasets consisting of blood urea nitrogen, serum creatinine, hematocrit, hemoglobin, serum potassium, and total bilirubin. Various clinico-epidemiologic characteristics can be applied together in SAN. For simplicity of comparison, age and gender were used to adjust for population heterogeneity in this study.
RESULTS: In simulations, SAN had the lowest standardized difference in means (SDM) and Kolmogorov-Smirnov values for all laboratory tests (p < 0.05). In a real dataset, SAN had the lowest SDM and Kolmogorov-Smirnov values for blood urea nitrogen, hematocrit, hemoglobin, and serum potassium, and the lowest SDM for serum creatinine (p < 0.05).
CONCLUSION: Subgroup-adjusted normalization performed better than the other normalization methods. The SAN method is applicable in a DRN environment and should facilitate analysis of data integrated across DRN partners for retrospective observational studies.
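The abstract does not spell out the SAN computation, so the following Python sketch is only one plausible reading of "rendering means and standard deviations identical under population structure-adjusted conditions": within each age/gender subgroup, values from a source site are rescaled to the mean and standard deviation of the same subgroup at a reference site. The function names, the subgroup keys, and the toy values are all hypothetical, and the SDM formula used here (absolute mean difference divided by the root of the averaged variances) is a common definition assumed for illustration, not one confirmed by the source.

```python
from collections import defaultdict
from statistics import mean, pstdev


def subgroup_stats(records):
    """Map each subgroup key to the (mean, SD) of its values."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return {k: (mean(v), pstdev(v)) for k, v in groups.items()}


def normalize_to_reference(source, reference):
    """Rescale source values subgroup by subgroup (assumed reading of SAN).

    Each value is converted to a z-score within its own subgroup at the
    source site, then mapped onto the reference site's mean and SD for
    that same subgroup.
    """
    src = subgroup_stats(source)
    ref = subgroup_stats(reference)
    out = []
    for key, value in source:
        if key not in ref:
            raise KeyError(f"subgroup {key!r} missing from reference site")
        s_mean, s_sd = src[key]
        r_mean, r_sd = ref[key]
        z = (value - s_mean) / s_sd if s_sd > 0 else 0.0
        out.append((key, r_mean + z * r_sd))
    return out


def standardized_difference_in_means(a, b):
    """Assumed SDM: |mean(a) - mean(b)| / sqrt((var(a) + var(b)) / 2)."""
    pooled = ((pstdev(a) ** 2 + pstdev(b) ** 2) / 2) ** 0.5
    if pooled == 0:
        return 0.0 if mean(a) == mean(b) else float("inf")
    return abs(mean(a) - mean(b)) / pooled


# Hypothetical toy data: (subgroup, lab value) pairs from two sites.
site_a = [(("F", "<65"), 12.0), (("F", "<65"), 14.0),
          (("M", "<65"), 15.0), (("M", "<65"), 19.0)]
site_b = [(("F", "<65"), 10.0), (("F", "<65"), 11.0),
          (("M", "<65"), 13.0), (("M", "<65"), 15.0)]

normalized_b = normalize_to_reference(site_b, site_a)

values_a = [v for _, v in site_a]
sdm_before = standardized_difference_in_means([v for _, v in site_b], values_a)
sdm_after = standardized_difference_in_means([v for _, v in normalized_b], values_a)
print(f"SDM before: {sdm_before:.3f}, after: {sdm_after:.3f}")
```

In this toy case each subgroup at site B has the same number of records as at site A, so matching subgroup means and SDs also aligns the pooled distributions. With differing subgroup sizes, a population-structure weighting step would be needed, which this sketch deliberately leaves out.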