Cosmin A Bejan (1), Daniel J Lee (2), Yaomin Xu (3), Ryan S Hsi (4)
1. Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN.
2. Division of Urology, University of Pennsylvania Health System, Philadelphia, PA; Leonard Davis Institute of Health Economics, Philadelphia, PA.
3. Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN.
4. Department of Urology, Vanderbilt University Medical Center, Nashville, TN. Electronic address: ryan.hsi@vanderbilt.edu.
Abstract
OBJECTIVES: To demonstrate the utility of a natural language processing (NLP) algorithm for mining kidney stone composition in a large-scale electronic health records (EHR) repository.
METHODS: We developed StoneX, a pattern-matching method for extracting kidney stone composition information from clinical notes. We trained the extraction algorithm on manually annotated text mentions of calcium oxalate monohydrate, calcium oxalate dihydrate, hydroxyapatite, brushite, uric acid, and struvite stones. We applied StoneX to >125 million notes from our institutional EHR to identify patients with kidney stone composition data. Analyses performed on the extracted patients included stone type conversions over time, survival analysis from a second stone surgery, and disease associations by stone composition to validate the phenotyping method against known associations.
RESULTS: The NLP algorithm identified 45,235 text mentions corresponding to 11,585 patients. The system achieved positive predictive value >90% for calcium oxalate monohydrate, calcium oxalate dihydrate, hydroxyapatite, brushite, and struvite; the exception was uric acid (positive predictive value = 87.5%). Survival analysis from a second stone surgery showed statistically significant differences among stone types (P = .03). Several phenotype associations were found: uric acid with type 2 diabetes (odds ratio, OR = 2.69, 95% confidence interval, CI = 1.91-3.79), struvite with neurogenic bladder (OR = 12.27, 95% CI = 4.33-34.79), struvite with urinary tract infection (OR = 7.36, 95% CI = 3.01-17.99), hydroxyapatite with pulmonary collapse (OR = 3.67, 95% CI = 2.10-6.42), hydroxyapatite with neurogenic bladder (OR = 5.23, 95% CI = 2.05-13.36), brushite with calcium metabolism disorder (OR = 4.59, 95% CI = 2.14-9.81), and brushite with hypercalcemia (OR = 4.09, 95% CI = 1.90-8.80).
CONCLUSION: NLP extraction of kidney stone composition from large-scale EHRs is feasible with high precision, enabling high-throughput epidemiological studies of kidney stone disease. These tools will enable high-fidelity kidney stone research from the EHR.
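As an illustration of the pattern-matching approach described in METHODS, the following is a minimal Python sketch. It is not the StoneX implementation: the regular expressions, the mineral-name synonyms (whewellite, weddellite, magnesium ammonium phosphate), the function names `extract_mentions` and `patients_by_stone_type`, and the example notes are all assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical patterns for the six stone types named in the abstract.
# These are illustrative guesses, not the rules used by StoneX.
STONE_PATTERNS = {
    "calcium oxalate monohydrate": re.compile(
        r"\b(?:calcium\s+oxalate\s+monohydrate|whewellite)\b", re.IGNORECASE),
    "calcium oxalate dihydrate": re.compile(
        r"\b(?:calcium\s+oxalate\s+dihydrate|weddellite)\b", re.IGNORECASE),
    "hydroxyapatite": re.compile(r"\bhydroxy[\s-]?apatite\b", re.IGNORECASE),
    "brushite": re.compile(r"\bbrushite\b", re.IGNORECASE),
    "uric acid": re.compile(r"\buric\s+acid\b", re.IGNORECASE),
    "struvite": re.compile(
        r"\b(?:struvite|magnesium\s+ammonium\s+phosphate)\b", re.IGNORECASE),
}


def extract_mentions(note_text):
    """Return (stone_type, matched_text) pairs in order of appearance."""
    found = []
    for stone_type, pattern in STONE_PATTERNS.items():
        for match in pattern.finditer(note_text):
            found.append((match.start(), stone_type, match.group(0)))
    # Sort by character offset so mentions come back in reading order.
    return [(stone_type, text) for _, stone_type, text in sorted(found)]


def patients_by_stone_type(notes):
    """Map each stone type to the set of patient IDs with a mention.

    `notes` is an iterable of (patient_id, note_text) pairs.
    """
    index = defaultdict(set)
    for patient_id, note_text in notes:
        for stone_type, _ in extract_mentions(note_text):
            index[stone_type].add(patient_id)
    return dict(index)


if __name__ == "__main__":
    # Made-up example notes, not real patient data.
    example_notes = [
        ("p1", "Stone analysis: 80% calcium oxalate monohydrate, 20% hydroxyapatite."),
        ("p2", "Composition: Uric Acid 100%."),
        ("p1", "Prior stone was struvite."),
    ]
    for stone_type, patients in sorted(patients_by_stone_type(example_notes).items()):
        print(stone_type, sorted(patients))
```

A sketch like this finds any mention of a stone type, including negated or hypothetical ones ("no evidence of struvite"), which is one reason the extracted mentions would need checking against manual annotations before use.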