Authors: Chen Lin (1)*, Elizabeth W Karlson (2)*, Dmitriy Dligach (3)*, Monica P Ramirez (4), Timothy A Miller (5), Huan Mo (6), Natalie S Braggs (7), Andrew Cagan (8), Vivian Gainer (8), Joshua C Denny (9), Guergana K Savova (5).
*CL, EWK and DD are co-first authors.
1. Boston Children's Hospital, Informatics Program, Boston, Massachusetts, USA.
2. Division of Rheumatology, Immunology and Allergy, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA; Harvard Medical School, Boston, Massachusetts, USA.
3. Boston Children's Hospital, Informatics Program, Boston, Massachusetts, USA; Harvard Medical School, Boston, Massachusetts, USA.
4. Division of Rheumatology, Immunology and Allergy, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA.
5. Boston Children's Hospital, Informatics Program, Boston, Massachusetts, USA; Harvard Medical School, Boston, Massachusetts, USA.
6. Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA.
7. Department of Medicine, Vanderbilt University, Nashville, Tennessee, USA.
8. Research Computing, Partners HealthCare, Boston, Massachusetts, USA.
9. Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA; Department of Medicine, Vanderbilt University, Nashville, Tennessee, USA.
Abstract
OBJECTIVES: To improve the accuracy of mining structured and unstructured components of the electronic medical record (EMR) by adding temporal features to automatically identify patients with rheumatoid arthritis (RA) with methotrexate-induced liver transaminase abnormalities.

MATERIALS AND METHODS: Codified information and a string-matching algorithm were applied to an RA cohort of 5903 patients from Partners HealthCare to select 1130 patients with potential liver toxicity. Supervised machine learning was applied as our key method. For features, the Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) was used to extract standard vocabulary from relevant sections of the unstructured clinical narrative. Temporal features were further extracted to assess the temporal relevance of event mentions with regard to the date of transaminase abnormality. All features were encapsulated in a 3-month-long episode for classification. Results were summarized at the patient level in a training set (N=480 patients) and evaluated against a test set (N=120 patients).

RESULTS: The system achieved positive predictive value (PPV) 0.756, sensitivity 0.919, and F1 score 0.829 on the test set, which was significantly better than the best baseline system (PPV 0.590, sensitivity 0.703, F1 score 0.642). Our innovations, which included framing the phenotype problem as an episode-level classification task and adding temporal information, all proved highly effective.

CONCLUSIONS: Automated methotrexate-induced liver toxicity phenotype discovery for patients with RA, based on structured and unstructured information in the EMR, yields accurate results. Our work demonstrates that adding temporal features significantly improves classification results.
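The reported F1 scores can be checked against the reported PPV and sensitivity, since F1 is the harmonic mean of the two. The following is a minimal sketch of that arithmetic only; the function name `f1_score` and the dictionary layout are illustrative assumptions and are not taken from the authors' system.

```python
def f1_score(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of PPV (precision) and sensitivity (recall)."""
    if ppv + sensitivity == 0:
        return 0.0  # avoid division by zero when both inputs are 0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Values as reported in the abstract: (PPV, sensitivity, reported F1).
reported = {
    "temporal system": (0.756, 0.919, 0.829),
    "best baseline": (0.590, 0.703, 0.642),
}

for name, (ppv, sens, f1_reported) in reported.items():
    f1 = f1_score(ppv, sens)
    print(f"{name}: recomputed F1 = {f1:.3f}, reported F1 = {f1_reported:.3f}")
```

Because the published PPV and sensitivity are themselves rounded to three decimals, the recomputed F1 can differ from the reported value in the last digit.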