Victor M Castro1, Jessica Minnier, Shawn N Murphy, Isaac Kohane, Susanne E Churchill, Vivian Gainer, Tianxi Cai, Alison G Hoffnagle, Yael Dai, Stefanie Block, Sydney R Weill, Mireya Nadal-Vicens, Alisha R Pollastri, J Niels Rosenquist, Sergey Goryachev, Dost Ongur, Pamela Sklar, Roy H Perlis, Jordan W Smoller. 1. From Research Information Systems and Computing, Partners HealthCare System, Boston; the Laboratory of Computer Science, the Department of Neurology, the Department of Psychiatry, the Center for Anxiety and Traumatic Stress Disorders, the Center for Experimental Drugs and Diagnostics, and the Psychiatric and Neurodevelopmental Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, Boston; the Center for Biomedical Informatics, Harvard Medical School, Boston; the Department of Biostatistics, Harvard School of Public Health, Boston; the Psychotic Disorders Division, McLean Hospital, Belmont, Mass.; the Department of Public Health and Preventive Medicine, Oregon Health and Science University, Portland; and the Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York.
Abstract
OBJECTIVE: The study was designed to validate use of electronic health records (EHRs) for diagnosing bipolar disorder and classifying control subjects. METHOD: EHR data were obtained from a health care system of more than 4.6 million patients spanning more than 20 years. Experienced clinicians reviewed charts to identify text features and coded data consistent or inconsistent with a diagnosis of bipolar disorder. Natural language processing was used to train a diagnostic algorithm with 95% specificity for classifying bipolar disorder. Filtered coded data were used to derive three additional classification rules for case subjects and one for control subjects. The positive predictive value (PPV) of EHR-based bipolar disorder and subphenotype diagnoses was calculated against diagnoses from direct semistructured interviews of 190 patients by trained clinicians blind to EHR diagnosis. RESULTS: The PPV of bipolar disorder defined by natural language processing was 0.85. Coded classification based on strict filtering achieved a value of 0.79, but classifications based on less stringent criteria performed less well. No EHR-classified control subject received a diagnosis of bipolar disorder on the basis of direct interview (PPV=1.0). For most subphenotypes, values exceeded 0.80. The EHR-based classifications were used to accrue 4,500 bipolar disorder cases and 5,000 controls for genetic analyses. CONCLUSIONS: Semiautomated mining of EHRs can be used to ascertain bipolar disorder patients and control subjects with high specificity and predictive value compared with diagnostic interviews. EHRs provide a powerful resource for high-throughput phenotyping for genetic and clinical research.
OBJECTIVE: The study was designed to validate use of electronic health records (EHRs) for diagnosing bipolar disorder and classifying control subjects. METHOD: EHR data were obtained from a health care system of more than 4.6 million patients spanning more than 20 years. Experienced clinicians reviewed charts to identify text features and coded data consistent or inconsistent with a diagnosis of bipolar disorder. Natural language processing was used to train a diagnostic algorithm with 95% specificity for classifying bipolar disorder. Filtered coded data were used to derive three additional classification rules for case subjects and one for control subjects. The positive predictive value (PPV) of EHR-based bipolar disorder and subphenotype diagnoses was calculated against diagnoses from direct semistructured interviews of 190 patients by trained clinicians blind to EHR diagnosis. RESULTS: The PPV of bipolar disorder defined by natural language processing was 0.85. Coded classification based on strict filtering achieved a value of 0.79, but classifications based on less stringent criteria performed less well. No EHR-classified control subject received a diagnosis of bipolar disorder on the basis of direct interview (PPV=1.0). For most subphenotypes, values exceeded 0.80. The EHR-based classifications were used to accrue 4,500 bipolar disorder cases and 5,000 controls for genetic analyses. CONCLUSIONS: Semiautomated mining of EHRs can be used to ascertain bipolar disorder patients and control subjects with high specificity and predictive value compared with diagnostic interviews. EHRs provide a powerful resource for high-throughput phenotyping for genetic and clinical research.
Authors: R H Perlis; D V Iosifescu; V M Castro; S N Murphy; V S Gainer; J Minnier; T Cai; S Goryachev; Q Zeng; P J Gallagher; M Fava; J B Weilburg; S E Churchill; I S Kohane; J W Smoller Journal: Psychol Med Date: 2011-06-20 Impact factor: 7.723
Authors: Hua Xu; Min Jiang; Matt Oetjens; Erica A Bowton; Andrea H Ramirez; Janina M Jeff; Melissa A Basford; Jill M Pulley; James D Cowan; Xiaoming Wang; Marylyn D Ritchie; Daniel R Masys; Dan M Roden; Dana C Crawford; Joshua C Denny Journal: J Am Med Inform Assoc Date: 2011 Jul-Aug Impact factor: 4.497
Authors: N P Tatonetti; J C Denny; S N Murphy; G H Fernald; G Krishnan; V Castro; P Yue; P S Tsao; P S Tsau; I Kohane; D M Roden; R B Altman Journal: Clin Pharmacol Ther Date: 2011-05-25 Impact factor: 6.875
Authors: Fina Kurreeman; Katherine Liao; Lori Chibnik; Brendan Hickey; Eli Stahl; Vivian Gainer; Gang Li; Lynn Bry; Scott Mahan; Kristin Ardlie; Brian Thomson; Peter Szolovits; Susanne Churchill; Shawn N Murphy; Tianxi Cai; Soumya Raychaudhuri; Isaac Kohane; Elizabeth Karlson; Robert M Plenge Journal: Am J Hum Genet Date: 2011-01-07 Impact factor: 11.025
Authors: Daniel R Masys; Gail P Jarvik; Neil F Abernethy; Nicholas R Anderson; George J Papanicolaou; Dina N Paltoo; Mark A Hoffman; Isaac S Kohane; Howard P Levy Journal: J Biomed Inform Date: 2011-12-27 Impact factor: 6.317
Authors: Darrel A Regier; William E Narrow; Diana E Clarke; Helena C Kraemer; S Janet Kuramoto; Emily A Kuhl; David J Kupfer Journal: Am J Psychiatry Date: 2013-01 Impact factor: 18.112
Authors: John S Brownstein; Shawn N Murphy; Allison B Goldfine; Richard W Grant; Margarita Sordo; Vivian Gainer; Judith A Colecchi; Anil Dubey; David M Nathan; John P Glaser; Isaac S Kohane Journal: Diabetes Care Date: 2009-12-15 Impact factor: 19.112
Authors: Stephan Ripke; Colm O'Dushlaine; Kimberly Chambert; Jennifer L Moran; Anna K Kähler; Susanne Akterin; Sarah E Bergen; Ann L Collins; James J Crowley; Menachem Fromer; Yunjung Kim; Sang Hong Lee; Patrik K E Magnusson; Nick Sanchez; Eli A Stahl; Stephanie Williams; Naomi R Wray; Kai Xia; Francesco Bettella; Anders D Borglum; Brendan K Bulik-Sullivan; Paul Cormican; Nick Craddock; Christiaan de Leeuw; Naser Durmishi; Michael Gill; Vera Golimbet; Marian L Hamshere; Peter Holmans; David M Hougaard; Kenneth S Kendler; Kuang Lin; Derek W Morris; Ole Mors; Preben B Mortensen; Benjamin M Neale; Francis A O'Neill; Michael J Owen; Milica Pejovic Milovancevic; Danielle Posthuma; John Powell; Alexander L Richards; Brien P Riley; Douglas Ruderfer; Dan Rujescu; Engilbert Sigurdsson; Teimuraz Silagadze; August B Smit; Hreinn Stefansson; Stacy Steinberg; Jaana Suvisaari; Sarah Tosato; Matthijs Verhage; James T Walters; Douglas F Levinson; Pablo V Gejman; Kenneth S Kendler; Claudine Laurent; Bryan J Mowry; Michael C O'Donovan; Michael J Owen; Ann E Pulver; Brien P Riley; Sibylle G Schwab; Dieter B Wildenauer; Frank Dudbridge; Peter Holmans; Jianxin Shi; Margot Albus; Madeline Alexander; Dominique Campion; David Cohen; Dimitris Dikeos; Jubao Duan; Peter Eichhammer; Stephanie Godard; Mark Hansen; F Bernard Lerer; Kung-Yee Liang; Wolfgang Maier; Jacques Mallet; Deborah A Nertney; Gerald Nestadt; Nadine Norton; Francis A O'Neill; George N Papadimitriou; Robert Ribble; Alan R Sanders; Jeremy M Silverman; Dermot Walsh; Nigel M Williams; Brandon Wormley; Maria J Arranz; Steven Bakker; Stephan Bender; Elvira Bramon; David Collier; Benedicto Crespo-Facorro; Jeremy Hall; Conrad Iyegbe; Assen Jablensky; Rene S Kahn; Luba Kalaydjieva; Stephen Lawrie; Cathryn M Lewis; Kuang Lin; Don H Linszen; Ignacio Mata; Andrew McIntosh; Robin M Murray; Roel A Ophoff; John Powell; Dan Rujescu; Jim Van Os; Muriel Walshe; Matthias Weisbrod; Durk Wiersma; Peter Donnelly; Ines Barroso; Jenefer M Blackwell; Elvira Bramon; Matthew A Brown; Juan P Casas; Aiden P Corvin; Panos Deloukas; Audrey Duncanson; Janusz Jankowski; Hugh S Markus; Christopher G Mathew; Colin N A Palmer; Robert Plomin; Anna Rautanen; Stephen J Sawcer; Richard C Trembath; Ananth C Viswanathan; Nicholas W Wood; Chris C A Spencer; Gavin Band; Céline Bellenguez; Colin Freeman; Garrett Hellenthal; Eleni Giannoulatou; Matti Pirinen; Richard D Pearson; Amy Strange; Zhan Su; Damjan Vukcevic; Peter Donnelly; Cordelia Langford; Sarah E Hunt; Sarah Edkins; Rhian Gwilliam; Hannah Blackburn; Suzannah J Bumpstead; Serge Dronov; Matthew Gillman; Emma Gray; Naomi Hammond; Alagurevathi Jayakumar; Owen T McCann; Jennifer Liddle; Simon C Potter; Radhi Ravindrarajah; Michelle Ricketts; Avazeh Tashakkori-Ghanbaria; Matthew J Waller; Paul Weston; Sara Widaa; Pamela Whittaker; Ines Barroso; Panos Deloukas; Christopher G Mathew; Jenefer M Blackwell; Matthew A Brown; Aiden P Corvin; Mark I McCarthy; Chris C A Spencer; Elvira Bramon; Aiden P Corvin; Michael C O'Donovan; Kari Stefansson; Edward Scolnick; Shaun Purcell; Steven A McCarroll; Pamela Sklar; Christina M Hultman; Patrick F Sullivan Journal: Nat Genet Date: 2013-08-25 Impact factor: 38.330
Authors: Katherine P Liao; Jiehuan Sun; Tianrun A Cai; Nicholas Link; Chuan Hong; Jie Huang; Jennifer E Huffman; Jessica Gronsbell; Yichi Zhang; Yuk-Lam Ho; Victor Castro; Vivian Gainer; Shawn N Murphy; Christopher J O'Donnell; J Michael Gaziano; Kelly Cho; Peter Szolovits; Isaac S Kohane; Sheng Yu; Tianxi Cai Journal: J Am Med Inform Assoc Date: 2019-11-01 Impact factor: 4.497
Authors: Jeffrey P Ferraro; Ye Ye; Per H Gesteland; Peter J Haug; Fuchiang Rich Tsui; Gregory F Cooper; Rudy Van Bree; Thomas Ginter; Andrew J Nowalk; Michael Wagner Journal: Appl Clin Inform Date: 2017-05-31 Impact factor: 2.342
Authors: Victor M Castro; Dmitriy Dligach; Sean Finan; Sheng Yu; Anil Can; Muhammad Abd-El-Barr; Vivian Gainer; Nancy A Shadick; Shawn Murphy; Tianxi Cai; Guergana Savova; Scott T Weiss; Rose Du Journal: Neurology Date: 2016-12-07 Impact factor: 9.910
Authors: Sheng Yu; Yumeng Ma; Jessica Gronsbell; Tianrun Cai; Ashwin N Ananthakrishnan; Vivian S Gainer; Susanne E Churchill; Peter Szolovits; Shawn N Murphy; Isaac S Kohane; Katherine P Liao; Tianxi Cai Journal: J Am Med Inform Assoc Date: 2018-01-01 Impact factor: 4.497
Authors: Thomas H McCoy; Victor M Castro; Kamber L Hart; Amelia M Pellegrini; Sheng Yu; Tianxi Cai; Roy H Perlis Journal: Biol Psychiatry Date: 2018-02-26 Impact factor: 13.382
Authors: Jessica Dennis; Aaron M Yengo-Kahn; Paul Kirby; Gary S Solomon; Nancy J Cox; Scott L Zuckerman Journal: J Neurotrauma Date: 2019-03-28 Impact factor: 5.269