Sophia Wang (1), Benjamin Tseng (2), Tina Hernandez-Boussard (3)
1. Byers Eye Institute, Department of Ophthalmology, Stanford University, 2370 Watson Court, Palo Alto, CA, 94303, United States. Electronic address: sywang@stanford.edu.
2. Byers Eye Institute, Department of Ophthalmology, Stanford University, 2370 Watson Court, Palo Alto, CA, 94303, United States. Electronic address: bentseng@stanford.edu.
3. Center for Biomedical Informatics Research, School of Medicine, Stanford University, 1265 Welch Road, Stanford, CA, 94305, United States. Electronic address: boussard@stanford.edu.
Abstract
OBJECTIVE: To develop and evaluate novel word embeddings (WEs) specific to ophthalmology, using text corpora from published literature and electronic health records (EHR).
MATERIALS AND METHODS: We trained ophthalmology-specific WEs on 121,740 PubMed abstracts and 89,282 EHR notes with the word2vec continuous bag-of-words architecture. PubMed and EHR WEs were compared to general-domain GloVe WEs and general biomedical-domain BioWordVec embeddings on two tasks: a novel ophthalmology-specific 200-question analogy test, and prediction of prognosis in 5547 low vision patients using EHR notes as inputs to a deep learning model.
RESULTS: Many words representing important ophthalmic concepts in the EHR were missing from the general-domain GloVe vocabulary but were covered by the ophthalmology abstract corpus. On the ophthalmology analogy test, PubMed WEs scored 95.0%, outperforming EHR WEs (86.0%) and GloVe (91.0%) but scoring below BioWordVec (99.5%). On predicting low vision prognosis, PubMed and EHR WEs yielded similar AUROC (0.830 and 0.826, respectively), outperforming GloVe (0.778) and BioWordVec (0.784).
CONCLUSION: Ophthalmology domain-specific WEs improved performance in ophthalmology-related clinical prediction compared to general WEs. Deep learning models using clinical notes as inputs can predict the prognosis of visually impaired patients. This work provides a framework to improve predictive models using domain-specific WEs.
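To illustrate how an analogy test of the kind described above can be scored, the following is a minimal sketch, not the authors' implementation. It assumes the common vector-offset method (for "a is to b as c is to ?", pick the word nearest to b - a + c by cosine similarity) and assumes that questions containing out-of-vocabulary words are skipped; the abstract states neither choice. The function names, the toy four-dimensional vectors, and the example questions are all hypothetical.

```python
import numpy as np


def solve_analogy(emb, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset method.

    emb maps each word to a 1-D numpy vector. The three query words are
    excluded from the candidates, as is conventional for analogy tests.
    """
    candidates = [w for w in emb if w not in (a, b, c)]
    target = emb[b] - emb[a] + emb[c]
    target = target / np.linalg.norm(target)
    # Rows are unit-normalised candidate vectors, so the dot product
    # with the unit target is the cosine similarity.
    matrix = np.stack([emb[w] / np.linalg.norm(emb[w]) for w in candidates])
    return candidates[int(np.argmax(matrix @ target))]


def analogy_accuracy(emb, questions):
    """Fraction of answerable questions solved correctly.

    Assumption: a question with any out-of-vocabulary word is skipped
    rather than counted as wrong.
    """
    answerable = [q for q in questions if all(w in emb for w in q)]
    if not answerable:
        return 0.0
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in answerable)
    return correct / len(answerable)


# Hypothetical toy embeddings: three "tissue" axes plus one "inflammation" axis.
toy_embeddings = {
    "cornea":    np.array([1.0, 0.0, 0.0, 0.0]),
    "retina":    np.array([0.0, 1.0, 0.0, 0.0]),
    "uvea":      np.array([0.0, 0.0, 1.0, 0.0]),
    "keratitis": np.array([1.0, 0.0, 0.0, 1.0]),
    "retinitis": np.array([0.0, 1.0, 0.0, 1.0]),
    "uveitis":   np.array([0.0, 0.0, 1.0, 1.0]),
}

# Hypothetical questions in (a, b, c, expected) form. The last one contains
# words missing from the toy vocabulary and is therefore skipped.
toy_questions = [
    ("cornea", "keratitis", "retina", "retinitis"),
    ("retina", "retinitis", "uvea", "uveitis"),
    ("cornea", "keratitis", "sclera", "scleritis"),
]

if __name__ == "__main__":
    print(solve_analogy(toy_embeddings, "cornea", "keratitis", "retina"))
    print(analogy_accuracy(toy_embeddings, toy_questions))
```

How out-of-vocabulary questions are handled matters when comparing embeddings with different vocabularies: skipping them, as in this sketch, rewards a model only on the questions it can attempt, whereas counting them as wrong penalises vocabulary gaps directly.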