
From Sour Grapes to Low-Hanging Fruit: A Case Study Demonstrating a Practical Strategy for Natural Language Processing Portability.

Stephen B Johnson¹, Prakash Adekkanattu², Thomas R Campion¹,², James Flory¹, Jyotishman Pathak¹, Olga V Patterson³,⁴, Scott L DuVall³,⁴, Vincent Major⁵, Yindalon Aphinyanaphongs⁵.

Abstract

Natural Language Processing (NLP) holds potential for patient care and clinical research, but a gap exists between promise and reality. While some studies have demonstrated portability of NLP systems across multiple sites, challenges remain. Strategies to mitigate these challenges can either target complex NLP problems using advanced methods (hard-to-reach fruit) or focus on simple NLP problems using practical methods (low-hanging fruit). This paper investigates a practical strategy for NLP portability using extraction of left ventricular ejection fraction (LVEF) as a use case. We used a tool developed at the Department of Veterans Affairs (VA) to extract LVEF values from free-text echocardiogram reports in the MIMIC-III database. The approach showed an accuracy of 98.4%, sensitivity of 99.4%, positive predictive value of 98.7%, and an F-score of 99.0%. This experience, in which a simple NLP solution proved highly portable with excellent performance, illustrates that simple NLP applications may be easier to disseminate and adapt than complex applications, and in the short term may prove more useful.


Year: 2018 PMID: 29888051 PMCID: PMC5961788

Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc

Introduction

Natural Language Processing (NLP) holds tremendous potential for patient care and clinical research[1-4]. However, a recent review of the literature by Demner-Fushman and Elhadad suggests that NLP remains an “emerging technology”, with a significant gap between promise and reality[5]. The NLP community has engaged in numerous challenge tasks in recent years, which have been beneficial in improving technical methods and fostering research collaboration. However, because of the artificial nature of tasks suitable for such competitions, these efforts have had limited impact on real-world problems[6]. Several studies have demonstrated successful portability of NLP technologies across institutions[7-10]. Nevertheless, a recent paper by Carrell et al. argues that serious challenges remain in adapting NLP systems across multiple sites, including assembling clinical corpora, managing diverse document structures, and handling idiosyncratic linguistic expressions[11]. Carrell et al. suggest a variety of mitigation strategies, such as heuristic record linkage, acquisition of local knowledge, active learning, and tailoring with machine learning[11]. In contrast, Demner-Fushman and Elhadad suggest sharing patterns for simple tasks and “more work on porting pipelines with easy domain adaptation”[5]. These two strategies may be broadly contrasted as seeking hard-to-reach fruit (which may turn out to be sour grapes for some institutions) or low-hanging fruit, respectively. This paper investigates which factors might allow one to pursue the latter approach as a practical strategy for NLP portability. Our project was motivated by the New York City Clinical Data Research Network (CDRN), a collaboration among six academic medical centers in the metropolitan area that seeks to collect and integrate clinical data to support patient-centered clinical research[12]. The CDRN needed an approach that would leverage existing NLP resources at specific sites while enabling sharing of resources across sites.
The first main consideration was to select a standards-based system architecture for NLP, which has become a crucial strategy to facilitate portability and scalability[13-15]. The second main consideration was to select a task with potential to be replicated across all sites. We chose left ventricular ejection fraction (LVEF), a primary diagnostic measurement in heart failure. LVEF is the ratio of the volume of blood ejected during systole (the stroke volume) to the volume of blood in the left ventricle at the end of diastole, usually expressed as a percentage. LVEF is typically measured by echocardiography and recorded in narrative text. A number of previous studies have shown success in extracting LVEF from clinical documents[16-19]. Based on these factors, we chose to work with a system architecture called Leo, which was developed by the Department of Veterans Affairs (VA) Informatics and Computing Infrastructure (VINCI)[20]. Leo is a set of libraries that facilitate rapid development and scalable deployment of NLP systems, and builds upon the Apache Unstructured Information Management Architecture Asynchronous Scaleout (UIMA AS)[21]. In particular, this study focuses on a specific instance of Leo named Ejection Fraction Extractor (EFEx)[22]. VINCI developed EFEx to extract LVEF values from clinical documents that originate at various centers within the VA[23]. Development and evaluation of EFEx were conducted entirely on VA documents, which raises a question about generalizability outside the VA system. However, with over 1,700 points of care and thousands of clinical authors, the VA system provides an exceptional data source for system training. Therefore, we expected EFEx to be a better candidate for portability than a tool developed using data from a single medical center. This report details our initiative to install and configure EFEx at Weill Cornell Medicine (WCM) and to extract LVEF from echocardiogram reports available in the MIMIC-III database.
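As a worked illustration of the definition above, LVEF can be computed from the end-diastolic volume (EDV) and end-systolic volume (ESV) as (EDV − ESV) / EDV × 100. The minimal sketch below assumes volumes in millilitres; the function name and the example volumes are hypothetical and are not taken from the study data.

```python
def lvef_percent(edv_ml: float, esv_ml: float) -> float:
    """LVEF (%) = stroke volume / end-diastolic volume * 100."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("end-systolic volume must lie between 0 and EDV")
    stroke_volume = edv_ml - esv_ml  # blood ejected during systole
    return 100.0 * stroke_volume / edv_ml


# Hypothetical volumes: EDV 120 mL, ESV 50 mL give an LVEF of roughly 58%.
print(round(lvef_percent(120.0, 50.0), 1))  # prints 58.3
```

In echocardiogram reports this quantity usually appears already computed, as a single value or a range, which is what the extraction task targets.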

Methods

Data source

We obtained echocardiograms from the Medical Information Mart for Intensive Care III (MIMIC-III) database[24]. MIMIC is an openly available database developed by the MIT Laboratory for Computational Physiology. The latest version, MIMIC-III, contains de-identified records for more than 40,000 critical care patients admitted between 2001 and 2012. Researchers wishing to use the data must accept the data use agreement and provide evidence of completion of appropriate human subjects research training. The MIT research team de-identified the data according to the Health Insurance Portability and Accountability Act Privacy Rule, including random date shifting that preserves temporal relationships within a given patient but not across patients. We selected echocardiogram reports from the NOTEEVENTS table by filtering the CATEGORY field for ‘Echo’, and retained only the first echocardiogram report, in chronological order, for each unique hospital admission (coded in MIMIC as HADM_ID), yielding 8707 reports. In this study we restricted our corpus to a single document type, echocardiograms (low-hanging fruit), originating at an independent source, in this case Beth Israel Deaconess Medical Center. One reason for this selection was to investigate the effectiveness of EFEx on documents originating at an independent source. This is important if we eventually want to deploy EFEx at other CDRN centers while maintaining the same level of performance. Because MIMIC is an openly available, de-identified dataset, its use was not subject to institutional review board approval. The performance of EFEx on echocardiograms originating within Weill Cornell Medicine is the subject of an ongoing study and will be reported separately.
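The selection step above (Echo category, first report per admission) can be sketched in a few lines. This is a minimal sketch, not the query the authors ran: it assumes notes are already loaded as dictionaries keyed by NOTEEVENTS-style column names (ROW_ID, HADM_ID, CHARTDATE, CATEGORY, TEXT), and ordering by CHARTDATE with ROW_ID as a tie-breaker is an assumption.

```python
from datetime import date
from typing import Dict, Iterable


def first_echo_per_admission(notes: Iterable[dict]) -> Dict[int, dict]:
    """Return the earliest 'Echo' note for each hospital admission (HADM_ID).

    Each note is assumed to be a dict with NOTEEVENTS-style keys ROW_ID,
    HADM_ID, CHARTDATE (ISO 'YYYY-MM-DD'), CATEGORY and TEXT. Ordering by
    CHARTDATE with ROW_ID as a tie-breaker is an assumption of this sketch.
    """
    earliest = {}
    for note in notes:
        # Keep echocardiogram notes that are linked to an admission.
        if note.get("CATEGORY") != "Echo" or note.get("HADM_ID") in (None, ""):
            continue
        sort_key = (date.fromisoformat(note["CHARTDATE"]), note["ROW_ID"])
        hadm_id = note["HADM_ID"]
        if hadm_id not in earliest or sort_key < earliest[hadm_id][0]:
            earliest[hadm_id] = (sort_key, note)
    return {hadm_id: note for hadm_id, (_, note) in earliest.items()}
```

Because MIMIC date shifting preserves order within a patient, chronological ordering within one admission remains meaningful after de-identification.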

System description

Leo follows the UIMA AS model, with client and service components plus an additional core library. The client defines inputs and outputs for processing and sends requests to the services. Setting up a client consists of selecting the required collection reader and listener, each of which can use a database or a local file system. The core contains tools developed in conjunction with Leo to support various NLP and annotation needs. The service component contains the server functionality for launching UIMA AS services. It also defines the type system and the annotators, arranged as a pipeline that implements all the logic needed to extract target information from unstructured documents. Figure 1 shows the basic architecture of Leo, with the flow beginning at the reader.

Figure 1

Leo architecture with UIMA-AS as the core component.
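The reader, client, service, and listener roles described above can be illustrated with a small in-process sketch. It is only an analogy for the data flow in Figure 1: all class and function names are hypothetical and are not Leo's or UIMA AS's API, and unlike UIMA AS, which exchanges requests asynchronously through a message broker, this sketch runs synchronously.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Tuple


@dataclass
class Document:
    doc_id: str
    text: str
    annotations: List[dict] = field(default_factory=list)


# An annotator inspects a document and appends annotations to it.
Annotator = Callable[[Document], None]


def collection_reader(records: Iterable[Tuple[str, str]]) -> Iterator[Document]:
    """Reader stand-in: turns (id, text) records, e.g. database rows, into Documents."""
    for doc_id, text in records:
        yield Document(doc_id, text)


class Service:
    """Service stand-in: applies annotators in a fixed pipeline order."""

    def __init__(self, pipeline: List[Annotator]):
        self.pipeline = pipeline

    def process(self, doc: Document) -> Document:
        for annotate in self.pipeline:
            annotate(doc)
        return doc


class ListListener:
    """Listener stand-in: collects processed documents in memory."""

    def __init__(self) -> None:
        self.results: List[Document] = []

    def on_complete(self, doc: Document) -> None:
        self.results.append(doc)


def run_client(reader: Iterable[Document], service: Service, listener: ListListener) -> None:
    """Client: sends each document from the reader to the service and passes
    the result to the listener (synchronously, unlike UIMA AS)."""
    for doc in reader:
        listener.on_complete(service.process(doc))


def ef_mention_annotator(doc: Document) -> None:
    """Toy annotator that marks documents mentioning 'ejection fraction'."""
    if "ejection fraction" in doc.text.lower():
        doc.annotations.append({"type": "EFMention"})
```

For example, `run_client(collection_reader(rows), Service([ef_mention_annotator]), listener)` processes each row and leaves the annotated documents in `listener.results`; swapping the reader or listener changes the input and output without touching the annotator pipeline, which is the separation the architecture aims for.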

Leo is written in Java, requires the Java runtime environment and the Apache Maven build tool, and can be installed on Windows, Linux, or Mac. We installed instances of EFEx on Linux and Mac environments, and the setup procedure was essentially identical. We performed the following steps to create a fully functional EFEx instance. We installed the Java SE Development Kit (JDK) 8, set the environment variable JAVA_HOME to the JDK installation directory, and added its bin folder to the PATH environment variable. We installed Maven 3.3.9, set the environment variable MAVEN_HOME to the Maven installation directory, and added its bin folder to PATH. We downloaded UIMA-AS (http://uima.apache.org/downloads.cgi), extracted the content to a suitable folder, and followed the instructions to compile and package UIMA-AS version 2.6.0 (uima-as-2.6.0-source-release.zip). We set the environment variable UIMA_HOME to the UIMA-AS root folder and added its bin folder to PATH. The distribution package for EFEx was made available through a VA GitHub repository[25]. Installing EFEx mainly involved downloading and extracting the content to a folder on the machine. As part of the configuration, we created a folder called amq-broker under the uima-as folder and granted write permission to it; the broker service requires this folder to copy its configuration settings. The entire installation and basic configuration were completed in one day at WCM, although installation time may vary depending on the technical skills available at individual centers.
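Some of these setup steps lend themselves to a quick automated check before launching EFEx. The sketch below is a hypothetical helper, not part of the EFEx distribution: it reports which of the environment variables named above are unset and whether `java` and `mvn` are reachable on PATH, and it creates the amq-broker folder with owner write permission under a given UIMA-AS root.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path

REQUIRED_VARS = ("JAVA_HOME", "MAVEN_HOME", "UIMA_HOME")
REQUIRED_TOOLS = ("java", "mvn")  # found via the bin folders added to PATH


def missing_setup(env=None) -> list:
    """Return names of required variables or executables that are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    path = env.get("PATH", "")
    missing += [tool for tool in REQUIRED_TOOLS
                if shutil.which(tool, path=path) is None]
    return missing


def prepare_amq_broker(uima_as_root) -> Path:
    """Create <uima_as_root>/amq-broker and make it writable by its owner."""
    broker_dir = Path(uima_as_root) / "amq-broker"
    broker_dir.mkdir(parents=True, exist_ok=True)
    broker_dir.chmod(broker_dir.stat().st_mode | stat.S_IWUSR)
    return broker_dir


# Demonstration against a throwaway directory rather than a real UIMA-AS install.
with tempfile.TemporaryDirectory() as tmp:
    print(prepare_amq_broker(tmp).name)  # prints amq-broker
```

Calling `missing_setup()` with no argument inspects the current process environment; an empty list suggests the variables and tools described above are in place.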

Reference standard

The reference standard was developed at NYU Langone Medical Center, and at WCM all values were further confirmed through manual review of the entire document collection. Two reviewers examined each document in an Excel spreadsheet, after training based on previously defined guidelines. These guidelines covered identifying all mentions of LVEF and the associated quantitative values. When the two reviewers’ findings differed, a third reviewer serving as adjudicator resolved the discrepancy. The reviewers also confirmed all documents that did not mention LVEF. For each document in the dataset, we identified two values, EFmin and EFmax, corresponding to the lowest and highest values of LVEF. The reviewers identified numeric values and ranges of LVEF (e.g. 55, 50-70), as well as severity-based descriptors such as normal, mild, moderate, and severe. The majority of reports had either a numerical value or a range of values. In documents containing multiple instances of LVEF, we used the following logic to determine the reference values. An LVEF instance in the conclusion section, normally at the end of the document, takes precedence over one in the findings section. An LVEF mention in the postoperative section takes precedence over one in the findings or conclusion sections. (The postoperative section always follows the conclusion section in the document.) In some reports, the LVEF value is expressed with a greater-than or less-than symbol (e.g. LVEF >55); in these cases the reviewers extracted the value and ignored the symbol. Some echo reports express uncertainty about the LVEF value with a question mark (e.g. LVEF? 55-70); in such cases, the reviewers extracted the value, provided that no other instance was mentioned elsewhere in the document. In reports where no quantitative LVEF value was available, we assigned a numerical value or a range of values using other information in the report, as described below.
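The precedence rules above can be expressed as a small selection function. This is a minimal sketch of the stated rules under simplifying assumptions: mentions are assumed to arrive already tagged with a section name ("findings", "conclusion", "postoperative"), ties within a section go to the later mention (an assumption), and the names used are hypothetical rather than part of any tool used in the study.

```python
import re
from typing import List, Optional, Tuple

# Higher number wins: postoperative > conclusion > findings.
SECTION_PRIORITY = {"findings": 1, "conclusion": 2, "postoperative": 3}


def normalize_value(raw: str) -> Tuple[int, int]:
    """Strip '>', '<', '?', '%' and spaces; return the value as (low, high)."""
    cleaned = re.sub(r"[<>?%\s]", "", raw)
    numbers = [int(part) for part in cleaned.split("-") if part]
    return min(numbers), max(numbers)


def reference_value(mentions: List[Tuple[str, str]]) -> Optional[Tuple[int, int]]:
    """Choose one LVEF value from (section, raw_value) pairs in document order."""
    if not mentions:
        return None
    indexed = list(enumerate(mentions))
    # A value marked uncertain with '?' is used only if nothing else is available.
    certain = [item for item in indexed if "?" not in item[1][1]]
    candidates = certain or indexed
    # Highest section priority wins; the later position breaks ties (an assumption).
    _, (_, raw) = max(
        candidates, key=lambda item: (SECTION_PRIORITY.get(item[1][0], 0), item[0])
    )
    return normalize_value(raw)
```

For instance, a report with "55" in the findings and ">60" in the conclusion resolves to (60, 60), since the conclusion outranks the findings and the ">" symbol is ignored.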
We identified LVEF concept synonyms, including ‘lvef’, ‘left ventricular’, ‘LV’, and ‘ejection fraction’, and defined modifiers such as ‘depressed’, ‘impaired’, and ‘systolic dysfunction’. When a concept was preceded or followed by a severity modifier, such as mildly depressed, moderately depressed, or severely depressed, a quantitative value was assigned. Table 1 shows examples of modifiers and the corresponding values assigned. Despite this mapping scheme, some documents still had no concept-value pair in the reference standard. In general, these documents either did not mention LVEF or did not provide enough information to assign a meaningful value.

Table 1.

Values assigned for concepts of ejection fraction with qualitative modifiers in developing the reference standard.

Modifier	Assigned LVEF (%)
Severely depressed	5-29
Moderately depressed	30-44
Mildly depressed	45-54
Grossly preserved	50-55
Normal	70

In developing the reference standard, both the local context and the overall content of the document were considered when assigning a value or range of values to a concept. For example, a document could express LVEF both as quantitative values and as qualitative descriptors, such as when the phrase ‘normal global systolic function’ appeared along with ‘severe regional left ventricular systolic function’ and ‘EF 20-25%’. In such cases, the numerical value took precedence over the qualitative descriptors. As another example, one document contained the phrases ‘moderately depressed LVEF’ and ‘(LVEF=30%)’ as well as ‘LVEF 70% previously, now 30%.’ In this case, 30% was taken as the value for EFmin.
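The Table 1 mapping and the rule that numeric values take precedence over qualitative descriptors can be combined into one function that yields EFmin and EFmax. This is a minimal sketch assuming numeric values have already been extracted as (low, high) ranges and that candidate phrases are supplied as plain strings; it ignores the broader document context that the human reviewers also weighed, and the function and variable names are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Tuple

Range = Tuple[int, int]

# Table 1: qualitative modifier -> assigned LVEF range in percent.
MODIFIER_RANGES = {
    "severely depressed": (5, 29),
    "moderately depressed": (30, 44),
    "mildly depressed": (45, 54),
    "grossly preserved": (50, 55),
    "normal": (70, 70),
}


def ef_min_max(numeric: Iterable[Range], phrases: Iterable[str]) -> Optional[Range]:
    """Return (EFmin, EFmax) for one document, or None if nothing applies.

    Numeric values take precedence; Table 1 modifiers are used only when the
    document has no numeric LVEF value.
    """
    ranges: List[Range] = list(numeric)
    if not ranges:
        for phrase in phrases:
            for modifier, value_range in MODIFIER_RANGES.items():
                # Word boundaries keep 'normal' from matching 'abnormal'.
                if re.search(rf"\b{re.escape(modifier)}\b", phrase, re.IGNORECASE):
                    ranges.append(value_range)
    if not ranges:
        return None
    return min(low for low, _ in ranges), max(high for _, high in ranges)
```

Applied to the second example above, numeric values of 70 and 30 give (30, 70), so EFmin is 30. Substring-style matching of this kind also shows why compound phrases are hard: "mildly to moderately depressed" would match only the "moderately depressed" entry.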

Extraction methodology

Patterson et al. have described in detail the concept extraction logic employed in the present study[2]. EFEx is a rule-based system that identifies the set of core concepts for LVEF using regular expressions, pattern matching, and filters. Because some concepts are ambiguous (such as ‘function’), the text preceding each mention of the concept was used as a filter. Quantitative values were found using number patterns that allowed for the presence or absence of symbols such as ‘=’, ‘(’, ‘>’, ‘%’, and ‘(<’, as well as ranges of values. Figure 2 shows the overall logic implemented to find the concept-value pairs of LVEF. Steps A through K extract concept-value pairs, if any are present in the document. For cases in which steps A through K identify no concept-value pair, we added functionality to the original EFEx to look for qualitative modifiers describing the LVEF concept. This extended logic was implemented as steps M through O and effectively simulates the mapping scheme adopted in creating the reference standard. We identified 100 reports that had previously produced no output when processed by EFEx and used them as a training set for developing the extended logic. The output on the training set was manually reviewed to adjust the regular expression patterns iteratively.
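To make the idea of concept-value extraction with regular expressions concrete, the sketch below matches an LVEF concept followed by optional symbols or filler words and then a value or range. The patterns are illustrative assumptions written for this example; they are not EFEx's actual expressions and omit its filters (such as the preceding-text check for ambiguous concepts).

```python
import re
from typing import List, Tuple

# Illustrative patterns only; these are not EFEx's actual regular expressions.
CONCEPT = r"(?:lvef|ejection\s+fraction|\bef\b)"
# Optional symbols and filler words between the concept and its value.
GAP = r"[\s:=(<>~]*(?:(?:of|is|was)\s+)?[\s:=(<>~]*"
# A one- or two-digit value, optionally a range such as 50-70 or 50 to 70.
VALUE = r"(\d{1,2})(?:\s*(?:-|to)\s*(\d{1,2}))?\s*%?"
PATTERN = re.compile(CONCEPT + GAP + VALUE, re.IGNORECASE)


def concept_value_pairs(text: str) -> List[Tuple[int, int]]:
    """Return every LVEF value found, as (low, high), in document order."""
    pairs = []
    for match in PATTERN.finditer(text):
        low = int(match.group(1))
        high = int(match.group(2)) if match.group(2) else low
        pairs.append((min(low, high), max(low, high)))
    return pairs
```

On "Conclusions: LVEF >55%. EF 20-25%. (LVEF=30%)" this returns [(55, 55), (20, 25), (30, 30)]. A phrase such as "LVEF 70% previously, now 30%" yields only 70, because the second value is not adjacent to a concept term, which is one reason rule sets like this are refined iteratively against reviewed output.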

Figure 2

Extraction logic for LVEF implemented in EFEx system.

Data analysis

We analyzed the results on MIMIC-III data in two ways. First, we analyzed the data using the extraction logic implemented in the original EFEx for LVEF. This instance, as ported from the VA, implements extraction logic only through steps A to K in Figure 2; we refer to this version as Original EFEx. On analyzing the results on MIMIC data, we observed that the Original EFEx missed a significant number of documents in which the EF concept is described qualitatively, without any numerical value. At WCM we therefore extended the extraction methodology with additional logic that derives EF concept-value pairs from qualitative assessments through a mapping scheme. This extended logic, implemented as steps M through O in Figure 2, improved the performance of EFEx significantly. The algorithm searched for numeric values and ranges of LVEF (e.g. 55, 50-70) as well as severity-based descriptors built around the clinically relevant labels normal, mild, moderate, and severe. We refer to this version as Extended EFEx. Performance measures were then calculated for both the Original EFEx and Extended EFEx instances on the entire document set. The EFEx output was tabulated and compared against the reference standard.
Each document was classified as one of four possible cases: true positive (the document had an LVEF mention, and EFEx identified a concept-value pair whose value matched the reference standard); false positive (the document had no LVEF mention, indicated by a null value in the reference standard, but EFEx produced a non-null concept-value pair); true negative (the document had no LVEF concept-value mention, indicated by a null value in the reference standard, and EFEx found no concept-value pair); and false negative (the document had an LVEF concept-value mention, indicated by a non-null value in the reference standard, but EFEx either did not identify a concept-value pair or extracted a value that did not match the reference standard). When EFEx extracted multiple concept-value pairs, we used the following heuristics to select a single LVEF instance for direct comparison with the reference standard: we selected either the last one, normally in the conclusion part of the report (time usually moves forward in the report), or the lowest value (the disease typically worsens). The counts for the four cases were then used to calculate the following performance measures: precision (positive predictive value), recall (sensitivity or true positive rate), specificity (true negative rate), accuracy (the number of correct classifications by EFEx divided by the number of documents analyzed), and F-score (the harmonic mean of precision and recall).
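The instance-selection heuristics and the four-way classification defined above can be written down directly. In this minimal sketch, values are represented as (low, high) ranges, "lowest" is taken to mean the range with the smallest lower bound, and an exact range match counts as agreement with the reference standard; these representations and the function names are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int]


def select_instance(extracted: List[Range], strategy: str = "last") -> Optional[Range]:
    """Pick one extracted instance per document: the last one or the lowest value."""
    if not extracted:
        return None
    if strategy == "last":
        return extracted[-1]
    if strategy == "lowest":
        return min(extracted)  # tuples compare by lower bound first
    raise ValueError(f"unknown strategy: {strategy}")


def classify(reference: Optional[Range], predicted: Optional[Range]) -> str:
    """Assign TP/FP/TN/FN following the case definitions in the text."""
    if reference is None:
        return "FP" if predicted is not None else "TN"
    if predicted is not None and predicted == reference:
        return "TP"
    # A missing or mismatched value counts as a false negative, not a false positive.
    return "FN"
```

Note the asymmetry in these definitions: when the reference standard has a value, a wrong extracted value is counted as a false negative, so false positives can arise only in documents with no reference value.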

Results

There were 8707 documents for analysis. Using the Original EFEx, we classified each document into one of the four cases for calculating performance measures: true positive (6568), true negative (1124), false positive (0), and false negative (1015). These counts resulted in an overall accuracy of 88.3% (95% CI 87.7% – 88.9%), sensitivity of 86.6% (95% CI 85.8% – 87.4%), specificity of 100% (95% CI 99.6% – 100%), positive predictive value of 100% (95% CI 99.9% – 100%), and an F-score of 92.8%. The percentage of cases with severely or moderately depressed ejection fraction (LVEF < 45%) was 10.7%. Using the Extended EFEx, we classified documents as true positive (7541), true negative (1026), false positive (98), and false negative (42). These counts resulted in an overall accuracy of 98.4% (95% CI 97.8% – 98.8%), sensitivity of 99.4% (95% CI 99.2% – 99.6%), specificity of 91.3% (95% CI 89.4% – 92.8%), positive predictive value of 98.7% (95% CI 98.4% – 99.0%), and an F-score of 99.0%. We observed an increased percentage of severely and moderately depressed ejection fraction cases (18.1%).
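The point estimates above follow directly from the four counts. The sketch below recomputes them using the standard definitions listed in the Data analysis section; it does not reproduce the 95% confidence intervals, whose method is not stated here.

```python
def performance(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute precision, recall, specificity, accuracy and F-score from counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "specificity": specificity,
            "accuracy": accuracy, "f_score": f_score}


# Counts reported above, in the order (TP, TN, FP, FN).
for name, counts in {"Original EFEx": (6568, 1124, 0, 1015),
                     "Extended EFEx": (7541, 1026, 98, 42)}.items():
    metrics = performance(*counts)
    print(name, {k: round(100 * v, 1) for k, v in metrics.items()})
```

Rounded to one decimal, this reproduces the reported accuracy, sensitivity, specificity, and positive predictive value for both versions. For the Extended EFEx counts the F-score works out to 2 × 7541 / 15222 ≈ 99.08%, slightly above the 99.0% reported, a small difference that may reflect rounding or truncation.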

Discussion

This experience illustrates how ejection fraction is an excellent example of ‘low-hanging fruit’: a simple potential application for NLP that is relatively easy to port to new clinical settings. One limitation of this study is that identifying LVEF measurements is a relatively simple task, with low variability in the expressions and values to extract. In addition, the study examined only one document type. However, the relative simplicity of the task does not make it unimportant: ready availability of this quantitative parameter has important implications for research, quality improvement, and clinical care. The EFEx development team reported that the system achieved 98% positive predictive value and 93% sensitivity at the instance level across all VA medical centers[22]. Garvin et al. developed an NLP system based on the UIMA architecture for extracting LVEF values from echocardiograms generated at four centers within the VA[23]. For document-level classification of EF < 40%, they reported a sensitivity of 98.41%, a specificity of 100%, a positive predictive value of 100%, and an F-score of 99.2%. At the concept level, they reported a sensitivity of 88.9%, a positive predictive value of 95%, and an F-score of 91.9%. Note that the discovery logic developed in that study is not the one implemented in the present EFEx system, although the two share some common features. The present results on the MIMIC-III dataset show comparable overall performance when analyzed with EFEx without the qualitative concept mapping, and the Extended EFEx showed improved performance matching the document-level values reported above. Recently, Nath et al. reported an NLP tool named EchoInfer for large-scale data extraction from echocardiography reports at a single medical center[18], with a recall of 95-99% and a precision > 96% for LVEF.
Compared with EchoInfer, EFEx without any concept-mapping scheme in its discovery logic performed slightly worse on MIMIC data. With the concept-mapping scheme, however, the performance of EFEx improved significantly, and the values are slightly better than those reported for EchoInfer. For the entire dataset, the Original EFEx classified 1015 documents as false negatives, whereas the Extended EFEx produced only 42 false negatives. Some of the documents for which the Extended EFEx failed to identify an LVEF value were those in which the left and right ventricles are mentioned together in one statement. Similarly, in documents where the target concept value is immediately followed by a different numeral (e.g. a list index number), our extraction logic failed to identify the correct EF value. Adopting the mapping scheme significantly improved the identification of severely and moderately depressed cases in the dataset. With Extended EFEx, we observed a 40% increase in the number of cases with LVEF < 45%, which were confirmed by manual review. This substantial increase further supports the effectiveness of the extended logic, as these additional cases would not have been discovered using the original logic alone. On the other hand, the extended logic introduced false positive cases for EF: while the Original EFEx produced no false positives, the Extended EFEx produced 98. The mapping scheme we implemented does not assign values for phrases such as mildly to moderately depressed, moderately to severely depressed, borderline depressed, or more depressed, and for documents containing these statements, EFEx assigns incorrect values for EF.
Similarly, a statement such as ‘Preserved LVEF (effective forward LVEF may be depressed given the severity of valvular regurgitation)’ (HADM_ID = 182611) is open to interpretation, and no value is given in the reference standard; our extended logic assigned a value of 45. As with most NLP systems, there is room for further improvement in the extraction logic, as the false positive and false negative cases observed with EFEx show. Recent papers have identified a number of challenges facing NLP portability[5-11], such as assembling clinical corpora, managing diverse document structures, and handling idiosyncratic linguistic expressions. An additional challenge arises when using standardized NLP architectures such as UIMA, especially when integrating multiple NLP modules[13-15]. A final challenge not identified by these papers involves leadership of the dissemination project. In general, the vast majority of dissemination of informatics technology has been a push from a small number of innovators (“benchmark institutions”) to adopters, rather than a pull from the adopters[26]. The five challenges for NLP portability are summarized in Table 2, along with mitigation strategies described in the cited literature. The strategies can be roughly partitioned into technologically advanced methods addressing complex NLP tasks (striving for the hard-to-reach fruit) and more practical methods addressing simpler NLP tasks (settling for the low-hanging fruit). We found that focusing on target concepts with low sensitivity to document location is a good practical strategy for the portability of NLP tools. Our own experience showed that simple concepts whose associated values follow a general convention or prescribed format are good candidates for Leo. At WCM, our ongoing development effort has produced other instances of Leo in which we used this strategy effectively. We achieved high performance in extracting PHQ-9 scores from encounter notes.
Similarly, we achieved high performance in extracting TNM stages, Gleason scores, and ICD-9/10 diagnosis codes from surgical pathology reports. In these cases, the precision and recall of the Leo instances were sufficiently high, and we are currently making these data available in the i2b2 instance at WCM.

Table 2.

NLP portability challenges, and mitigation strategies that require advanced methods (hard-to-reach), and more practical methods (low-hanging).

Challenge	Strategy: Hard-to-Reach	Strategy: Low-Hanging
Assemble corpora with heterogeneous document types	Use heuristic linkage methods; develop document classifiers	Exploit metadata; focus on single document type
Navigate diverse report structures	Customize document segmentation algorithms; employ active learning	Select pattern with low sensitivity to document location
Analyze idiosyncratic linguistic expressions	Use machine learning to tailor complex patterns	Re-use or adapt simple patterns developed previously
Integrate multiple NLP modules	Employ large number of modules; adapt to meet architecture standards	Employ small number of modules; re-use or adapt modules previously standardized
Lead the dissemination project	Acquire funding to support the innovator site; supply expertise in NLP methods	Draw on existing resources at the adopter site; use conventional software skills

Conclusion

We extracted LVEF information from echocardiogram reports in the MIMIC-III database using the EFEx NLP system, and compared the results to a reference standard developed manually by human reviewers. The original version of EFEx performed worse than reported on VA documents, which differ in format and content. However, when the extraction logic was modified to include a concept-value mapping scheme similar to the one used in developing the reference standard, EFEx achieved an accuracy of 98.4%, sensitivity of 99.4%, positive predictive value of 98.7%, and F-score of 99.0%. These values agree reasonably well with those reported earlier on VA echocardiograms. The extended extraction logic also improved the discovery of cases with severely or moderately depressed LVEF by 40%. The current study on LVEF extraction from the MIMIC dataset suggests that EFEx performance varies with the clinical setting in which documents originate. The project described in this paper pursues a practical strategy for a relatively simple NLP task (low-hanging fruit). We exploited database metadata to focus on a single document type (cardiology reports). We chose a pattern with low sensitivity to document location (we used the last occurrence of LVEF). We adapted simple rule-based extraction logic and a specific instance (EFEx) of an NLP system (Leo) previously developed by the VA. The adopter (WCM) led the dissemination project, drawing on existing resources and conventional software skills. This case study provides evidence that an NLP system can be ported successfully from one institution to another, customized for a new data source, and achieve comparable performance.
The identification of practical strategies for NLP portability has paved the way for sharing NLP tools among the multiple institutions in the NYC CDRN, and may provide useful guidance for other institutions interested in pursuing a similar approach.

References (22 in total)

Jennifer H Garvin; Scott L DuVall; Brett R South; Bruce E Bray; Daniel Bolton; Julia Heavirland; Steve Pickard; Paul Heidenreich; Shuying Shen; Charlene Weir; Matthew Samore; Mary K Goldstein. Automated extraction of ejection fraction for quality measurement using regular expressions in Unstructured Information Management Architecture (UIMA) for heart failure. J Am Med Inform Assoc. 2012-03-21.

G Divita; M Carter; A Redd; Q Zeng; K Gupta; B Trautner; M Samore; A Gundlapalli. Scaling-up NLP Pipelines to Process Large Corpora of Clinical Notes. Methods Inf Med. 2015-11-04.

Jeanhee Chung; Shawn Murphy. Concept-value pair extraction from semi-structured clinical narrative: a case study using echocardiogram reports. AMIA Annu Symp Proc. 2005.

Chung-Chi Huang; Zhiyong Lu. Community challenges in biomedical text mining over 10 years: success, failure and the future. Brief Bioinform. 2015-05-01.

Robert J Carroll; Will K Thompson; Anne E Eyler; Arthur M Mandelin; Tianxi Cai; Raquel M Zink; Jennifer A Pacheco; Chad S Boomershine; Thomas A Lasko; Hua Xu; Elizabeth W Karlson; Raul G Perez; Vivian S Gainer; Shawn N Murphy; Eric M Ruderman; Richard M Pope; Robert M Plenge; Abel Ngo Kho; Katherine P Liao; Joshua C Denny. Portability of an algorithm to identify rheumatoid arthritis in electronic health records. J Am Med Inform Assoc. 2012-02-28.

N Griffon; J Charlet; S J Darmoni. Managing free text for secondary use of health data. Yearb Med Inform. 2014-08-15.

Son Doan; Mike Conway; Tu Minh Phuong; Lucila Ohno-Machado. Natural language processing in biomedicine: a unified system architecture overview. Methods Mol Biol. 2014.

Wen-Wai Yim; Meliha Yetisgen; William P Harris; Sharon W Kwan. Natural Language Processing in Oncology: A Review. JAMA Oncol. 2016-06-01.

Youngjun Kim; Jennifer H Garvin; Mary K Goldstein; Tammy S Hwang; Andrew Redd; Dan Bolton; Paul A Heidenreich; Stéphane M Meystre. Extraction of left ventricular ejection fraction information from various types of clinical reports. J Biomed Inform. 2017-02-02.

David S Carrell; Robert E Schoen; Daniel A Leffler; Michele Morris; Sherri Rose; Andrew Baer; Seth D Crockett; Rebecca A Gourevitch; Katie M Dean; Ateev Mehrotra. Challenges in adapting existing clinical natural language processing systems to multiple, diverse health care settings. J Am Med Inform Assoc. 2017-09-01.
