INTRODUCTION: National initiatives to develop quality metrics emphasize the need to include patient-centered outcomes. Patient-centered outcomes are complex, require documentation of patient communications, and have not been routinely collected by healthcare providers. The widespread implementation of electronic health records (EHRs) offers opportunities to assess patient-centered outcomes within the routine healthcare delivery system. The objective of this study was to test the feasibility and accuracy of identifying patient-centered outcomes within the EHR. METHODS: Data from patients with localized prostate cancer undergoing prostatectomy were used to develop and test algorithms to accurately identify patient-centered outcomes in postoperative EHRs; we used urinary incontinence as the use case. Standard data mining techniques were used to extract and annotate free text and structured data to assess urinary incontinence recorded within the EHRs. RESULTS: A total of 5,349 prostate cancer patients were identified in our EHR system between 1998 and 2013. Among these EHRs, 30.3% had a text mention of urinary incontinence within 90 days postoperatively compared to less than 1.0% with a structured data field for urinary incontinence (i.e., ICD-9 code). Our workflow had good precision and recall for urinary incontinence (positive predictive value: 0.73 and sensitivity: 0.84). DISCUSSION: Our data indicate that important patient-centered outcomes, such as urinary incontinence, are being captured in EHRs as free text and highlight the long-standing importance of accurate clinician documentation. Standard data mining algorithms can accurately and efficiently identify these outcomes in existing EHRs; the complete assessment of these outcomes is essential to move practice into the patient-centered realm of healthcare.
Keywords:
Electronic health records; data mining; health services research; patient-centered care; quality improvement
Prostate cancer is the most common malignancy in men.1 Although survival rates for prostate cancer treatment are excellent, patients may acquire treatment-related side effects, many of which can only be reported by the patient (e.g., urinary incontinence, erectile dysfunction, or bowel dysfunction).2–4 Reported rates of such side effects vary depending on the population studied or treatment characteristics.5–10 The majority of research on these patient-centered outcomes stems from high-volume academic centers, which may undermine its generalizability to other settings.11,12 Given the current state of prostate cancer research, both patients and clinicians have limited evidence to guide their treatment choices;13,14 accurate and efficient measurement of outcomes other than mortality is needed to help patients make informed decisions regarding their treatment pathway.
The Patient Protection and Affordable Care Act (ACA) aims to improve the quality and efficacy of health care delivery in the United States.15 Many sections of the ACA rely on accurate quality measurement (e.g., value-based payment modifiers)16 and efficient data retrieval (e.g., accountable care organizations and their information exchange).17 Furthermore, many sections of the ACA include patient-centered initiatives, which promote the use of patient-centered outcomes in clinical decision-making.18 Under the health care reform, accurate quality measurement is essential and should be patient centered.
Patient-centered care reflects a patient’s overall health care experience and assesses the net effects of disease and treatment (e.g., disease-related quality of life, urinary incontinence, and overall health status) rather than physiological endpoints (e.g., laboratory values and disease-specific survival).19 Patient-centered endpoints are complex, require documentation of patient communications, and have not been routinely collected by health care providers. Patient-centered outcomes are not routinely captured as structured or coded data and therefore do not exist in administrative billing or claims data.20 Consequently, current patient-centered outcome reports must rely on size-limited patient surveys (which contain ascertainment bias), prospective studies (which are not readily available and also contain ascertainment bias), or manual chart reviews (which are time-consuming).
The purpose of this study was to test the feasibility of using data mining algorithms to identify patient-centered outcomes in routinely collected electronic health records (EHRs); we use postprostatectomy urinary incontinence as a use case. Our methods apply techniques from the fields of data mining and information extraction. They are distinguished from previous studies that combine structured and unstructured EHR data by their focus on patient-centered outcome detection.
Methods
To identify patient-centered outcomes in EHRs, a robust workflow for deriving these data from routinely used EHRs is essential. Many diseases, such as prostate cancer, have important patient-centered outcomes that are not reliably recorded as coded data. These data must be extracted from the free text existing in the EHRs (e.g., clinicians’ reports, narrative text) using data mining algorithms, such as Natural Language Processing (NLP). NLP techniques automatically identify structured text or “knowledge” from free text using controlled vocabularies (e.g., ontologies or user-developed dictionaries) and grammatical rules.21,22 Often, patterns and labels within the narrative text are queried to identify common phrases, for example with regular expressions that match mentions of high blood pressure.23 These data mining algorithms are increasingly used to identify diseases and cohorts of patients as health care moves to the digital age.24,25 We have developed such a system to identify clinicians’ reporting of postoperative urinary incontinence in patients who were diagnosed with localized prostate cancer and who underwent prostatectomy. The information on the patient-centered outcome was derived from the coded data (e.g., ICD-9 codes) as well as the narrative text portions of EHRs, including clinical progress notes, referral notes, procedure reports, and postoperative reports from patients receiving care at the academic center.
Data Set and Study Population
We obtained data from a large, tertiary academic medical center that provides inpatient, outpatient, and primary care. During the time of our analysis, the center used the Epic (Epic Systems, Verona, Wisconsin) EHR system. Access to de-identified EHR data was obtained through an innovative research data warehouse that facilitates research.26 This translational research platform allows the capture of both structured data (e.g., ICD-9-CM codes, laboratory values, etc.) as well as unstructured data (e.g., clinicians’ narrative text, preoperative notes, etc.) on all patients receiving care at the institute.
We identified patients in our research platform with localized prostate cancer based on ICD-9-CM code 185 (Figure 1). Patients were categorized into prostatectomy surgical groups according to ICD-9 procedure codes: open prostatectomy, ICD-9 60.5 and CPT 55845; robotic prostatectomy, ICD-9 60.5 plus 17.42 and CPT 55866; laparoscopic prostatectomy, ICD-9 60.5 plus 54.21; and other prostatectomies, which included CPT codes that were not distinguishable between robotic and laparoscopic procedures, e.g., CPT 55840. In our data mining analysis, we excluded patients without a clinical note and without a follow-up visit within 90 days postoperatively because they had no text notes to process for postoperative urinary incontinence.
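The code-based grouping described above can be sketched as a small lookup. This is an illustration only, not the authors' implementation: the function name `surgical_group`, the representation of a patient's codes as two sets of strings, the reading that either the ICD-9 or the CPT code is sufficient for a group, and the order in which the rules are checked are all assumptions.

```python
def surgical_group(icd9_procs, cpt_codes):
    """Assign a prostatectomy group from a patient's procedure codes.

    Both arguments are sets of code strings. Assumptions: a matching code in
    either coding system is sufficient, and the more specific robotic and
    laparoscopic rules are checked before the open rule.
    """
    has_prostatectomy = "60.5" in icd9_procs
    if (has_prostatectomy and "17.42" in icd9_procs) or "55866" in cpt_codes:
        return "robotic"
    if has_prostatectomy and "54.21" in icd9_procs:
        return "laparoscopic"
    if has_prostatectomy or "55845" in cpt_codes:
        return "open"
    if "55840" in cpt_codes:
        # CPT code that does not distinguish robotic from laparoscopic
        return "other"
    return None  # no prostatectomy code found


if __name__ == "__main__":
    # Hypothetical patients, for illustration only
    print(surgical_group({"60.5", "17.42"}, set()))
    print(surgical_group(set(), {"55840"}))
```

Checking the specific groups first keeps a robotic or laparoscopic case from falling into the open group merely because it also carries ICD-9 60.5.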
Figure 1.
Cohort Selection Flowchart from Electronic Health Records
Data Mining Workflow
Our data mining workflow used de-identified data from the institute’s translational research data warehouse.26 Urinary incontinence was identified using both structured data (ICD-9-CM: 788.30) and unstructured free-text clinical notes (e.g., “urinary incontinence” or “urinary leakage”). To analyze free text, we used the NCBO Annotator to process our clinical notes.27 The NCBO Annotator is a minimalist system that relies on a large dictionary of terms, their mappings to Unified Medical Language System (UMLS) concepts, and the NegEx negation detection system (a part of the ConText system)28 to find mentions of biomedical concepts in clinical text and establish their negation status.27,29
We customized our approach for identifying cases of urinary incontinence documented in free text using a method that has been previously applied to develop task-specific extractors.30 With the aim of improving sensitivity, we enhanced the annotator’s terminology to include additional terms relevant to urinary incontinence (e.g., “wears adult diapers”). In addition, we extended the basic set of rules provided by NegEx to consider additional contextual information such as the following: hypothetical terms, e.g., “at risk for” (urinary incontinence); historical terms, e.g., “past history of” (urinary incontinence); and discussion terms, e.g., “discussed complications such as” (urinary incontinence).29 After our workflow rules were applied, we defined “positive urinary incontinence mentions” as “those indicating documentation of a positive urinary incontinence case at the time of documentation” and treated all other types of urinary incontinence mention (e.g., negative, historical, or hypothetical) as negative.
Our classifications were based on clinical information extracted from patient progress notes, consultations, referral reports, and postoperative notes, and on other types of unstructured free-text clinical notes available in the EHRs. We did not attempt to quantify the level of incontinence; we identified only whether a patient’s clinician reported any level of urinary incontinence or used an ICD-9 code for urinary incontinence. Our entire data-mining framework, which detects patient-centered outcomes from both structured and unstructured EHR data, can be executed on 1.8 million patient records (approximately 21 million clinical notes) in less than 24 hours on standard server hardware.
We performed a manual chart review on a subset of records to test the accuracy of the data mining workflow. For this review, 200 entries were randomly selected. A single reviewer was provided with a snippet of text surrounding the term of interest, urinary incontinence. The reviewer was blinded: the positive or negative determination of urinary incontinence from the workflow was not revealed. The reviewer marked each instance as positive or negative for urinary incontinence. Each instance corresponded to a single patient encounter. These results were used to calculate the positive predictive value and sensitivity of the workflow, standard performance measures for data mining algorithms.
The human subjects research review board of the participating institution approved this study.
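The mention-classification step of the workflow can be illustrated with a deliberately simplified sketch. It is not the NCBO Annotator or NegEx, and it is not the authors' rule set: it is a stand-in that looks for a few trigger phrases in the same sentence, to the left of each term. Only the quoted terms and triggers come from the text above; the negation cues, the 60-character window, and all function names are assumptions made for illustration.

```python
import re

# Terms quoted in the text; a real dictionary would be far larger.
UI_TERMS = ["urinary incontinence", "urinary leakage", "wears adult diapers"]

# Context triggers. The negation cues are assumed; the rest are quoted above.
TRIGGERS = {
    "negated": ["no", "denies", "without"],
    "hypothetical": ["at risk for"],
    "historical": ["past history of"],
    "discussion": ["discussed complications such as"],
}
WINDOW = 60  # characters of left context inspected (arbitrary assumption)


def classify_mentions(note):
    """Return (term, status) pairs for each urinary incontinence mention."""
    text = note.lower()
    results = []
    for term in UI_TERMS:
        for match in re.finditer(re.escape(term), text):
            left = text[max(0, match.start() - WINDOW):match.start()]
            # Keep only the current sentence so earlier sentences cannot trigger.
            left = left.rsplit(".", 1)[-1]
            status = "positive"
            for label, cues in TRIGGERS.items():
                if any(re.search(r"\b" + re.escape(c) + r"\b", left) for c in cues):
                    status = label
                    break
            results.append((term, status))
    return results


def note_is_positive(note):
    """A note is positive only if it has a current, affirmed mention."""
    return any(status == "positive" for _, status in classify_mentions(note))


if __name__ == "__main__":
    # Hypothetical snippets, not taken from any patient record
    print(classify_mentions("Patient denies urinary incontinence."))
    print(note_is_positive("Patient reports urinary leakage at night."))
```

As in the workflow described above, every status other than a current affirmed mention is treated as negative, so negated, hypothetical, historical, and discussion mentions do not make a note positive.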
Results
From 1998 to 2007, the inclusion of text notes in our EHR increased steadily. In 2008, our Epic system was fully installed. Between 2008 and 2013, approximately half of all patient encounters contained some clinical note. Patient demographics are presented in Table 1. Among the full cohort, 1,485 patients had a text note in their EHRs.
Table 1.
Characteristics of Patients Receiving Prostatectomy for Localized Prostate Cancer, 1998–2013
| Characteristic | Total cohort (n = 5,353) | Cohort with EHR notes (n = 1,485) |
| --- | --- | --- |
| Age, mean (SD) | 65.49 (0.14) | 64.83 (0.26) |
| Race, n (%) | | |
| White | 3,910 (73.04%) | 995 (67.00%) |
| Black | 130 (2.43%) | 51 (3.43%) |
| Asian | 362 (6.76%) | 118 (7.95%) |
| Other | 951 (17.77%) | 321 (21.62%) |
| Hispanic, n (%) | 216 (4.04%) | 79 (5.32%) |
| Surgery, n (%) | | |
| Open | 2,345 (43.81%) | 564 (37.98%) |
| Robotic | 385 (7.19%) | 124 (8.35%) |
| Laparoscopic | 414 (7.73%) | 303 (20.40%) |
| Other | 2,209 (41.27%) | 493 (33.27%) |
The comparison of urinary incontinence recorded in patients’ records is presented in Table 2. Of the 5,349 prostate cancer patients who were identified in our EHR, only 4 patient encounters had an ICD-9-CM code for urinary incontinence, yet 450 patients had urinary incontinence documented in the free-text note. Furthermore, 1,035 patients had free-text documentation stating that the patient did not currently have urinary incontinence. For instances of urinary incontinence text mentions, our workflow had the following accuracy scores: positive predictive value 0.73 and sensitivity 0.84.
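For readers unfamiliar with the two accuracy scores, positive predictive value is TP / (TP + FP) and sensitivity is TP / (TP + FN), where the reviewer's label is the reference standard. The sketch below shows the calculation; the function name and the toy labels are hypothetical, and the counts behind the 0.73 and 0.84 reported above are not given in the text.

```python
def ppv_and_sensitivity(predicted, reference):
    """Compare workflow labels with reviewer labels.

    Both arguments are parallel lists of booleans (True = positive mention).
    Returns (positive predictive value, sensitivity).
    """
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)      # workflow positive, reviewer positive
    fp = sum(p and not r for p, r in pairs)  # workflow positive, reviewer negative
    fn = sum(r and not p for p, r in pairs)  # workflow negative, reviewer positive
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return ppv, sensitivity


if __name__ == "__main__":
    # Toy labels for illustration only; not study data
    workflow = [True, True, True, False, False]
    reviewer = [True, True, False, True, False]
    print(ppv_and_sensitivity(workflow, reviewer))
```

True negatives do not enter either formula, which is why these two measures suit a task where most snippets are expected to be negative.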
Table 2.
Postoperative Assessment of Urinary Incontinence Stratified by Structured Versus Unstructured Data Within the EHR
| Type of EHR information | Positive documentation of urinary incontinence | Negative documentation of urinary incontinence* | Absence of documentation of urinary incontinence |
| --- | --- | --- | --- |
| Text | 450 | 1,035 | 3,868 |
| ICD-9 | 4 | n/a | 5,349 |
Note: *Negative documentation refers to patients reporting that they are not suffering from urinary incontinence.
Figure 2 displays the number of patients with a text mention of urinary incontinence by postoperative follow-up in days. The number of patients seen postoperatively with a recorded urinary incontinence assessment was 130, 177, and 417 for 30, 60, and 90 days, respectively. In this graph, patients may have multiple visits. Because urinary incontinence can improve postoperatively, it is important to show that this patient-centered outcome is being assessed and documented beyond the first 30-day postoperative visit.
Figure 2.
Number of patients with a Mention of
Limitations
Note that we only report on what clinicians are documenting in the EHRs. These patient-centered outcomes reported by clinicians may vary from those reported by the patient. However, our data indicate that patient-centered outcomes, such as urinary incontinence, are documented in clinicians’ text significantly more than they are recorded as coded data. Future studies should focus on the agreement between patient-reported and clinician-reported outcomes.
Discussion
Quality measurement is a means to monitor health care delivery and set benchmarks for timely, evidence-based care. With a disease such as prostate cancer, where survival is excellent, patient-centered outcomes might be among the best quality measures of health care delivered. In this study we found that urinary incontinence, an important patient-centered outcome following prostate cancer treatment, was reported almost exclusively in the free text of EHRs and was rarely coded as an ICD-9 diagnosis code. Here we tested the feasibility of efficiently and accurately extracting this patient-centered outcome from EHRs using standard data-mining techniques. This report provides evidence that patient-centered outcomes are recorded in EHRs and that these data can be efficiently and accurately extracted.
The widespread implementation of EHRs offers opportunities to support patient-centered care and quality improvement efforts.31 EHRs host a comprehensive set of care processes and outcomes, including outcomes other than physiological endpoints. EHRs capture clinicians’ narrative text, images, and progress notes together with structured data elements. Over 80 percent of EHR data are captured as unstructured text, and it is here that the rich narrative resides.32,33 The narrative text may contain information on patients’ preferences, concerns, and often on patient-centered outcomes. However, the narrative text is stored as unstructured data (free text) that is difficult to assess using traditional measurement methods, which focus on structured data such as ICD-9-CM codes. Recent studies have used structured data within EHRs for quality improvement efforts,34–36 and others have applied text-processing methods to sections of the EHR (clinical notes, discharge notes, and pathology reports) for quality assessment.22,37–39 We extend these methods to include patient-centered outcomes.
Mining existing structured and unstructured EHR data for patient-centered outcomes has several immediate benefits and efficiencies. First, we have shown that longitudinal narrative data for these patient-centered outcomes are in the EHR. These data exist mainly in the narrative text and not in the structured data, so EHR studies must look beyond coded data. Indeed, our research found that urinary incontinence, one of the most reported outcomes with known effects on health-related quality of life following prostate cancer treatment,40 was almost exclusively reported in EHR free text. Second, studies derived from EHR data do not inherently contain ascertainment bias, as do many survey-based and prospective studies.41 EHR data exist across populations, care settings, and socioeconomic status, thus eliminating many of these known biases. Third, data-mining algorithms now allow for efficient processing and retrieval of data. This is a clear advantage over manual chart review, as previously noted.42
Extracting and analyzing patient-centered outcome data in a precise and timely manner is the first step in creating treatment pathways that reflect the patients’ individual risk values. Using prostatectomy as an example, if robotic surgery has a 20 percent relative risk of urinary incontinence and a 30 percent relative risk of erectile dysfunction and open prostatectomy has a 30 percent relative risk of urinary incontinence and a 20 percent relative risk of erectile dysfunction, patients can make informed treatment decisions based on their personal values of these different risks, which is a highlight of patient-centered care.43 The patient’s perspective of risk can be incorporated into the treatment pathway only if we have valid and accurate rates of these important patient-centered outcomes, for which evidence is currently limited.13
To move to a value-based care system, we must expand our measures of quality beyond simple coded data and include a comprehensive set of health care outcomes. As prostate cancer has excellent survival, patient-centered outcomes should be reflected in the quality measures used to assess the disease treatment. Our data indicate that important patient-centered outcomes, such as urinary incontinence, are being captured in EHRs as free text. This highlights the long-standing importance of accurate clinician documentation.
Development of generalizable benchmarks and accurate and complete assessment of these outcomes are essential to move practice into the patient-centered realm of health care.