Automated methods for the summarization of electronic health records.

Abstract

OBJECTIVES: This review examines work on automated summarization of electronic health record (EHR) data and in particular, individual patient record summarization. We organize the published research and highlight methodological challenges in the area of EHR summarization implementation. TARGET AUDIENCE: The target audience for this review includes researchers, designers, and informaticians who are concerned about the problem of information overload in the clinical setting as well as both users and developers of clinical summarization systems. SCOPE: Automated summarization has been a long-studied subject in the fields of natural language processing and human-computer interaction, but the translation of summarization and visualization methods to the complexity of the clinical workflow is slow moving. We assess work in aggregating and visualizing patient information with a particular focus on methods for detecting and removing redundancy, describing temporality, determining salience, accounting for missing data, and taking advantage of encoded clinical knowledge. We identify and discuss open challenges critical to the implementation and use of robust EHR summarization systems.

Keywords: Clinical summarization; electronic health records; missing data; natural language processing; semantic similarity; temporality

Year: 2015 PMID: 25882031 PMCID: PMC4986665 DOI: 10.1093/jamia/ocv032

Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497

INTRODUCTION

The increased adoption of electronic health records (EHRs) has led to an unprecedented amount of patient health information stored in electronic format. However, the availability of overwhelmingly large records has also raised concerns of information overload, with potential negative consequences for clinical work, such as errors of omission, delays, and threats to overall patient safety. Current EHR systems often do not present this tremendous amount of patient data in a way that supports clinical workflow or cognitive reasoning. It is therefore imperative for patient care to automatically comb through the raw data points present in the records and detect timely and relevant information. Alarmingly, as the most chronically ill patients often have the largest datasets, their records are the most difficult to present coherently. As an example, for a prevalent chronic condition in our institution, patients with chronic kidney disease have 338 notes on average in their record (from all clinical settings) gathered across an average of 14 years, with several patients’ records containing over 4000 notes. It is clear that during a regular medical visit, no practitioner can read hundreds of clinical notes. Fortunately, electronic storage of this health information provides an opportunity for EHR systems to “aid cognition through aggregation, trending, contextual relevance, minimizing superfluous data.” Currently available commercial EHR systems, however, inadequately address this need, sometimes providing organization of data but lacking in information synthesis. Some vendor EHR dashboards display problem lists that aggregate billing codes, but these are low in actionable knowledge. Given this unmet and well-recognized need for comprehensive EHR summarization, many research groups have designed and evaluated clinical data summarizers.
In this review, we sample summarization applications to highlight different features including seminal work, different evaluation strategies, and various input/output data. We also examine the current work and future directions for six challenges of EHR summarization: information redundancy, temporality, missing data, salience detection, rules and heuristics, and deployment of summarization tools.

GENERAL APPROACHES TO SUMMARIZATION

There are multiple theoretical frameworks for summarization in the clinical domain as well as for textual summarization in the general domain. In the broader field of summarization, there has been a lot of work in automated text summarization, specifically within the genres of news stories and scientific articles (reviewed in depth elsewhere). Clinical summarization, “the act of collecting, distilling, and synthesizing patient information for the purpose of facilitating any of a wide range of clinical tasks,” presents a different set of challenges from summarization in other domains and genres of texts. While there exist other discussions on biomedical literature summarization methods and EHR visualizations, in this review we focus on characterizing existing clinical summarization systems by outlining the system outputs and evaluations as well as highlighting the remaining challenges that exist in automated summarization.

To categorize the summarizers highlighted in this review, we focus on two common dimensions used in the text summarization literature: extractive/abstractive summarization, and indicative/informative summarization. We define the four categories that describe summary types.

1. Extractive summaries are created by borrowing phrases or sentences from the original input text. In the domain of clinical summarization, an extractive approach can identify pieces of the patient’s record and display them without providing additional layers of abstraction.
2. Abstractive summaries generate new text that synthesizes the original text. In the domain of clinical summarization, abstractive summaries may provide additional higher-level context to explain the data, such as computed quantities (e.g., trends) or automatically generated text.

Extractive and abstractive summaries are further categorized as either indicative or informative.

3. Indicative summaries point to important pieces of the original text, highlighting significant parts for the reader. In the domain of clinical summarization, indicative summaries may convey, for instance, when key tests were performed or diagnoses were made. Indicative summaries are meant to be used in conjunction with the full patient record.
4. Informative summaries replace the original text. In the domain of clinical summarization, informative summaries are designed to be used independently of the full patient record, meaning they are used as a replacement for the original full set of raw data.

How to evaluate a summarizer, in terms of both its accuracy and its added value in supporting users as they carry out information-related tasks, has also been the subject of investigation in general domain and clinical summarization. Intrinsic evaluations focus on the internal validity of a summarization tool. Typically, experts evaluate the quality of the automatically produced summaries, or they themselves create gold-standard summaries against which automatic ones are compared. In an extrinsic evaluation framework, the usefulness of the summarization tool is assessed through its effectiveness in helping individuals carry out a task. For instance, a clinical summary could be evaluated in an extrinsic fashion by comparing how quickly and accurately trial coordinators can identify patients eligible for a trial with access to patients’ full records or with access to a summary instead.

Almost since the inception of EHRs, there has been an interest in creating meaningful succinct summaries for clinicians. Research on automated summary creation spans over 30 years; it began with extracting recent structured events in a patient’s history and evolved into performing natural language processing (NLP) and automatically linking different data types to create a more holistic view of the patient record. Table 1 lists clinical summarization systems proposed in the research literature in chronological order.
We describe each system according to the following axes: the summarization approaches it implements, the type of input data it handles, the type of output summary, the way in which it was evaluated, and whether it was deployed in a clinical environment. Overall, summarization approaches investigated in clinical summarization have primarily been for indicative and extractive summarization. We also note a lack of evaluation, especially in the most recent years. We discuss in further detail the methods used for summarizing clinical data, along with the open research questions present in each of the summarization steps.

Table 1

A sampling of clinical summarization applications, organized by publication date

System	Summarization approach	Input	Output	Evaluation	Deployed (when is it generated)	General Notes
NUCRSS²²,²⁶	Extraction of clinical variables, indicative	Real structured EHR data	An eight-page summary of: Problem list, Vital signs, Cardiac-pulmonary-renal diagnoses, Treatments, Routine specialized laboratory examination, Suggestions to physicians regarding patient care	Laboratory study with medical students and physicians showed significant time savings and increased accuracy. Randomized controlled trial showed that NUCRSS improved process-level outcomes (patient’s length of stay and an increased number of laboratory tests ordered) and may have improved care.	Yes (each patient visit)	Early example of a summarizer. One of the few summary evaluations that demonstrate an impact on quality of care and process outcomes.
STOR²⁷	Extraction of clinical variables, indicative	Real structured and unstructured EHR data	Loosely customizable summary which included both time- and problem-oriented views	Clinical study found that clinicians were better able to predict their patient’s future symptoms and laboratory test results when using the medical record in addition to STOR as opposed to just the medical record.	Yes (each patient visit)	Early example of a summarizer. One of few examples of task-based evaluation. The summary is context-dependent on the patient, but the context is manually determined by the clinician (what problems are active, what observations are relevant, etc.).
Powsner and Tufte¹¹,²⁸	Extraction of psychiatric variables and recent notes, indicative	Simulated structured, unstructured and genealogy data	A one-page summary that visualizes the most salient content (as defined by recency) of the patient record.	None	No	A widely referenced prototype that continues to serve as a model for current EHR visualization and summarization applications.
Lifelines²⁹,³⁰	Extraction of clinical variables, indicative	Simulated structured data	Holistic interactive patient summaries using a temporal data view on top of the raw EHR data. Displays facts as lines on a graphic time axis according to their temporal location; categories/significance are represented by color and thickness.	The original Lifelines application was evaluated for work with juvenile youth records²⁹ by a small group of users who reported enthusiasm but mentioned potential biasing by the system’s graphics.	No	Lifelines is probably the most well-known summarizer tool. The display has served as a model for future timeline-view clinical summarizers. Lifelines2 was created for research and examining many patients together.
CliniViewer²³	Extraction of concepts from text, indicative	Real unstructured EHR data	Combined NLP techniques and presented a tree view of a patient’s problems extracted from the narrative text to the clinician. Displays concepts in context when clicked.	The system was evaluated on accuracy and speed using real discharge summaries, but no evaluation with clinicians was conducted.	No	One of the first examples of summaries created using NLP. Allows for customizable user views. Works on top of the MedLEE³¹ NLP engine, which handles modifiers.
IHC Patient Worksheet³²	Extraction of clinical variables, indicative	Real structured EHR data	1–2 page outpatient summary of: Demographics, Problems, Medications, Laboratory tests, Actionable advisories	A retrospective cohort study found that compliance with HbA1c testing was higher for patients who had a worksheet printed than for those who did not.	Yes (each patient visit)	One of the few examples of a clinical outcome tested in the evaluation.
CLEF³³–³⁵	Abstraction from text and extraction of clinical variables, indicative	Simulated structured and unstructured cancer patient data.	An interactive display that both provides navigational capabilities for the EHR (indicative) and generates textual summaries (abstractive) to enhance comprehension. It uses information extraction techniques to identify classes of data and relationships between them.	None	No	One of the few natural language generation systems created for medical histories. Represents histories as a semantic network of events organized temporally and semantically. Lists requirements that are very relevant to general designers of clinical summaries; the list was generated via an initial requirements elicitation process. Uses a logical model of cancer history.

The inputs, outputs, methods, and evaluation strategies are listed along with notable additional information for each summarizer.

Table 1 (continued), evaluation and notes for systems whose rows are not reproduced above:
A crossover study compared KNAVE-II with paper charts and Excel spreadsheet. Users produced quicker answers, had somewhat better accuracy, and preferred KNAVE-II; however, it did not achieve a very high system usability score.
Performs semantic, temporal, and context abstraction. Requires domain-specific ontologies. Consists of a knowledge base, abstraction generator, navigation engine, and visualization. Lists 12 desiderata for interactive, time-oriented clinical data that should be used to guide future summarization work as well.
A largely successful process outcome. Explores the utility of summaries in a low-resource setting.
Timeline had manually coded rules while AdaptEHR aims to automatically infer rules and relationships from ontologies and graphical models; the publication states that the conditional probability tables are not yet defined.
Has four dimensions of representing data: time, space (physical location of the tumor), existence (certainty), and causality (treatment response treatment).
Aggregates information from multiple care settings. Operates on top of a commercial EHR system using HL7 messages. Distributed computing infrastructure to enable real-time summarization.

METHODOLOGICAL CHALLENGES

The following sections present some unsolved challenges in clinical summarization. A conceptual framework proposed by Feblowitz et al. defines a set of actions that successful summarizers should accomplish with raw information: Aggregate, Organize, Reduce/Transform, Interpret, Synthesize. We discuss methodological challenges with automated summarization within the context of this framework. Specifically:
– To successfully aggregate disparate clinical data sources, the ability to recognize and account for similarity is imperative. Such similarity occurs at different levels within narratives, from word-level to concept-level to statement-level similarity, as well as within and across other data types. We focus our discussion on textual similarity.
– The organization and interpretation of the aggregated data requires extraction and reasoning over clinical events and their temporality. We examine extraction of temporal information from text along with representation and reasoning over clinical events.
– The organization and interpretation of the aggregated data also requires that missing data points be accounted for. Patients are sometimes seen with predictable regularity but are most often seen at erratic intervals. Missing data points are often handled by imputation, adding missing data indicators, deleting information with missing data, or other strategies.
– In the reduction and transformation of data and its synthesis, it is critical to decide which pieces of information are important and must be contained in the summary. Some methods for automatically detecting importance have relied on linguistic structure while others use probabilistic modeling techniques.
– To provide context for interpretation and synthesis of clinical data, it is useful to employ existing knowledge and create rules for the summarization. Knowledge-based heuristics often provide a way to specify time constraints, concept relationships, and abstractions.
– Finally, to successfully implement summarizers into clinical care, challenges of deployment need to be addressed. Because in vendor EHR systems there are limited opportunities to deploy innovative and experimental technology, there have been few attempts to translate patient record summarization systems into the clinic; however, to demonstrate utility, it is imperative to implement and study clinical summarization tools in the real-world care setting.

1) IDENTIFYING AND AGGREGATING SIMILAR INFORMATION

We review approaches to identifying and aggregating similar information on three different levels of language abstraction: words, concepts, and statements, as investigated within and outside the field of clinical summarization.

Word-level Similarity

In clinical NLP, much work has been devoted to identifying lexical variants that are similar in meaning. The Unified Medical Language System (UMLS), for example, provides essential knowledge towards that goal by grouping words into concepts. For instance, the terms MI, myocardial infarction, and heart attack are lexical variants that map to the same underlying concept. Within clinical summarization, normalization of words to concepts has only recently been investigated. An alternative, and the most common approach in clinical summarization, is to identify word-level similarity by finding redundant strings of words. Patient records often contain redundant spans of text; this can be explained by the fact that documentation is often formulaic but also by the common habit of clinicians to copy and paste text from one note to another. Several automated methods have been employed to identify copied and pasted text within clinical notes. A plagiarism detection tool called CopyFind has been used to identify overlapping phrases in input texts. More recently, global and local bioinformatics-inspired alignments have been proposed for identifying redundant sections, along with language modeling techniques for assigning probabilistic similarity scores to phrase pairs.
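As a rough illustration of finding redundant strings of words, the sketch below measures what fraction of a new note's word n-grams already appear in an earlier note. It is a deliberately simple stand-in, not the CopyFind tool or the alignment and language-modeling methods cited above; the function names, the n-gram length, and the example notes are all assumptions made for illustration.

```python
def word_ngrams(text, n=5):
    """Return the set of word n-grams (as tuples) in a lowercased text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def redundancy_fraction(new_note, old_note, n=5):
    """Fraction of the n-grams in new_note that also occur in old_note."""
    new_grams = word_ngrams(new_note, n)
    if not new_grams:
        return 0.0  # note shorter than n words: nothing to compare
    return len(new_grams & word_ngrams(old_note, n)) / len(new_grams)


# Hypothetical notes: the second was copied from the first and lightly edited.
old = "patient reports chest pain radiating to the left arm since monday"
new = "patient reports chest pain radiating to the left arm now resolved"
score = redundancy_fraction(new, old, n=3)
print(f"{score:.2f} of the new note's trigrams were already documented")
```

A high fraction suggests copied text that a summarizer could collapse; choosing the n-gram length and a cutoff would need tuning on real notes.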

Concept-level Similarity

Concept-level similarity represents a more abstract level of similarity than similarity between words and strings. For instance, the concepts “epilepsy” and “seizure”, despite being two different UMLS concepts, share much semantic similarity when conveyed in a patient record. In certain well-defined domains, clinical summarization approaches have relied on aggregating concepts, helping further the goal of synthesis, primarily through well-defined ontologies. For broader domains, how to identify that two semantic concepts are similar enough to be aggregated remains an open question. Furthermore, in text processing, mapping from words to concepts remains difficult because of the strong ambiguity of language. Detection of semantic redundancy has been investigated through two approaches: knowledge-free and knowledge-based. Knowledge-free similarity metrics have been developed for textual input. They rely on Harris’ 1968 hypothesis, which stipulates that concepts that appear in similar contexts are similar. In practice, concepts are compared in a vector space, where each concept is a vector representing the context in which the concept typically occurs. This method has been implemented multiple times in the clinical domain to identify similar UMLS concepts. Knowledge-free approaches are attractive when there is little ontological knowledge available. Alternatively, knowledge-based methods leverage existing resources to determine the similarity of two concepts. For instance, if the two concepts are present in an ontology, similarity can be assessed through the structure of the ontology. Other knowledge-based methods include examining similarity of the two concepts’ definitions. We refer the reader to detailed reviews of concept-based similarity. Despite the active research on this topic, these concept-level similarity methods have not yet been translated to most clinical summarization systems.
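The knowledge-free, vector-space comparison described above can be sketched in a few lines. The example below builds a context vector for each token from co-occurrence counts within a small window and compares vectors by cosine similarity. The toy corpus, the window size, and the function names are illustrative assumptions; published systems work with mapped UMLS concepts and far larger corpora rather than raw tokens.

```python
import math
from collections import Counter


def context_vectors(sentences, window=2):
    """Map each token to a Counter of the tokens seen within `window` positions."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(token, Counter()).update(context)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus in which two terms occur in identical contexts.
notes = [
    "patient experienced seizure episode overnight",
    "patient experienced epilepsy episode overnight",
    "patient denies fever today",
]
vectors = context_vectors(notes)
sim_related = cosine(vectors["seizure"], vectors["epilepsy"])
sim_unrelated = cosine(vectors["seizure"], vectors["fever"])
print(sim_related, sim_unrelated)
```

Terms that occur in the same contexts score close to 1, while terms sharing little context score near 0; how high a score must be before two concepts are aggregated is exactly the open question noted above.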

Statement-level Similarity

A pervasive aspect of a patient record is the high level of statement redundancy across notes. For instance, two pathology reports for a given patient share many similar statements. Beyond the formulaic nature of documentation, statement-level redundancy also occurs because of copying and pasting from previous notes with some minimal editing of the copied statements. In clinical summarization, there has been little work on this important aspect of similarity identification. Recently, a topic modeling approach was proposed to identify and control for such redundancy across patient notes. In the general NLP community, identifying statement-level similarity has been studied through the tasks of paraphrase identification and textual entailment. Many of the methods in text summarization for identifying both unidirectional (textual entailment) and bidirectional (paraphrasing) similarity employ a hybrid of methods for word-level and concept-level redundancy, such as string similarity, logic-based methods, and context vectors. Along with the need for higher-order language similarity work in the clinical domain, there is an ongoing push to personalize similarity detection. It is well established that semantic similarity is context-dependent, and a recent study suggests that redundancy be examined as a function of the patient’s previous history. While identification of similar contexts based on the patient’s health is an ongoing direction of research, there is further work to be done in identifying context-specific similarity on higher-order semantic levels. Identifying similar words, concepts, and statements, and removing redundancy through patient-tailored information aggregation, is an important direction for future EHR summarization methodology.
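To make the case of copied statements with minimal editing concrete, the sketch below uses string similarity alone: it compares statements token by token with Python's standard `difflib.SequenceMatcher` and keeps a statement only if it is not too close to one already kept. This covers surface-level redundancy only and does not attempt paraphrase or entailment detection; the threshold and the example statements are assumptions.

```python
from difflib import SequenceMatcher


def statement_similarity(a, b):
    """Similarity in [0, 1] between two statements, compared token by token."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def drop_near_duplicates(statements, threshold=0.8):
    """Keep each statement only if it is not too similar to one already kept."""
    kept = []
    for statement in statements:
        if all(statement_similarity(statement, k) < threshold for k in kept):
            kept.append(statement)
    return kept


# Hypothetical statements pooled from two reports for the same patient.
statements = [
    "no evidence of malignancy in the biopsy specimen",
    "no evidence of malignancy in the repeat biopsy specimen",
    "patient tolerated the procedure well",
]
kept = drop_near_duplicates(statements)
print(kept)
```

Which copy to keep (first, most recent, or most complete) is a design choice; this sketch simply keeps the first occurrence.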

2) ORGANIZING AND REASONING OVER TEMPORAL EVENTS

Patients’ health evolves on many different time scales. Some health events such as pneumonia present themselves sporadically, while chronic conditions like diabetes develop and worsen over a period of years. The importance of presenting clinical data in a time-dependent fashion has been recognized for a long time; however, accurate temporal representation remains an open problem. Automatic creation of a clinical data timeline from textual and structured clinical records requires temporal event extraction, ordering, and reasoning. Temporality is an active research area in the genre of news summarization given the quick news cycle and fast-paced evolution of news stories. However, news summarization research cannot always be readily translated into the health domain, as the challenges in health data are unique. For example, different note types and specialties have different temporal relationships: pathology reports are often about one moment in time without reference to historical ailments, whereas discharge summaries describe an entire inpatient hospital stay and instructions for future care. Styler et al. identified four complexities with extracting temporal information in clinical data: (i) diversity of time expressions; (ii) complexity of determining temporal relations among events; (iii) the difficulty of handling the temporal granularity of an event; and (iv) general NLP issues. After the extraction of event time, there is a need for performing relative temporal ordering. Event ordering is difficult in part due to inexact wording, but also because clinical knowledge is often needed to infer how long conditions may last (e.g., a diabetes diagnosis is often not discussed at every visit, but a clinician is aware that diabetes is a chronic condition, not an intermittently recurring condition each time the “diabetes” term is mentioned or the diabetes ICD-9 code is recorded).
Some recent work in event ordering includes the representation of temporal disease progression separately for each problem by Sonnenberg et al., an approach they call “clinical threading,” and frame-like semantic representations with rule-based temporal extraction to arrange problems on a timeline. Raghavan et al. identify and temporally order cross-narrative medical events across documents in clinical text using weighted finite state transducers. Reasoning and abstraction of extracted clinical events to highlight disease progressions and trends is critical for creating succinct clinical summaries. Abstractions of temporal data can include combining events within a certain time frame and performing interval-based abstractions such as combining multiple chemotherapy drug mentions into a chemotherapy regimen time span or reasoning about the length of time that symptoms lasted and their relation to diagnosis. The questions of which events should be combined and what an appropriate time frame is remain difficult and are currently resolved by leveraging clinical knowledge and ontologies. Time-dependent clinical summarization is a continually evolving research area, and there is opportunity for automatically identifying, accurately ordering, and performing reasoning over temporal clinical events.
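The interval-based abstraction mentioned above, such as folding individual chemotherapy drug mentions into regimen time spans, can be sketched as a simple gap-based merge. The 28-day gap, the function name, and the example dates are illustrative assumptions; as the text notes, an appropriate time frame is normally chosen using clinical knowledge.

```python
from datetime import date, timedelta


def merge_into_spans(event_dates, max_gap_days=28):
    """Group dated events into (start, end) spans.

    An event extends the current span when it falls within max_gap_days of
    the span's last event; otherwise it starts a new span.
    """
    spans = []
    for day in sorted(event_dates):
        if spans and day - spans[-1][1] <= timedelta(days=max_gap_days):
            spans[-1] = (spans[-1][0], day)
        else:
            spans.append((day, day))
    return spans


# Hypothetical dates on which a chemotherapy drug is mentioned in the record.
mentions = [
    date(2014, 1, 6), date(2014, 1, 27), date(2014, 2, 17),
    date(2014, 9, 1), date(2014, 9, 22),
]
spans = merge_into_spans(mentions)
for start, end in spans:
    print(f"regimen span: {start} to {end}")
```

Five separate mentions collapse into two spans, which is the kind of reduction a timeline view needs; a knowledge-driven system would set the gap per drug or regimen rather than globally.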

3) ACCOUNTING FOR AND INTERPRETING MISSING DATA

Clinical records are sparse: documentation only occurs when a patient is seen by a clinician, thus clinical records miss the overwhelmingly large number of observations about a patient across their lifetime. When summarizing sparse data, a critical complication is how to interpret and reason over the missing data. In some cases, missing data is not important and can safely be ignored by a summarization system (e.g., a patient has no change in health status in between visits). In other cases, the presence of missing data hints at a salient aspect about the patient that needs to be highlighted within the summary (e.g., patient is too sick to come to their visit). How to interpret and determine the salience of missing data is a challenge, and one not investigated thus far in clinical summarization. In the field of general statistics, there are three types of missing data: Missing Completely at Random, Missing at Random, and Missing Not at Random. Most techniques for dealing with missing data assume that data are Missing Completely at Random or Missing at Random, and include (i) variations of complete-case analysis, where only data with no missing values are used, (ii) single imputation, where missing data are imputed based on the values observed (using the mean, median, linear interpolation, etc.), and (iii) likelihood-based methods which compute maximum likelihood estimates for missing data. In the clinical domain, there is mounting evidence that most of the data are Missing Not at Random. For these data, the missingness is informative, meaning that there is an underlying reason that the data are missing but that this reason is simply unobserved. Some techniques that use informative missing data properties to infer properties about clinical data have been proposed. A common way of using missing data in the clinical domain has been to look at how long values should last based on recorded measurements or documentation frequency.
For example, laboratory test measurements have been studied to gather appropriate imputation time and to infer health status features. Van Vleck studied duration and persistence of problems in notes as a function of missing data, while Klann and Perotte both studied the duration of ICD-9 codes. Klann estimated the durations for which each ICD-9 code remains valid and Perotte automatically classified ICD-9 codes into chronic and acute conditions. The modeling work that most explicitly demonstrates informativeness in missing data examined the accuracy of prediction models when: (i) ignoring missing data, (ii) interpolating missing data, or (iii) incorporating a missing data indicator, and reported that the missing data indicator method performed best. To properly provide context and infer trend lines, as demonstrated by Poh and de Lusignan for kidney disease data, or to make predictions in clinical summaries, it is critical to incorporate missing data literature and techniques into summarizer applications. The utility of modeling missing data explicitly is clear; however, this conclusion is not yet being translated into clinical summarization research.
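The strategies discussed above differ mainly in what they hand to a downstream model. The sketch below applies complete-case analysis, single (mean) imputation, and a missing data indicator to one series of laboratory values, where `None` marks a visit with no measurement. The function names, the fill value, and the example series are illustrative assumptions, and mean imputation stands in here for the broader family of single-imputation methods.

```python
def complete_case(values):
    """Complete-case analysis: keep only the visits that have a measurement."""
    return [v for v in values if v is not None]


def mean_impute(values):
    """Single imputation: replace each missing value with the observed mean."""
    observed = complete_case(values)
    if not observed:
        raise ValueError("cannot impute when no values were observed")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def with_missing_indicator(values, fill=0.0):
    """Pair each value with a 0/1 flag so missingness itself is a feature."""
    return [(fill, 1) if v is None else (v, 0) for v in values]


# Hypothetical creatinine series over six scheduled visits.
creatinine = [1.1, None, 1.3, None, None, 2.0]
print(complete_case(creatinine))
print(mean_impute(creatinine))
print(with_missing_indicator(creatinine))
```

Only the indicator representation preserves the fact that measurements were skipped, which is the information that matters when data are Missing Not at Random.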

4) REDUCING INFORMATION TO ONLY THE MOST SALIENT

Salience identification has been heavily researched in the general domain text summarization literature. Early methods for identifying important topics relied on counts: frequency and term frequency-inverse document frequency, which corrects for word specificity. Other methods have focused on structure, such as document structure or syntactic structure, to identify important phrases. Syntactic information gleaned from the input document can identify which parts of a sentence are salient and which may be safely removed from a summary (e.g., a relative clause). It is unclear, however, how these approaches translate to the clinical domain, where syntactic structure is unconventional. Using prior knowledge of the input document structure (e.g., biomedical papers have an introduction, followed by a methods section) to weigh the salience of information pieces based on where they are conveyed in the document is, however, promising in the clinical domain (yet not investigated thus far). Clinical notes follow a pre-specified structure; a diagnosis mention might be more relevant when conveyed in the past medical history than in the family history, for instance. A different method for salience identification, still within the general domain summarization field, leverages discourse by considering sentences in input documents through a network, where lexical similarity between sentences is represented by the network edges. In this representation, salient sentences are the ones with the highest centralities. An alternative method for identifying relevant information relies on probabilistic modeling techniques such as Hidden Markov Models for identifying topics and topic changes in a set of documents or hierarchical Latent Dirichlet Allocation-type models for identifying novel information with respect to older documents. These Bayesian learning techniques for constructing effective automated summaries have also yet to be explicitly translated into the clinical arena.
The one type of salience detection that has been explicitly studied in the clinical domain is based on cue phrases. Cue phrases are pieces of text that signify that what follows is likely to be important. For example, “In conclusion” often precedes an important summarizing statement. In clinical documentation, de Estrada et al. developed a system called Puya that found cue phrases indicating normality or abnormality in the physical exam sections of notes. Another way of detecting salience relies on n-gram language modeling to identify the most recent information in the record, under the assumption that the newest information is the most salient for the provider to see. A visualization prototype used this n-gram model to automatically highlight text that was found to be novel, drawing the provider’s attention to the new findings.
Defining salience in an operational fashion for automated summarization is an open question. In the general domain, there is evidence that humans sometimes disagree about which pieces of information are salient, and that salience is often task-specific. Similarly, in the clinical domain, what is important for a clinician is probably also quite task-specific. Nevertheless, it is safe to say that the salience of elements in the patient record is related to capturing the health status of the patient and how it changes through time. How to do so automatically, that is, how to link textual and individual raw low-granularity observations to high-level clinical abstractions, is one of the paramount challenges of informatics research. For instance, there has been little formal investigation of clinically specific markers of importance such as the absolute change of a laboratory test value, its rate of change, the rate of mention of a particular concept, and other importance cues.
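To make the candidate markers of importance concrete, the sketch below computes three of them for a toy record: the change in a laboratory value between the earliest and latest measurement, its rate of change per day, and the fraction of notes that mention a concept. It is only an illustration under stated assumptions; the `LabResult` type, the function names, and the case-insensitive substring matching are hypothetical and are not drawn from any system discussed in this review.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LabResult:
    """One laboratory measurement (hypothetical structure for illustration)."""
    taken_on: date
    value: float


def change_markers(results):
    """Return (change, rate_per_day) between the earliest and latest result.

    The change is reported as a signed difference in the test's own units,
    as opposed to a relative (percentage) change. The rate is None when both
    results fall on the same day.
    """
    if len(results) < 2:
        raise ValueError("at least two results are needed to measure change")
    ordered = sorted(results, key=lambda r: r.taken_on)
    first, last = ordered[0], ordered[-1]
    change = last.value - first.value
    days = (last.taken_on - first.taken_on).days
    rate_per_day = change / days if days else None
    return change, rate_per_day


def mention_rate(note_texts, concept):
    """Fraction of notes that mention the concept.

    Uses naive case-insensitive substring matching, an assumption made to
    keep the sketch short; real systems would map text to coded concepts.
    """
    if not note_texts:
        return 0.0
    needle = concept.lower()
    hits = sum(1 for text in note_texts if needle in text.lower())
    return hits / len(note_texts)
```

Whether any of these quantities actually tracks what clinicians consider important is exactly the open question raised above; the sketch only shows that the markers themselves are cheap to compute once values and notes are available in structured form.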

5) USING EXISTING CLINICAL KNOWLEDGE

The informatics community has invested enormous effort into codifying clinical knowledge in a variety of terminologies and ontologies. This knowledge representation effort has been successful in helping efforts like phenotyping combine terminological knowledge, expert reasoning, and machine learning to create actionable disease definitions. Similarly, in summarization work, it is important to make use of these available clinical knowledge representations to generate rules and heuristics. Several holistic summarization efforts leveraged terminologies to identify concepts that are semantically related (e.g., medications that treat particular conditions) or rules to determine salience (e.g., identify and highlight the salient results that are abnormal). However, summarization engines built for particular diseases most often benefit from manually crafted rules and disease-specific knowledge bases, as these enable tailored, task-dependent systems. The KNAVE-II application, created to synthesize the records of bone marrow transplant patients, relies on an expert-maintained knowledge base for creating a semantic navigation system and concept abstraction. The Timeline system is also built on a manually coded set of rules that identify salient concepts for different diseases and perform temporal event reasoning. In addition, summaries that are setting- and user-specific often use expert-driven rules to ascertain which pieces of data should be shown at which time and to whom. Although the incorporation of clinical expertise into summarization is often a laborious process and sometimes covers only specific domains of expertise, it provides critical help in addressing some of the similarity, temporality, and salience challenges. Of relevance to this review, we note that while existing summarizers rely on established knowledge resources, there is an active field of research to create these resources either by translating clinical expertise or by acquiring the resources from data.
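The two kinds of knowledge-driven rules mentioned above, linking medications to the conditions they treat and highlighting abnormal results, can be sketched as simple table lookups. Everything in this example is an assumption made for illustration: the tiny `TREATS` table stands in for a real terminology, the `REFERENCE_RANGES` values are placeholders rather than clinical guidance, and the function names do not correspond to KNAVE-II, Timeline, or any other system.

```python
# Placeholder reference ranges (low, high); illustrative values only,
# not clinical guidance and not taken from any cited system.
REFERENCE_RANGES = {
    "potassium": (3.5, 5.0),
    "creatinine": (0.6, 1.3),
}

# Hypothetical stand-in for a terminology relation "medication treats condition".
TREATS = {
    "lisinopril": {"hypertension"},
    "metformin": {"type 2 diabetes"},
}


def flag_abnormal(results):
    """Return the names of tests whose value falls outside its reference range.

    Tests without a known range are skipped rather than flagged.
    """
    flagged = []
    for test_name, value in results.items():
        bounds = REFERENCE_RANGES.get(test_name)
        if bounds is None:
            continue
        low, high = bounds
        if value < low or value > high:
            flagged.append(test_name)
    return flagged


def link_medications(medications, problems):
    """Group medications under each problem they are known to treat."""
    linked = {problem: [] for problem in problems}
    for medication in medications:
        for problem in TREATS.get(medication, set()):
            if problem in linked:
                linked[problem].append(medication)
    return linked
```

The sketch also hints at why such rules are laborious to maintain: every range and every treatment relation has to be curated by hand or imported from an external resource, and anything missing from the tables is silently ignored.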

6) DEPLOYING SUMMARIZATION TOOLS INTO THE CLINIC

The ultimate goal of any clinical summarization tool is implementation and usage by clinicians at the point of care. To date, however, there has been no widespread adoption of automated summarizers, especially the large holistic temporal summarizers. Pervasive deployment is often hindered by the commercial EHR systems that have been adopted across the country. Building real-time computational tools to work atop commercially built EHR systems is still a daunting task, as these vendor systems are often not built to support interaction with outside applications. In addition, because the systems are closed off, dissemination of summaries across different hospitals and EHRs is a challenge as well. However, there is promising work with the i2b2-SMART platform that enables easier translation across institutions; researchers have developed a system to automatically link different data types across the EHR (mainly diseases and medications) and display a newly organized view of the patient record. To create meaningful and practical summaries that assist clinicians with their point-of-care needs, summarizers need to provide real-time information, with patient record updates immediately available in the summary. This is an especially difficult task when the summary tool works with natural language, as the processing must be completed quickly and accurately. Current work with distributed infrastructures, like Apache Hadoop, provides promising results for immediate summarization. Another large barrier to translation of summarizer research into the clinical domain is rigorous evaluation. Hospitals often call for evidence that a summarizer is useful before investing expensive resources in its implementation, but without adoption a summarizer is extremely difficult to evaluate.
As is clear from Table 1, the clinical summarization literature lacks standard evaluation metrics and there are very few extrinsic evaluations, a finding similar to that of a review of biomedical literature summarization by Mishra et al. Given the restriction of limited adoption, it is not clear on which dimensions clinical summarizers should be evaluated. Initially, in order to avoid costly development and implementations with marginal benefit, it is imperative to study the need for a summarizer tool, its context of usage, and clinician workflow. However, without eventual implementation into clinical care, showing any process- or health-level outcomes is not possible, and therefore how to perform useful evaluations remains unclear: should, for instance, summarization systems focus on accurate information extraction, on facilitating information exploration (e.g., which concepts are most relevant to the clinician), or on user-friendly designs? Although the rigorous user-interface and cognitive process evaluations that are necessary for creating new summarization systems often require deployment and study of actual use in practice, there exists guidance in the literature on cognitive aspects of clinical reasoning that can inform summarization system creation. Prior work on general medical cognition, clinical decision-making, human-computer interaction for interface design, handoff communication, and clinical workflow analysis, together with recent qualitative work on clinical document synthesis that has identified common cognitive pathways for EHR document synthesis and patterns of EHR data access, can guide the development of summarization systems. However, we emphasize that without actually studying the clinical context and manner in which clinicians use summarizers (either in the laboratory with prototype systems or in the clinic with deployed systems), it will be challenging to develop better evaluation strategies and better summarizers.

CONCLUSION

Within the past decade, the number of health practices that have some electronic capability to store patient data has grown to almost 80%. Health information exchanges promise patient record integration across multiple care settings, and the amount of available patient data continues to explode. The informatics community is poised to develop methods to mine the available information and ask questions such as: how can we further clinical knowledge, how can we assist clinicians in performing searches within and across patient records, how can we predict a patient’s hospital course, and how can we automatically condense records to provide succinct summaries of a patient’s medical history? With this eruption of rich, complex, and essential health data for millions of patients, the informatics community has a new opportunity to tackle the challenges of interpreting a mounting wealth of health information.

FUNDING

This work was supported by National Science Foundation IGERT grant number 1144854 (R.P.), National Library of Medicine pre-doctoral fellowship grant number 5T15LM007079-19 (R.P.), National Library of Medicine award grant number R01 LM010027 (N.E.), and National Science Foundation grant number 1344668 (N.E.).

COMPETING INTERESTS

None.

CONTRIBUTORS

R.P. completed the literature review. R.P. and N.E. both identified existing gaps in the literature. R.P. and N.E. wrote the paper.
