Toward a cognitive task analysis for biomedical query mediation.

Gregory W Hruby1, James J Cimino2, Vimla Patel3, Chunhua Weng1.   

Abstract

In many institutions, data analysts use a Biomedical Query Mediation (BQM) process to facilitate data access for medical researchers. However, understanding of the BQM process is limited in the literature. To bridge this gap, we performed the initial steps of a cognitive task analysis using 31 BQM instances conducted between one analyst and 22 researchers in one academic department. We identified five top-level tasks, i.e., clarify research statement, explain clinical process, identify related data elements, locate EHR data element, and end BQM with either a database query or unmet, infeasible information needs, and 10 sub-tasks. We evaluated the BQM task model with seven data analysts from different clinical research institutions. Evaluators found all the tasks completely or semi-valid. This study contributes initial knowledge towards the development of a generalizable cognitive task representation for BQM.

Year:  2014        PMID: 25954589      PMCID: PMC4419754     

Source DB:  PubMed          Journal:  AMIA Jt Summits Transl Sci Proc


Introduction

Helping researchers access “Big Data” in the electronic health record (EHR) is essential for both public health initiatives and comparative effectiveness research (CER) in many academic medical centers,1,2 but remains a costly endeavor.3 In reality, CER involves complex, ultra-granular information needs that necessitate assistance from data analysts to extract representative data from the EHR. To do this, the medical researcher’s information need must be transferred to the data analyst, who can then translate that information need into a precise and specific data query. The transfer of the information need from the medical researcher to the data analyst occurs through an iterative question-answering process. From this point on, we refer to this transfer of the information need as Biomedical Query Mediation (BQM); a medical researcher may be any type of researcher seeking EHR data. During BQM, the data analyst may explain to the medical researcher relevant information about data restrictions and contextual data constraints, e.g., laboratory results may be more accurate than ICD-9 codes for identifying diabetes patients. Such information may guide medical researchers to reconsider and revise their queries. Analogous to the reference interview or interactive information retrieval in the field of Library and Information Science, the success of BQM depends on the effectiveness of the iterative negotiations between the data analyst and the medical researcher.4–6 A reference interview elicits a clear and well-defined statement from the patron detailing the information need.
However, the literature provides neither rich insights into the opaque BQM process nor an account of the differences between locating data elements in the massive EHR information space and searching for books in libraries.7 Prior studies have shown that modeling interactive retrieval processes can lead to better designs of information retrieval systems.5,8,9 As these studies suggest, obscure processes such as BQM are observed only by the few who perform them, so the resulting understanding of the process is limited. We believe that modeling the knowledge of the series of tasks performed by data analysts during BQM is important for providing standard, user-centered support for data analysts and medical researchers. We previously reported a content analysis of the BQM between a data analyst and medical researchers.10,11 As a natural extension of that study, this paper presents a cognitive task analysis of BQM to illustrate the BQM tasks and the knowledge required to perform each task.9,12 The purpose of a cognitive task analysis is twofold: first, to outline the specific tasks used to accomplish a goal; second, to detail both the controlled and automated knowledge needed to perform each task identified.13 A cognitive task analysis contains five core steps, i.e., collect preliminary information, identify task knowledge representations, apply focused knowledge elicitation methods, analyze and verify data acquired, and format the cognitive task analysis results for the intended application.12 This study focuses on the first two steps. We modeled task activities and sequences, and the knowledge needed to perform each task. Our task model underwent face and content validation by external reviewers. The Columbia University Medical Center Institutional Review Board approved this study. The rest of this paper first reports the methods and results and then discusses the implications of these findings for enhancing the BQM process.

Methods

Data Collection

Between July 2011 and January 2012, 31 discussions between one data analyst and 22 medical researchers were recorded and transcribed.

Data Analysis

Our analysis focused on the tasks occurring during BQM to accomplish the transfer of the medical researcher’s information needs to the data analyst. We extended our previous work’s description of BQM constructs to seed BQM task identification.11 The seeding content used was (1) the research question, (2) the clinical process, and (3) EHR data element locations. Through a random selection of BQM transcripts and e-mails, we initially identified tasks related to the seeded content and extrapolated sub-tasks related to each of these tasks. Through an iterative task and sub-task identification and refinement process, a final task and sub-task list emerged. Next, we ordered the task list temporally. As prescribed by cognitive task analysis, we elaborated on the individual task attributes by identifying the task goals and the knowledge required to complete each task. Additionally, we constructed a BQM knowledge representation in the form of a hierarchy of task complexity.

Face and Content Validity Evaluation

We presented our BQM task model to seven external data analysts from medical centers that are known to engage in BQM between data analysts and medical researchers (Northwestern University and Columbia University). We asked the data analysts to fill out a 14-item questionnaire for the derived BQM task list. Two items asked the data analysts to report their experience with, and how frequently they perform, BQM. We used two items to assess face validity on a scale from 1–10 for the dimensions of representativeness of and usefulness for BQM. If the median score was greater than or equal to 7 for both representativeness and usefulness, we considered the BQM task list to have face validity. Content validity is a metric determining whether a representation is capable of performing its intended task. Ten items were used to measure content validity; each data analyst judged each of the 10 sub-tasks as essential, useful, or non-useful. Inter-rater agreement in the form of the content validity ratio was applied to assess content validity. Task content validity was achieved if the content validity ratio reached the minimum critical value (0.620).14 Tasks were deemed semi-valid if at least half of the evaluators rated the task as essential.
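
The scoring rule above can be illustrated with a minimal sketch. It assumes the content validity ratio takes the standard Lawshe form, CVR = (n_e - N/2) / (N/2), where n_e is the number of evaluators rating a sub-task essential and N is the panel size; the text does not spell out the formula, but this form reproduces the ratios reported in Table 2 for a panel of seven. The function names and the "not valid" label are illustrative and do not come from the study.

```python
CRITICAL_VALUE = 0.620  # minimum critical value stated in the Methods


def content_validity_ratio(n_essential: int, n_raters: int) -> float:
    """Assumed Lawshe-style CVR: (n_e - N/2) / (N/2)."""
    if n_raters <= 0 or not 0 <= n_essential <= n_raters:
        raise ValueError("need 0 <= n_essential <= n_raters and n_raters > 0")
    half = n_raters / 2
    return (n_essential - half) / half


def classify(n_essential: int, n_raters: int,
             critical: float = CRITICAL_VALUE) -> str:
    """Apply the two rules from the text: valid, else semi-valid, else not valid."""
    if content_validity_ratio(n_essential, n_raters) >= critical:
        return "valid"
    # Semi-valid: at least half of the evaluators rated the task essential.
    if n_essential >= n_raters / 2:
        return "semi-valid"
    return "not valid"  # hypothetical label; no sub-task fell in this group


if __name__ == "__main__":
    # Seven evaluators; 4 to 7 "essential" ratings cover the cases in Table 2.
    for n in (4, 5, 6, 7):
        cvr = content_validity_ratio(n, 7)
        print(f"{n}/7 essential: CVR = {cvr:.2f}, {classify(n, 7)}")
```

Under this assumption, a sub-task needs at least six of seven "essential" ratings to reach the critical value, while four or five ratings yield a semi-valid result.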

Results

BQM task complexity

Figure 1 presents the hierarchical complexity representation of the BQM tasks. We identified five tasks, i.e., define research statement, illustrate clinical process, identify related data elements, locate EHR data elements, and end mediation, with ten corresponding sub-tasks. A typical BQM contains iterative topic switching; therefore, the BQM process is not a linear progression among these tasks. Descending the pyramid, each task is broken down into relatively simpler, though still quite complicated, tasks. This hierarchy also highlights two iterative processes: one between the clinical process and EHR data element location, and one between other data elements and EHR data element location. As the medical researcher becomes aware of what they do not know, and of the limits of what they thought they knew, about the data elements available within the EHR, one of two things may happen: the BQM may stop, or the medical researcher may revise the research statement to accommodate the new knowledge gained through the BQM.
Figure 1: The complexity hierarchy and task flow for BQM

BQM task process and dimensions

Evaluator Characteristics

Of the seven evaluators, 29% (2/7), 43% (3/7), and 29% (2/7) have been facilitating data access for >10, 3–5, and 1–2 years, respectively; 29% (2/7), 43% (3/7), 14% (1/7), and 14% (1/7) facilitate >10, 5–10, 3–5, and 1–2 BQMs per month, respectively.

Face and content evaluation results

Table 2 shows the score distribution and content validity ratio of the 10 items for the 10 sub-tasks from Table 1.
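Table 2: Task Content Validation Results

Sub-task | Essential (%) | Useful (%) | Non-Useful (%) | Content Validity Ratio
1.1 | 71 | 29 | 0 | 0.43
1.2 | 71 | 29 | 0 | 0.43
2.1 | 57 | 43 | 0 | 0.14
2.2 | 100 | 0 | 0 | 1
2.3 | 71 | 29 | 0 | 0.43
2.4 | 86 | 14 | 0 | 0.71
3.1 | 71 | 14 | 14 | 0.43
4.1 | 71 | 14 | 14 | 0.43
4.2 | 100 | 0 | 0 | 1
5.1 | 100 | 0 | 0 | 1
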
Table 1: BQM tasks and activities performed by the data analyst

Task | Sub-task | Goal | Knowledge Required | Example
1. Define research statement | 1.1 Elicit the clinical research scenario | To introduce core data elements of the information need | Study types | What is the research question?
1. Define research statement | 1.2 Understand the design of the proposed research | To establish the relationships among data elements | Study types | Are you looking at pre-treatment factors that affect the outcome measure?
2. Illustrate clinical process | 2.1 Elicit the clinical progression related to the information need | To establish the temporal order of abstract data elements | Medical domain knowledge | Patients with disease x that undergo treatment y, can you describe the diagnosis, treatment and follow-up timeline?
2. Illustrate clinical process | 2.2 Gather specific details and data representations of the ordered abstract data elements | To establish EHR data definitions for abstract data elements | Medical domain knowledge | Do all doctors refer to treatment X as x? What billing codes/image studies/lab tests are used for that type of visit?
2. Illustrate clinical process | 2.3 Create list of unknown data elements | To provide inputs for task 3 | Heuristics | What is the data element X? Please describe.
2. Illustrate clinical process | 2.4 Understand how to calculate derived variables from EHR data elements | To provide calculation parameters for derived variables | Heuristics | The Duke University risk score takes into account variables x and y using this formula, x/y + 5.
3. Identify related data elements | 3.1 Elicit relevant abstract data elements not represented in the clinical process | To establish static variables required for the study | Medical domain knowledge | What demographic information do you need? Any specific comorbidities?
4. Locate EHR data elements | 4.1 Show or request to see the location of the EHR data element | To establish location of data element within the data model of the EHR | EHR data model; EHR graphical user interface | I’m unfamiliar with the data element X, where is it recorded in the EHR?
4. Locate EHR data elements | 4.2 Describe availability and consistency of data elements | To educate the medical researcher on data quality, accessibility and reliability | EHR data model; Data quality, accessibility, and reliability | Data element X is not collected in the EHR; Data element Y is available sporadically from patient to patient.
5. End mediation | 5.1 Inform the medical researcher whether or not the information need can be satisfied | To allow the medical researcher to reformulate their information need or end the BQM | EHR data model; Data quality, accessibility, and reliability | That data element is contained in a scanned image and can’t be extracted from the EHR.

The first face validity item asked ‘to what extent does the task model simulate BQM’ on a scale from 1 (not at all) to 10 (completely); the median score was 8 (7–10). The second face validity item asked ‘how useful is this representation for novice data analysts conducting BQM’ on a scale from 1 (not useful) to 10 (very useful); the median score was 8 (6–10).

Discussion

We identified five tasks and 10 sub-tasks used to convey the medical researcher’s information needs to the data analyst. Additionally, the BQM tasks were categorized into a complexity hierarchy. This representation serves as an initial theoretical framework for BQM. A closer look at frameworks used for interactive information retrieval and mediated searching shows some similarities. Specifically, the ASK hypothesis and the Berrypicking model share similar task progressions to arrive at a clearer definition of an information seeker’s need.15,16 Additionally, Spink’s proposed theoretical framework details seven levels for mediated searching: problem solving process, information seeking episodes, uncertainty, cognitive styles, interactive search sessions, successive search behavior, and sets of situated actions.17 The proposed BQM model does not necessarily overlap with these levels, but provides further depth for the interactive search session, i.e., the dialogue between a user and a system; in this case, the data analyst serves as the channel to the system. We postulate that the level of granularity present in EHR databases contributes significantly to the complexity of BQM. Unlike in document databases, complexity in EHR databases is present in both the breadth of data elements and the features used to describe those data elements. Our view is consistent with that of other interactive information retrieval experts. Ford et al. state, “The deeper and more structured are the knowledge representation formalism adopted, the more difficult it is to develop systems able to accommodate wide-ranging subject content.18” System development in this context may be used as a surrogate for the model of the proposed workflow the system is attempting to improve. As such, it can be inferred that increased information breadth and knowledge representation granularity will increase the complexity of the interaction with an information retrieval system.

Complexity hierarchy

Our top-level concept explains the ultimate goal of BQM: querying the EHR database. The concepts below it break this high-level concept into simpler atomic concepts. BQM, a critical yet complex component of EHR database queries, is broken down further into its critical components. The first is the intent of the information need, i.e., the research the medical researcher wants to conduct. This is followed by the more granular components of assigning clinical data elements to a clinical timeline and explicitly mapping abstract elements to representations within the EHR. Additionally, other data characteristics that may not be directly tied to a clinical timeline are explored, and their EHR representations are also defined. This model suggests two feedback loops and several potential BQM stopping points. As the data analyst becomes aware of the medical researcher’s information needs, the data analyst gains a clearer understanding of whether the EHR information space can supply a representative dataset. Similarly, the medical researcher’s growing awareness of what is unknown, and of what was only thought to be known, may affect the initial intent, leading to a revised research statement that accommodates the EHR data suitable for representing the information need. Of note, the knowledge required to perform the tasks comes from different sources. These sources help the data analyst locate data elements and map them to the internal data repository for the medical researcher. Both medical researchers and data analysts can contribute medical knowledge; that is why the conversation between the two is crucial.

BQM task process model

Of particular interest, task 1, define research statement, mirrors the reference interview, which is used to understand the context in which the information seeker is working. Understanding the intent of the researcher provides a scaffolding of core information elements and the relationships among them.4,19 Likewise, the initial task of BQM enables the data analyst to develop an internal information model similar to the one being used by the medical researcher. This mental model captures the semantic relationships among the key medical data elements, from which finer details related to those elements can be explored. Tasks 2, 3, and 4 are used to focus the information need. It has been shown that focusing the information seeker’s need increases the precision of the results.7 While this study did not assess the precision of resulting datasets, tasks 2–4 support the notion of an iterative refinement and understanding of the medical researcher’s information need. Each task builds on the initial task, and the subsequent tasks may augment the initial task or end the BQM. The final task represents the data analyst’s understanding of the medical researcher’s information need and whether or not that need can be met by EHR data. This task can either end the mediation with no results or move the mediation into a formal EHR database query.

Face and content validation

Our expert evaluators deemed the preliminary process model to have face validity. The ratings suggest our initial representation of BQM is both representative of and useful for BQM. Additionally, 40% (4/10) of the sub-tasks (2.2, 2.4, 4.2, and 5.1) were judged to have content validity. All tasks were judged either semi-valid or valid. Given these positive results, we believe this initial knowledge representation of BQM is an acceptable basis for continuing our cognitive task analysis by applying focused knowledge elicitation methods with data analysts representing diverse approaches to BQM.

Limitations

Our study has several limitations. This representation is based on a specific data set covering the BQM process for just one data analyst and one medical research domain, urologic oncology. Our findings may not be representative of information needs from other medical domains, nor may they include all the tasks that other expert data analysts use to elicit a medical researcher’s information need. Additionally, the EHR data being accessed was represented by an integrated research data repository, which contained highly granular data. The type of EHR information source may affect the BQM process. Regardless of these limitations, the initial components of the cognitive task analysis will allow us to move forward with informed semi-structured interviews of other expert data analysts with the sole purpose of extracting additional knowledge of BQM.

Conclusions

This study contributes preliminary knowledge of BQM task sequence and a task complexity knowledge representation. This knowledge can guide future work on cognitive task analysis for acquisition of additional information from a diverse group of data analysts on the tasks used to accomplish a generalizable BQM.

1.  Comparative effectiveness research and medical informatics.

Authors:  Leonard W D'Avolio; Wildon R Farwell; Louis D Fiore
Journal:  Am J Med       Date:  2010-12       Impact factor: 4.965

2.  A cognitive task analysis of information management strategies in a computerized provider order entry environment.

Authors:  Charlene R Weir; Jonathan J R Nebeker; Bret L Hicken; Rebecca Campo; Frank Drews; Beth Lebar
Journal:  J Am Med Inform Assoc       Date:  2006-10-26       Impact factor: 4.497

3.  Big bad data: law, public health, and biomedical databases.

Authors:  Sharona Hoffman; Andy Podgurski
Journal:  J Law Med Ethics       Date:  2013-03       Impact factor: 1.718

4.  Characterization of the biomedical query mediation process.

Authors:  Gregory W Hruby; Mary Regina Boland; James J Cimino; Junfeng Gao; Adam B Wilcox; Julia Hirschberg; Chunhua Weng
Journal:  AMIA Jt Summits Transl Sci Proc       Date:  2013-03-18
