
Electronic health records and disease registries to support integrated care in a health neighbourhood: an ontology-based methodology.

Siaw-Teng Liaw¹, Jane Taggart², Hairong Yu², Alireza Rahimi².

Abstract

Disease registries derived from Electronic Health Records (EHRs) are widely used for chronic disease management (CDM). However, unlike national registries which are specialised data collections, they are usually specific to an EHR or organization such as a medical home. We approached registries from the perspective of integrated care in a health neighbourhood, considering data quality issues such as semantic interoperability (consistency), accuracy, completeness and duplication. Our proposition is that a realist ontological approach is required to systematically and accurately identify patients in an EHR or data repository of EHRs, assess intrinsic data quality and fitness for use by members of the multidisciplinary integrated care team. We report on this approach as applied to routinely collected data in an electronic practice based research network in Australia.

Keywords: EHR; data quality; data repository; health neighbourhood; integrated care; patient registries; routinely collected data

Year: 2014 PMID: 25954577 PMCID: PMC4419761

Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc

Introduction

Disease registries derived from Electronic Health Records (EHRs) are widely used for chronic disease management (CDM). However, not enough is known about the quality of EHR-based registers in the UK (1, 2) and Australia (3). There are publications about large administrative or population health databases, but little about disease registries created from multiple EHRs. Even less information is available about whether improved quality of EHR-based disease registries improves CDM, patient safety or quality outcomes. Beyond research, the increasing clinical use of EHR-based registries created through “blackbox” extraction tools can increase the likelihood and scope of data errors and adverse events (4). The design and development of EHR-based disease registries do not appear to be systematic or comprehensive (5). Aspects of quality of disease registries have been examined in the UK (2, 5) and through our own work on the consistency and quality of diabetes registries within an electronic Practice Based Research Network (ePBRN) in Australia (6). Our proposition (7) is that a realist (8) and ontological (9) approach is required to systematically and accurately identify patients in an EHR (10), or in a data repository of information from multiple EHRs, and to assess intrinsic data quality and fitness for use by stakeholders such as members of the multidisciplinary integrated care team or researchers (6). The realist approach (8) adopted for this evolving yet complex domain includes: Context: CDM, integrated care, evidence based practice; Mechanisms: systematic methods to assess and manage the quality of data integration, knowledge integration, clinical integration and interdisciplinary integration; Impacts/outcomes: improved data quality and fitness for use of disease registries, and, over the longer term, safety and quality of integrated care. 
The ontological approach to EHR-based registers includes the collection of formal, machine-processable and human-interpretable representations of the entities, and the relations among those entities, within a defined domain (11). Ontologies also provide regimentations of terminology that can support the reusability and integration of data, thereby supporting the development of automated systems for data annotation, information retrieval, and natural-language processing (11). By incorporating defined rules, ontologies can generate logical inferences and control the inclusion/exclusion of relevant objects (12), such as the patient with a diagnosis of diabetes mellitus (DM), abnormal pathology (Path) test, DM medication (Rx), or a DM cycle of care Medicare service payment item (10). In addition, a formal ontological model of the domain data and metadata can specify a unified context which allows intelligent software agents to act in spite of differences in concepts and terminology from different primary care EHRs. This will enable the systematic development of automated, valid and reliable methods to extract, link and manage data as well as assess the data quality and semantic interoperability issues. We have reported on our realist ontological approach (“Context-mechanisms-impact”) to the quality of routinely collected data and integrated care, the relevant concepts and their relationships (13). The context is focused on the need for complete, correct, consistent and timely information about the cycle of care, risk factors, disease indicators, quality of life and patient satisfaction. The mechanism is the development and validation of ontologies to conceptualise and formalize the information and methods required to implement evidence-based integrated care in a range of contexts. This will allow the development of software agents to find cases to create disease registries, assess the intrinsic data quality and determine fitness for integrated care. 
The quality of registries is influenced by the quality of EHR data, the case-finding system and associated quality processes, including currency and integrity, and the context such as clinical, insurance or other functions or objectives. Data quality (DQ) is defined by the International Organization for Standardization (ISO) as: “the totality of features and characteristics of an entity that bears on its ability to satisfy stated and implied needs” (ISO 8402–1986, Quality Vocabulary). This “fitness for purpose/use” (14) definition is necessarily multidimensional, requiring all intrinsic components and extrinsic associations of the entity to meet benchmarks and work together to achieve the purpose or meet the requirements. An examination of the data quality literature (6, 15, 16) has led us to develop a more specific conceptual framework for data quality (DQ) and fitness for purpose (Figure 1).

Figure 1.

Data quality & fitness for purpose framework

The framework comprises intrinsic, extrinsic and contextual dimensions, each with its own concepts and relationships. The intrinsic concepts cover the data elements and data set, including the metadata, semantics (data meaning), provenance (who authored the data, where and when) and constraints on the data meanings. The extrinsic concepts cover the information system, including concept representation, ontology, temporal relationships, system architecture and user interface. The contextual determinants include the objectives of stakeholders such as the integrated care practitioner, resource constraints, security requirements, legislation, etc. Data elements are assessed intrinsically in terms of consistency and correctness; data sets in terms of completeness and duplicate records (6). We are developing ontology-based tools to assess the information required to support integrated care in terms of timeliness and relational, historical and temporal integrity between concepts. Temporal and conceptual relationships may be dependent or independent factors. Relationships may exist at a number of levels, e.g. at the concept or table levels. The contextual determinants have been assessed qualitatively, aiming to guide clinical and organizational strategies to improve data quality to ensure fitness for purpose. The unified context will allow intelligent software agents to act in an environment of different concepts and terminology from different EHRs. This paper reports and discusses this realist and ontological approach to developing automated, valid and reliable methods to define “cases” for a registry, manage data quality and determine fitness for purpose. We used the integrated care of diabetes mellitus in a health neighbourhood, as represented by the ePBRN, as a case study of the methodology of this work.

Materials and Methods

Setting

The ePBRN pilot group of 4 general practices has tested and validated the ePBRN data, processes and management in context, depending on the purpose. The internal validation of the ePBRN involved regular checking of the data and metadata, using both automated and manual methods to examine the data repository. The data are also checked with probabilistic matching to assess the extent of duplicate patients and of patients shared within the geographic region, the local health neighbourhood. The methodology was implemented with Microsoft SQL Server and an extension, Transact-SQL™, to link the server objects in SQL Server with the heterogeneous datasets from multiple EHRs (17). The external validation of the ePBRN extraction tool involved a comparison against two other commercial data extraction tools (4).

Case-finding

The ePBRN ontological approach (10) used defined rules to generate logical inferences and control the inclusion/exclusion of the patient with a diagnosis of diabetes mellitus (DM), diabetes reason for visit (RFV), abnormal pathology (e.g. HbA1c, glucose tolerance test), diabetes medication (Rx) or glucose testing scripts, or a DM cycle of care item in the Medicare Benefit Schedule (MBS). Following the query, the results were also analysed to exclude duplicate records/patients from the final result. This ontological approach was implemented and tested using SPSS and SQL. Each method acted as a control/validator for the other’s accuracy. The benchmark was established with a manual examination of the results of SPSS and SQL queries on the smallest participating practice (Practice 1) contributing to the ePBRN data repository.
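
The inclusion rules described above can be illustrated with a minimal Python sketch. It is not the ePBRN implementation (which used SPSS and SQL): the record structure, the term lists, the MBS placeholder code and the HbA1c cut-off of 6.5% are all assumptions made for illustration. A patient is included if any one rule fires, and multiple records for the same patient identifier are merged so that each patient is counted once.

```python
from dataclasses import dataclass, field

# Assumed cut-off for an "abnormal" HbA1c (%); the paper does not state its value.
HBA1C_THRESHOLD = 6.5

# Hypothetical, already-normalised vocabularies (not SNOMED CT-AU or real MBS codes).
DM_TERMS = {"diabetes mellitus", "type 1 diabetes", "type 2 diabetes"}
DM_MEDICATIONS = {"metformin", "insulin", "gliclazide"}
DM_MBS_ITEMS = {"DM_CYCLE_OF_CARE"}


@dataclass
class PatientRecord:
    patient_id: str
    diagnoses: set = field(default_factory=set)
    reasons_for_visit: set = field(default_factory=set)
    hba1c_results: list = field(default_factory=list)
    medications: set = field(default_factory=set)
    mbs_items: set = field(default_factory=set)


def matched_rules(rec):
    """Return the names of the inclusion rules that this record satisfies."""
    rules = set()
    if rec.diagnoses & DM_TERMS:
        rules.add("diagnosis")
    if rec.reasons_for_visit & DM_TERMS:
        rules.add("reason_for_visit")
    if any(value >= HBA1C_THRESHOLD for value in rec.hba1c_results):
        rules.add("pathology")
    if rec.medications & DM_MEDICATIONS:
        rules.add("medication")
    if rec.mbs_items & DM_MBS_ITEMS:
        rules.add("mbs_item")
    return rules


def find_cases(records):
    """Map each included patient_id to the union of rules matched across its records."""
    cases = {}
    for rec in records:
        rules = matched_rules(rec)
        if rules:
            cases.setdefault(rec.patient_id, set()).update(rules)
    return cases


sample = [
    PatientRecord("p1", diagnoses={"type 2 diabetes"}, medications={"metformin"}),
    PatientRecord("p1", reasons_for_visit={"diabetes mellitus"}),  # second record, same patient
    PatientRecord("p2", medications={"atorvastatin"}),
    PatientRecord("p3", hba1c_results=[7.2]),
]
cases = find_cases(sample)
print(cases)
```

Keeping the set of matched rules per patient, rather than a yes/no flag, makes it possible to see which data source contributed each case.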

Data quality management

The conceptualization of the DQ ontology (Figure 1) included operationalising the reported core dimensions such as accuracy, currency and completeness (15) or completeness, correctness, consistency and timeliness (6, 16) and including duplicates (to account for aggregating multiple EHRs), temporal pattern (to account for the constantly changing clinical “big data”) and timeliness which is important in integrated care. Validation of the conceptualization included discussions with practitioners and consumers of health care. The specification of the data quality ontology started with the definitions of completeness, consistency and correctness of data that we have reported previously (6).
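
As a rough illustration of how one intrinsic dimension might be operationalised, the sketch below computes completeness as the share of patient records with a non-empty value for a given field. This reading of "completeness" and the field names are assumptions; the definitions actually used are those reported in (6).

```python
def completeness(records, field_name):
    """Share of records in which field_name is present and non-empty (0.0 to 1.0)."""
    if not records:
        return 0.0
    filled = sum(1 for rec in records if rec.get(field_name) not in (None, "", []))
    return filled / len(records)


# Hypothetical extract: one dictionary per active patient.
patients = [
    {"id": "p1", "rfv": ["diabetes mellitus"], "rx": ["metformin"]},
    {"id": "p2", "rfv": ["hypertension"], "rx": []},
    {"id": "p3", "rfv": [], "rx": ["atorvastatin"]},
    {"id": "p4", "rfv": ["asthma"], "rx": None},
]
rfv_completeness = completeness(patients, "rfv")
rx_completeness = completeness(patients, "rx")
print(f"RFV: {rfv_completeness:.0%}, Rx: {rx_completeness:.0%}")
```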

Formalisation

To formalize the disease registry and DQ ontologies, we drew on the prevalent technical mechanisms and methodologies for ontology development, including knowledge acquisition, conceptualisation, semantic modelling, knowledge representation and validation (18, 19). Most used a layered approach (20) to incorporate clinical guidelines and rule-based approaches. The development tools used include: Protégé, a popular open source ontology editor and knowledgebase framework (http://protege.stanford.edu/); a reference terminology (SNOMED CT-AU); representation languages (Web Ontology Language (OWL), XML and RDF (Resource Description Framework)); query languages (SPARQL Protocol and RDF Query Language); rules languages (Semantic Web Rule Language (SWRL)); and logic-based ontology reasoners, which provide automated support for reasoning tasks in ontology and instance checking through -ontopPro- (http://ontop.inf.unibz.it/), an ontology-based data access (OBDA) application (21). The patient data, associated with instances of ontology classes or properties, are populated through -ontopPro-. The knowledge component of the infrastructure, related to conceptual terminologies defined by the specified ontology, was built using SNOMED CT-AU and Web Ontology Language (OWL: http://www.w3.org/TR/owl-features/) through Protégé. Details have been reported elsewhere (17) on how the RDF schema is mapped to logics to support formal semantics and reasoning. Formal semantics describes precisely the meaning of knowledge, i.e. the semantics does not refer to subjective intuitions, nor is it open to different interpretations by different actors or machines (22). We used the layered ontology methodology to address semantic interoperability issues amongst different EHRs in the ePBRN (23–27). This approach enables intelligent software agents to act in various semantic contexts in collaborative environments. 
We implemented and tested the DQ ontology, using SPSS and SQL tools, with the pilot ePBRN (N=95,056) data repository.
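
To give a flavour of the kind of inference a reasoner performs over a class hierarchy, the following Python sketch walks a small subclass chain to decide whether a prescribed drug counts as a diabetes medication. It is only an analogy in plain Python, not OWL, SWRL or -ontopPro-: the class names are hypothetical, they are not SNOMED CT-AU concepts, and each class is assumed to have a single parent.

```python
# Hypothetical single-parent hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "Metformin": "BiguanideAgent",
    "BiguanideAgent": "DiabetesMedication",
    "InsulinGlargine": "Insulin",
    "Insulin": "DiabetesMedication",
    "Atorvastatin": "LipidLoweringAgent",
}


def ancestors(term):
    """Return all superclasses of term by following the subclass chain upwards."""
    seen = []
    while term in SUBCLASS_OF:
        term = SUBCLASS_OF[term]
        if term in seen:  # guard against an accidental cycle in the hierarchy
            break
        seen.append(term)
    return seen


def is_a(term, target):
    """True if term is the target class or one of its (transitive) subclasses."""
    return term == target or target in ancestors(term)


# Prescriptions as they might be recorded in different EHRs, at different levels of detail.
prescriptions = [("p1", "Metformin"), ("p2", "Atorvastatin"), ("p3", "InsulinGlargine")]
on_dm_medication = sorted(
    {pid for pid, drug in prescriptions if is_a(drug, "DiabetesMedication")}
)
print(on_dm_medication)
```

The point of the sketch is that the query is written once against the general class, and records coded with more specific terms are still found.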

Results

Ontological approach to find cases for a diabetes registry

An overall prevalence rate of 2.8%, lower than expected for diabetes, was found for this pilot dataset. Table 1 shows data completeness of relevant indicators (RFV, Rx, Path) used for this paper and highlights that the ontological approach was more sensitive, finding more cases than a single database table query. The range of 0.2–4.8% for a single factor and 1.1–5.7% for the ontological approach across practices suggests that data quality is a significant factor. The pathology and medication tables contributed most. Case finding was improved, but the main limitation was data quality dimensions such as data completeness and consistency (5). The denominator was also important in assessing prevalence, as some practices do not accurately represent active and inactive patients in their EHRs.

Table 1.

Diabetes patients identified by diagnosis (RFV), HbA1C, medication, and ePBRN ontological approach

N = EHR flagged active patients	Practice 1(N=3863)	Practice 2(N=7028)	Practice 3(N=23,162)	Practice 4(N=30,717)	ePBRN(N=64,770)
Completeness of data:
• All RFV (All DM RFV)	95% (4.3%)	87% (5.7%)	92% (4.9%)	99% (6.5%)	95% (5.8%)
• All Rx (All DM Rx)	80% (2.4%)	94% (8.4%)	96% (5.4%)	96% (6.6%)	95% (6.4%)
• All Path (HbA1c)	16% (0.8%)	61% (8.0%)	63% (1.3%)	66% (1.5%)	62% (2.4%)
• All 3 (RFV+Rx+Path)	82%	90%	90%	92%	90%
Diabetes identified by:	N (%)	N (%)	N (%)	N (%)	N (%)
• Reason for visit (RFV)	37 (0.9)	231 (3.3)	387 (1.4)	787 (2.6)	1,442 (2.2)
• Diabetes medication	19 (0.5)	332 (4.7)	446 (1.9)	803 (2.6)	1,600 (2.5)
• HbA1c	8 (0.2)	334 (4.8)	468 (2.0)	809 (2.6)	1,619 (2.5)
• ePBRN ontological approach	43 (1.1)	403 (5.7)	602 (2.5)	1,042 (3.4)	2,090 (3.2)
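
As a worked check of the pooled (ePBRN) column, the short Python sketch below sums the per-practice counts from Table 1 and divides by the total number of EHR-flagged active patients. The counts are taken from the table; the variable and function names are illustrative only.

```python
# EHR-flagged active patients per practice (Table 1 header).
ACTIVE = {"Practice 1": 3863, "Practice 2": 7028, "Practice 3": 23162, "Practice 4": 30717}

# Patients identified per practice by each method (Table 1, lower block).
CASES = {
    "RFV": {"Practice 1": 37, "Practice 2": 231, "Practice 3": 387, "Practice 4": 787},
    "Rx": {"Practice 1": 19, "Practice 2": 332, "Practice 3": 446, "Practice 4": 803},
    "HbA1c": {"Practice 1": 8, "Practice 2": 334, "Practice 3": 468, "Practice 4": 809},
    "Ontological": {"Practice 1": 43, "Practice 2": 403, "Practice 3": 602, "Practice 4": 1042},
}


def pooled_prevalence(method):
    """Return (total cases, prevalence in % to one decimal) across all practices."""
    total_cases = sum(CASES[method].values())
    total_active = sum(ACTIVE.values())
    return total_cases, round(100 * total_cases / total_active, 1)


pooled = {method: pooled_prevalence(method) for method in CASES}
for method, (n, pct) in pooled.items():
    print(f"{method}: {n} ({pct}%)")

# Additional cases found by the ontological approach over the best single-table query.
best_single = max(n for m, (n, _) in pooled.items() if m != "Ontological")
extra_cases = pooled["Ontological"][0] - best_single
print("Extra cases:", extra_cases)
```

On these figures the ontological approach identifies 2,090 patients across the four practices, 471 more than the best single-table query (HbA1c, 1,619).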

Duplication and other dimensions of data quality

Table 2 shows that up to 13% of patient records matched across the participating EHR neighbourhood. This suggests that data quality assessment and management should include the extent of duplication of data arising from information sharing across the neighbourhood, as well as within practices, where there can be up to 3% duplication (Table 3). This has significance for clinical use of EHR data in integrated and shared care as well as secondary uses for research, population health and policy guidance.

Table 2.

Record matching across general practices in a neighbourhood – shared patients

N=EHR active patients	Pract 1 (N=3863)	Pract 2 (N=7028)	Pract 3 (N=23,162)	Pract 4 (N=30,717)	ePBRN (N=64,770)
Practice (postcode)	Records (%)	Records (%)	Records (%)	Records (%)	Records (%)
Practice 1 (2176)		175 (2.5)	142 (0.6)	405 (1.3)	722 (1.1)
Practice 2 (2164)	173 (4.4)		327 (1.4)	691 (2.2)	1,191 (1.8)
Practice 3 (2171)	139 (3.4)	333 (4.7)		3,011 (9.8)	3,483 (5.4)
Practice 4 (2176)	400 (10)	692 (9.8)	3,005 (13)		4,097 (6.3)
Total	712 (18)	1,200 (17)	3,474 (15)	4,107 (13)	9,493 (15)

Table 3.

Record matching within general practices – duplicated records

Suburb (postcode)	EHR Active patients	Matched patients (%)	Matched records (%)
Practice 1 (2176)	3,863	10 (0.2%)	20 (0.5%)
Practice 2 (2164)	7,028	97 (1.3%)	198 (2.8%)
Practice 3 (2171)	23,162	220 (0.9%)	447 (1.9%)
Practice 4 (2176)	30,717	413 (1.3%)	830 (2.7%)
Total	64,770	740 (1.1%)	1,495 (2.3%)
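
The paper does not describe the probabilistic matching algorithm itself. As a simplified stand-in, the Python sketch below scores pairs of records with a weighted string similarity over a few demographic fields and flags pairs above a threshold as likely duplicates. The fields, weights and threshold are assumptions, and this is a plain similarity score rather than a formal probabilistic linkage model.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Assumed field weights (sum to 1.0) and decision threshold.
WEIGHTS = {"surname": 0.4, "given_name": 0.3, "dob": 0.3}
THRESHOLD = 0.85


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(r1, r2):
    """Weighted similarity of two records across the configured fields."""
    return sum(weight * similarity(r1[f], r2[f]) for f, weight in WEIGHTS.items())


def find_duplicates(records):
    """Return id pairs whose match score reaches the threshold."""
    return [
        (a["id"], b["id"])
        for a, b in combinations(records, 2)
        if match_score(a, b) >= THRESHOLD
    ]


# Hypothetical records; r2 has a transposed given name.
records = [
    {"id": "r1", "surname": "Nguyen", "given_name": "Anh", "dob": "1960-03-14"},
    {"id": "r2", "surname": "Nguyen", "given_name": "Ahn", "dob": "1960-03-14"},
    {"id": "r3", "surname": "Smith", "given_name": "Joan", "dob": "1975-11-02"},
]
duplicates = find_duplicates(records)
print(duplicates)
```

Comparing every pair is quadratic in the number of records, so a repository of this size would in practice need some form of blocking (for example, comparing only records that share a postcode or birth year) before scoring.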

Specifying and formalising the ontological approach

In addition to SQL tools, we have used the various ontology development tools mentioned to formalize the ontology work. The formal specification of the ontologies developed is available as Protégé files. Testing has been conducted with one of the participating practices (Practice 1) in the ePBRN, using -ontopPro- to map to the relational ePBRN data repository and implement the built-in reasoners. SPARQL and SWRL were used as the underlying query and rules languages. However, this is the subject of another paper in preparation, which will also compare the utility and validity of SQL-based inductive versus ontology-based approaches and tools to create accurate patient/disease registries and assess/manage the quality of routinely collected data in the ePBRN data repository and its source EHRs.

Discussion

Research into the quality of routinely collected data in EHRs and EHR-based disease registries, especially in primary care, is an evolving field. While standards and benchmarks are being developed in this research domain, a realist and ontological approach is the most appropriate to understand what is being done in what context and with what impact, given that the processes and knowledge base are continually evolving, requiring ongoing monitoring, evaluation and reflection. The ePBRN research confirms this need to ground the research and development work in context and in the real world of health practice, where data is noisy and continually changing. The ontological approach to case-finding identified a greater number of cases for inclusion in a disease/patient registry, highlighting the importance of this approach in the real world where data collection is suboptimal. Data quality management of aggregated information from multiple EHRs in a health neighbourhood to support integrated care must include the detection and management of duplicated records. Duplicates also lead to inaccurate public health and epidemiological research. Ontologies deal with reality (being) and the transformation (becoming) of concepts as they interact with one another over time. An ontologically rich approach to the creation of patient registries from EHRs is essential to optimise accuracy (10). The effect of data quality is predictable as the disease registry is only as good as the EHR from which it is created – and there is much room for improvement in EHR data quality (6, 16). The improvement requires realist ecological approaches to the governance and provenance of data quality across the data cycle from collection to management to display and secondary use in other applications such as electronic decision support (16, 28). 
This approach recognises that the quality of electronic data collected as part of routine clinical practice is determined by more than just the “garbage in, garbage out” (GIGO) principle. For instance, data models are influenced by the database management system, security and access management software, organisational processes for data collection and management, and the people in the organisation who enter and use data (4). The ePBRN foundational work reported here, along with others, has confirmed this to a significant extent. As we validate the formal ontology tools developed in the ePBRN program and apply them to the development of fully automated methods to address the data quality of EHRs and data repositories of ever increasing size, it is anticipated that this will build greater evidence for ontological approaches in the clinical and informational domains. The final tested ontologies and software tools can enable the systematic development of automated, valid and reliable methods to extract, link and manage data as well as assess/manage the data quality and semantic interoperability challenges.

Limitations

This is a work in progress, evolving from a pilot phase to an established representative practice-based research network (and, given resources, a health information exchange to support evidence-based clinical practice). Having said that, the ePBRN foundational work has been systematic and robust in the methodology adopted: to establish the ePBRN to reflect a local health neighbourhood with hospital, community health, general practice and other primary care services; to refine and test the tools to extract, link and manage the data repository of routinely collected data in multiple EHRs; and to make the transition from traditional management of “big data” from SQL and schematic relational databases to an ontological approach using semantic web principles and tools. The data reported is neither representative nor timely; it is part of a pilot ePBRN to conduct our experiments to validate our methodologies with real world data from primary and secondary care settings. Our data across all projects shows that the quality of routinely collected data in EHRs is not only variable and suboptimal (6), but also continually evolving and changing with time. This emphasizes the need for cost-effective and validated automated methods to assess and manage data and information systems in a timely manner. The ePBRN program demonstrates that the challenge is great but surmountable.

Conclusion

The specification of a unified context to enable intelligent software agents to act, in spite of differences in concepts and terminology from different EHRs, will enable the systematic development of automated, relevant, valid and reliable methods to extract, link and manage data as well as manage the data quality and semantic interoperability issues. This ontological approach to collecting, annotating, analysing and presenting clinical and scientific data is probably the only practical and sustainable solution to the information and data explosion. This is important to optimize the availability of good quality and relevant information to facilitate the safety and quality of integrated care as well as accurate and valid research.

17 in total

1. Data quality and fitness for purpose of routinely collected data--a general practice case study from an electronic practice-based research network (ePBRN).

Authors: Siaw-Teng Liaw; Jane Taggart; Sarah Dennis; Anthony Yeo
Journal: AMIA Annu Symp Proc Date: 2011-10-22

2. Health reform: is routinely collected electronic information fit for purpose?

Authors: Siaw-Teng Liaw; Huei-Yang Chen; Della Maneze; Jane Taggart; Sarah Dennis; Sanjyot Vagholkar; Jeremy Bunker
Journal: Emerg Med Australas Date: 2011-09-19 Impact factor: 2.151

3. ONTOFUSION: ontology-based integration of genomic and clinical databases.

Authors: D Pérez-Rey; V Maojo; M García-Remesal; R Alonso-Calvo; H Billhardt; F Martin-Sánchez; A Sousa
Journal: Comput Biol Med Date: 2005-09-06 Impact factor: 4.589

4. Semantic interoperability in telemedicine through ontology-driven services.

Authors: Probnab Ganguly; Pradeep Ray; N Parameswaran
Journal: Telemed J E Health Date: 2005-06 Impact factor: 3.536

Review 5. Realist review--a new method of systematic review designed for complex policy interventions.

Authors: Ray Pawson; Trisha Greenhalgh; Gill Harvey; Kieran Walshe
Journal: J Health Serv Res Policy Date: 2005-07

6. Disease prevalence in the English population: a comparison of primary care registers and prevalence models.

Authors: David Martin; James A Wright
Journal: Soc Sci Med Date: 2008-11-18 Impact factor: 4.634

7. Ontologies to improve chronic disease management research and quality improvement studies - a conceptual framework.

Authors: Harshana Liyanage; Siaw-Teng Liaw; Craig Kuziemsky; Simon de Lusignan
Journal: Stud Health Technol Inform Date: 2013

Review 8. Key concepts to assess the readiness of data for international research: data quality, lineage and provenance, extraction and processing errors, traceability, and curation. Contribution of the IMIA Primary Health Care Informatics Working Group.

Authors: S de Lusignan; S-T Liaw; P Krause; V Curcin; M Tristan Vicente; G Michalakidis; L Agreus; P Leysen; N Shaw; K Mendis
Journal: Yearb Med Inform Date: 2011

9. Defining datasets and creating data dictionaries for quality improvement and research in chronic disease using routinely collected data: an ontology-driven approach.

Authors: Simon de Lusignan; Siaw-Teng Liaw; Georgios Michalakidis; Simon Jones
Journal: Inform Prim Care Date: 2011

10. An ontological modeling approach to cerebrovascular disease studies: the NEUROWEB case.

Authors: Gianluca Colombo; Daniele Merico; Giorgio Boncoraglio; Flavio De Paoli; John Ellul; Giuseppe Frisoni; Zoltan Nagy; Aad van der Lugt; István Vassányi; Marco Antoniotti
Journal: J Biomed Inform Date: 2010-01-13 Impact factor: 6.317
