To our knowledge, this is the first systematic review to identify and evaluate methods used to validate the recording of asthma diagnosis in electronic health records.
This review could inform the selection of asthma identification algorithms used by future health outcome studies and identify any gaps in the quality and scope of validation studies.
It will also provide an overview of the algorithms with their positive predictive value, negative predictive value, sensitivity or specificity.
Different databases may validate different algorithms to identify asthma, which might limit the generalisability of these algorithms as they are context-specific.
This review is focused on the methodology of asthma recording validation, and not on all outcome results of studies (except the validation results). Because of this, publication bias might be an issue (methods that do not find positive results may be less likely to have been published).
Background
Asthma is a common chronic inflammatory disease of the airways. This condition is characterised by a variable expiratory airflow limitation which is generally reversible. The core symptoms are cough, wheeze, breathlessness and chest tightness.1 Asthma episodes can range from mild attacks, which interrupt daily life and work productivity, to severe and life-threatening attacks.2 Asthma is inherently variable and individuals will experience fluctuating symptoms. Most commonly, asthma emerges in childhood, but it can also arise in adulthood. Therefore, adult asthma consists of both persistent or relapsed childhood disease and true incident adult disease. There is no cure, but with the right treatment, symptoms can usually be managed and patients with asthma can lead their lives without disruption.1
The widespread adoption of electronic health records (EHRs) means that large population-based primary and secondary care databases are available, providing a great opportunity for research on asthma and other diseases. The availability of routinely generated longitudinal records for research has dramatically increased over the last decades.3 However, the primary function of EHRs is to support clinical decision-making in healthcare, not research. The integrity of the research generated from EHRs may be questionable, unless data are thoroughly validated for this purpose.4–7
EHRs are a digital reflection of the paper medical chart, while the main purpose of administrative claims data is administration of reimbursements to healthcare providers for their services. This systematic review will only consider data from EHRs, as the quality measures between the two types of data can be markedly different.8 9
EHRs store information about diagnoses as clinical codes. A single code, or an algorithm consisting of multiple codes, can be used to retrieve records from EHRs, and additional restrictions can be applied such as age or exclusion of other diseases.7 10 Alternatively, several authors have recently used natural language processing and machine learning techniques to automate algorithm generation for the identification of asthma diagnoses from large databases.11–13 The most common method to assess the validity of algorithms is to compare them with a gold standard, such as another linkable data set or verification requested from the treating physician or the patient via a questionnaire.10 Another approach is active case detection, where the databases are constantly screened to identify cases that emerge.14
Several limitations apply to the validation of diagnosis recording in EHRs. First, individual databases often cover only a single care setting (primary or secondary care); as such, case ascertainment relies only on a partial description of the healthcare pathway.15 Another issue is that the validity of recorded diagnoses will not necessarily be the same across diseases in a given data set. For example, mental health disorders such as anxiety or depression might be coded using less specific symptoms, whereas the validity of diagnoses with a very high specificity such as breast cancer is likely to be higher. There have been multiple studies which have measured the validity of specific databases for asthma.16 17 Sharifi et al have conducted a systematic review on validated methods to capture acute bronchospasm using administrative or claims data,18 which yielded two validation studies of bronchospasm codes.11 19
This systematic literature review aims to provide an overview of methods used to validate asthma diagnoses, specifically in EHRs. To the best of our knowledge, such a study has not yet been published in the medical literature.
Research question
The primary objectives of this systematic review are to provide an overview of both the methods with which asthma diagnosis recording has been validated in EHRs and the estimates of the validation test measures.
The questions of interest for this systematic review are:
Which EHRs that are not only based on claims data have been used to obtain information on the diagnosis of asthma?
Which algorithms have been used to define an asthma diagnosis (including diagnostic codes, possible spirometry tests and clinical descriptions)?
How were the diagnostic criteria applied to the data sources and which other approaches have been used to validate a case definition?
What are the estimates for the positive predictive value, negative predictive value, specificity and sensitivity for a diagnosis of asthma in EHRs that are not solely claims-based?
Methods
MEDLINE and EMBASE will be searched for the terms ‘asthma’, ‘validation’, ‘electronic databases’ and synonyms for each of these terms. In addition, reference lists of review articles and retrieved articles will be reviewed. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram of this protocol, from Moher et al,20 can be found in figure 1, and the search strategy can be found in the online supplementary file.
Figure 1
Study screening process: Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram from Moher et al.
Inclusion criteria
Any type of observational study design that used EHRs to validate the recording of an asthma diagnosis will be considered. Articles will only be considered if published in English and before October 2016, with no restriction on start date. Within the databases, we will consider asthma diagnoses based on both structured data (such as laboratory results and prescriptions) and free-text data (such as spirometry results). We require the validated algorithms to be compared with an external gold standard, such as a manual review, questionnaires (completed by the patient or their physician) or an independent second database. We will include algorithms formed of single codes, those requiring multiple case characteristics and algorithms generated by natural language processing or machine learning.
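To illustrate what an algorithm built from codes and additional case characteristics might look like, the following is a minimal Python sketch. It is not taken from any included study: the record structure, the code lists (`ASTHMA_CODES`, `EXCLUSION_CODES`) and the thresholds are hypothetical placeholders, and a real algorithm would use the coding system and restrictions of the database in question.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one patient's EHR data."""
    patient_id: str
    age: int
    codes: Set[str] = field(default_factory=set)


# Placeholder code lists for illustration only; these are NOT real clinical codes.
ASTHMA_CODES = {"ASTHMA_DX_1", "ASTHMA_DX_2"}
EXCLUSION_CODES = {"COPD_DX_1"}


def is_asthma_case(
    record: PatientRecord,
    min_age: int = 5,
    max_age: Optional[int] = None,
    min_asthma_codes: int = 1,
) -> bool:
    """Apply a code-based case definition with age and exclusion restrictions."""
    # Age restriction
    if record.age < min_age:
        return False
    if max_age is not None and record.age > max_age:
        return False
    # Exclusion of other diseases
    if record.codes & EXCLUSION_CODES:
        return False
    # Require at least `min_asthma_codes` distinct asthma codes
    return len(record.codes & ASTHMA_CODES) >= min_asthma_codes


if __name__ == "__main__":
    patients = [
        PatientRecord("p1", 34, {"ASTHMA_DX_1"}),
        PatientRecord("p2", 61, {"ASTHMA_DX_1", "COPD_DX_1"}),
        PatientRecord("p3", 3, {"ASTHMA_DX_2"}),
    ]
    for p in patients:
        print(p.patient_id, is_asthma_case(p))
```

Setting `min_asthma_codes=1` corresponds to a single-code definition, while higher values or extra restrictions correspond to algorithms requiring multiple case characteristics. The cases flagged by such a function are what a validation study would then compare with the gold standard.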
Exclusion criteria
Studies which involve pharmacovigilance databases (signal detection or spontaneous reporting), studies without a validation process for asthma recording and conference abstracts will be excluded. Algorithms used in databases originating from only claims data will also be excluded as a systematic review on the validated methods to capture acute bronchospasm using claims data has been published recently.18
Two independent authors will scan the abstracts and titles against the research questions and exclusion criteria and select articles for full-text review. After this full-text article review, eligibility for inclusion in the report will be decided by consensus or arbitration by a third reviewer. After data extraction, a uniform table will be populated with information on each included study, including the author, date of publication, journal, database, algorithms, population, gold standard and test measure(s).
Data synthesis
Studies and study data will be managed using EndNote and Microsoft Excel, respectively. The methods for asthma recording validation will be summarised in a narrative synthesis and in tables describing all identified verification processes and their results. These results will consist of the positive predictive value (PPV), negative predictive value (NPV), sensitivity and specificity recorded in the included studies. Where possible, these measures will be calculated if they are not reported within the study.
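Where a study reports the underlying counts but not all four measures, they can be derived from the 2×2 table comparing the algorithm with the gold standard. The following is a minimal Python sketch using the standard definitions; the function and argument names are illustrative assumptions and are not part of the protocol.

```python
from typing import Dict, Optional


def safe_ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def validation_measures(tp: int, fp: int, fn: int, tn: int) -> Dict[str, Optional[float]]:
    """Compute validation measures from a 2x2 table.

    tp: algorithm positive, gold standard positive
    fp: algorithm positive, gold standard negative
    fn: algorithm negative, gold standard positive
    tn: algorithm negative, gold standard negative
    """
    return {
        "sensitivity": safe_ratio(tp, tp + fn),
        "specificity": safe_ratio(tn, tn + fp),
        "ppv": safe_ratio(tp, tp + fp),
        "npv": safe_ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    # Made-up counts for illustration only
    for name, value in validation_measures(tp=80, fp=20, fn=10, tn=90).items():
        print(name, None if value is None else round(value, 3))
```

A measure is returned as `None` when its denominator is zero. This is the case, for example, for NPV when a study verified only algorithm-positive records (so `tn` and `fn` are both zero) and only the PPV can be estimated.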