
Identifying Parkinson's disease and parkinsonism cases using routinely collected healthcare data: A systematic review.

Zoe Harding1, Tim Wilkinson2,3, Anna Stevenson4,5, Sophie Horrocks1, Amanda Ly3, Christian Schnier3, David P Breen2,3,6, Kristiina Rannikmäe2,3, Cathie L M Sudlow2,3.   

Abstract

BACKGROUND: Population-based, prospective studies can provide important insights into Parkinson's disease (PD) and other parkinsonian disorders. Participant follow-up in such studies is often achieved through linkage to routinely collected healthcare datasets. We systematically reviewed the published literature on the accuracy of these datasets for this purpose.
METHODS: We searched four electronic databases for published studies that compared PD and parkinsonism cases identified using routinely collected data to a reference standard. We extracted study characteristics and two accuracy measures: positive predictive value (PPV) and/or sensitivity.
RESULTS: We identified 18 articles, resulting in 27 measures of PPV and 14 of sensitivity. For PD, PPV ranged from 56-90% in hospital datasets, 53-87% in prescription datasets, 81-90% in primary care datasets and was 67% in mortality datasets. Combining diagnostic and medication codes increased PPV. For parkinsonism, PPV ranged from 36-88% in hospital datasets, 40-74% in prescription datasets, and was 94% in mortality datasets. Sensitivity ranged from 15-73% in single datasets for PD and 43-63% in single datasets for parkinsonism.
CONCLUSIONS: In many settings, routinely collected datasets generate good PPVs and reasonable sensitivities for identifying PD and parkinsonism cases. However, given the wide range of identified accuracy estimates, we recommend cohorts conduct their own context-specific validation studies if existing evidence is lacking. Further research is warranted to investigate primary care and medication datasets, and to develop algorithms that balance a high PPV with acceptable sensitivity.


Year:  2019        PMID: 30703084      PMCID: PMC6354966          DOI: 10.1371/journal.pone.0198736

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Despite well-established pathological features, the aetiologies of Parkinson’s disease (PD) and other parkinsonian conditions remain poorly understood, and disease-modifying treatments have proved elusive[1]. Large, prospective, population-based cohort studies with biosample collections (e.g., UK Biobank, German National Cohort, US Precision Medicine Initiative) provide a robust methodological framework, with the statistical power to investigate the complex interplay between genetic, environmental and lifestyle factors in the aetiology and natural history of neurological disorders such as PD and other parkinsonian disorders[2-4]. Linkage to routinely collected healthcare data (administrative datasets collected primarily for healthcare purposes rather than to address specific research questions[5]) provides an efficient means of long-term follow-up to identify large numbers of incident cases in such studies[2]. Participant linkage to such datasets can also be used in randomised controlled trials as a cost-effective and comprehensive method of follow-up for disease outcomes[6]. These data are coded using systems such as the International Classification of Diseases (ICD)[7], the Systematized Nomenclature of Medicine–Clinical Terms (SNOMED-CT) system[8], and the UK primary care Read system[9].
There are several mechanisms by which inaccuracies can arise when routinely collected healthcare data are used to identify PD outcomes. First, false positives (participants who receive a disease code but do not have the disorder) may arise if a clinician misdiagnoses the condition. Because PD and other parkinsonian disorders are largely clinical diagnoses made without a definitive diagnostic test, there is potential for diagnostic inaccuracy: clinicopathological studies have shown discrepancies between clinical diagnoses in life and neuropathological confirmation[10], and there is evidence that accuracy increases when diagnoses are made by movement disorder specialists[11-13]. Second, diagnoses may be incorrectly recorded in medical records, or errors may arise during the coding process. Similarly, false negatives (patients who have the condition but do not receive a code) may arise from under-diagnosis, from omission of the diagnosis from the medical records (e.g., because the condition is not the primary reason for hospital admission), or from errors during the coding process.
As a result, the accuracy of such datasets must be determined before they can be used to identify PD and parkinsonism cases in prospective studies. The key measures are the positive predictive value (PPV; the proportion of those coded positive who are true disease cases) and sensitivity (the proportion of true disease cases who are coded positive). Specificity and negative predictive value (NPV) are less relevant metrics in this setting. A high specificity (the proportion of those without the disease who do not receive a disease code) is needed to achieve a high PPV, thereby minimising bias in effect estimates. With an appropriately precise choice of codes, the specificity of routinely collected healthcare data for identifying disease cases in population-based studies is usually very high (98–100%)[14,15]. However, when the overall prevalence of a disease in a population-based cohort is low, a high specificity does not guarantee a high PPV: because people without the disease vastly outnumber those with it, misclassifying even a small proportion of them (high specificity) can produce a large absolute number of false positives relative to true positives (low PPV)[16]. NPV, like PPV, depends on disease prevalence and will therefore be high in population-based studies where most individuals do not develop the disease of interest[14].
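The prevalence argument above can be made concrete with Bayes’ theorem. The sketch below is a hypothetical illustration, not an analysis from this review: it assumes a case definition with 60% sensitivity and 99% specificity (values chosen only for illustration) and computes PPV and NPV at 0.5% and 5% prevalence. The function name `predictive_values` is likewise an assumption.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a case definition applied at a given prevalence.

    Uses expected proportions of the whole cohort (Bayes' theorem),
    so no raw counts are needed.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical inputs: the same codes (60% sensitive, 99% specific)
    # applied in a low-prevalence and a higher-prevalence setting.
    for prevalence in (0.005, 0.05):
        ppv, npv = predictive_values(0.60, 0.99, prevalence)
        print(f"prevalence={prevalence:.1%}  PPV={ppv:.1%}  NPV={npv:.2%}")
```

Under these assumed inputs the PPV is about 23% at 0.5% prevalence but about 76% at 5% prevalence, while the NPV stays above 97% in both settings. This mirrors the point that high specificity alone does not secure a high PPV in a low-prevalence cohort.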
Previous systematic reviews on the accuracy of routine data to identify other neurological diseases such as stroke[14], dementia[17] and motor neurone disease[18] have summarised the existing literature and identified methods by which accuracy can be improved, as well as areas for further evaluation. Here, we systematically reviewed published studies that evaluated the accuracy of routinely collected healthcare data for identifying PD and parkinsonism cases.

Methods

Study reporting

We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement for the reporting of this systematic review[19].

Study protocol

We used the PRISMA Protocols (PRISMA-P) guideline to aid in the design of this study[20], and prospectively published the protocol (PROSPERO registration number: CRD42016033715, www.crd.york.ac.uk/PROSPERO/display_record.php?ID=CRD42016033715)[21].

Search strategy

We (AS & TW) searched the electronic databases MEDLINE (Ovid), EMBASE (Ovid), CENTRAL (Cochrane Library) and Web of Science (Thomson Reuters) for relevant articles published in any language between 1 January 1990 and 23 June 2017. Our search strategy is outlined in S1 File. We chose these date limits because we judged that accuracy estimates from studies published before 1990 would have limited current applicability. We did not exclude studies based on the dates covered by the datasets. We also screened the bibliographies of included studies and relevant review papers to identify additional publications.

Eligibility criteria

To be included, studies had to compare codes for PD or parkinsonism from routinely collected healthcare data with a clinical expert-derived reference standard, and to provide a PPV and/or sensitivity estimate (or sufficient raw data to calculate these). We excluded studies with <10 coded cases because of the limited precision of studies below this size[17,18]. Studies reporting sensitivity had to be population-based (i.e., community-based as opposed to hospital-based), with comprehensive attempts to detect all disease cases. Where multiple studies investigated overlapping populations, we included the study with the larger population. Where articles assessed more than one dataset, or evaluated both PPV and sensitivity, we treated each analysis as a separate study. Hereafter, we refer to published papers as ‘articles’ and to these separate analyses as ‘studies’.

Study selection

Two authors (AS and SH) independently screened all titles and abstracts generated by the search, and reviewed full text articles of all potentially eligible studies to determine if the inclusion criteria were met. In the case of disagreement or uncertainty, we reached a consensus through discussion and, where necessary, involvement of a senior third author (CLMS).

Data extraction

Using a standardized form, two authors (TW and ZH) independently extracted the following data from each study: first author; year of publication; time period during which coded data were collected; country of study; study population; average age of disease cases (or, if this was unavailable, the ages of participants at recruitment); study size (for PPV, the total number of code-positive cases [true positives plus false positives]; for sensitivity, the total number of reference-standard cases [true positives plus false negatives]); type of routine data used (e.g., hospital admissions, mortality or primary care); coding system and version used; specific codes used to identify cases; diagnostic coding position (e.g., primary or secondary position); parkinsonian subtypes investigated; and the method used to make the reference standard diagnosis. We recorded the reported PPV and/or sensitivity estimates, as well as any corresponding raw data. Discrepancies were discussed, and any remaining queries were resolved with a senior third author (CLMS). When necessary, we contacted study authors to request additional information.
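The two study-size definitions correspond to the denominators of the two accuracy measures. A minimal sketch, using hypothetical counts and a hypothetical `ValidationCounts` class (neither comes from the included studies), shows that PPV and sensitivity share the true-positive numerator but divide by different totals, and that neither requires a count of true negatives.

```python
from dataclasses import dataclass


@dataclass
class ValidationCounts:
    """Hypothetical counts from comparing coded cases with a reference standard."""
    true_positives: int   # coded, and confirmed by the reference standard
    false_positives: int  # coded, but not confirmed
    false_negatives: int  # confirmed case that never received a code

    @property
    def ppv_study_size(self) -> int:
        # Study size for PPV: all code-positive cases.
        return self.true_positives + self.false_positives

    @property
    def sensitivity_study_size(self) -> int:
        # Study size for sensitivity: all reference-standard cases.
        return self.true_positives + self.false_negatives

    def ppv(self) -> float:
        return self.true_positives / self.ppv_study_size

    def sensitivity(self) -> float:
        return self.true_positives / self.sensitivity_study_size


if __name__ == "__main__":
    counts = ValidationCounts(true_positives=80, false_positives=20, false_negatives=40)
    print(f"PPV {counts.ppv():.0%} (n={counts.ppv_study_size}), "
          f"sensitivity {counts.sensitivity():.0%} (n={counts.sensitivity_study_size})")
```

With these made-up counts, 100 code-positive cases give a PPV of 80%, while 120 reference-standard cases give a sensitivity of about 67%.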

Quality assessment

We adapted the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2)[22] tool to evaluate the risk of bias in the accuracy estimates and any concerns about the applicability of each article to our specific research question (S2 File). Two authors (TW and ZH) independently assigned quality ratings, resolving any discrepancies through discussion. We performed this evaluation in the context of our specific review question, not as an indication of the overall quality of the articles. We assessed risk of bias at the article level rather than the study level, as the methods for each study within an article were very similar. We did not exclude studies on the basis of their quality assessment ratings, but instead considered each study’s results in the context of the article’s risk of bias and applicability concerns. Where articles at low risk of bias and articles at high risk of bias reported PPV or sensitivity estimates for the same type of dataset, we compared these estimates to assess the potential effect of bias on accuracy.

Statistical analysis/data synthesis

We tabulated the extracted data and calculated 95% confidence intervals for the accuracy measures from the raw data using the Clopper-Pearson (exact) method. Because of substantial heterogeneity in study settings and methodologies, we did not perform a meta-analysis, as we considered any summary estimate to be potentially misleading. Instead, we assessed the full range of results in the context of study methodologies, populations and specific data sources. We also reported any within-study comparisons in which a single variable was changed to examine its effect on PPV or sensitivity. We performed analyses using the statistical software StatsDirect 3.
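The review’s intervals were computed in StatsDirect; the following is a standard-library Python sketch of the Clopper-Pearson method for readers who want to reproduce an exact interval without it. It inverts the binomial test by bisection: the lower bound is the proportion at which P(X ≥ x) equals α/2, and the upper bound is the proportion at which P(X ≤ x) equals α/2. The function names are illustrative assumptions. Equivalent bounds can be obtained as beta-distribution quantiles (for example with SciPy’s `scipy.stats.beta.ppf`).

```python
import math
from typing import Callable


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summing pmf terms in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _bisect_increasing(f: Callable[[float], float], iters: int = 100) -> float:
    """Root on (0, 1) of a function that increases with p, found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Clopper-Pearson) two-sided confidence interval for x/n."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    # Lower bound: p at which P(X >= x) rises to alpha/2.
    lower = 0.0 if x == 0 else _bisect_increasing(
        lambda p: (1.0 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound: p at which P(X <= x) falls to alpha/2.
    upper = 1.0 if x == n else _bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


if __name__ == "__main__":
    # Assumed counts: 51 of 72 reproduces the 70.8% PPV reported for the
    # Feldman 2012 hospital data; the true-positive count is inferred.
    low, high = clopper_pearson(51, 72)
    print(f"PPV {51 / 72:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Run on 51 of 72, the result should be close to the 59–81% interval shown for the Feldman 2012 hospital inpatient row of Table 3; small differences can arise from rounding.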

Results

Study characteristics

From an initial 1319 identified articles, we removed 222 duplicates and excluded 994 judged irrelevant after screening titles and abstracts. We therefore examined the full texts of 103 articles. Of these, we excluded 37 that did not assess the accuracy of a routinely collected, coded dataset, 21 that did not validate the coded data against any reference standard, 12 that were not primary research studies, 11 that combined routine and non-routine data, three in which no accuracy measure was reported or calculable, and four that did not assess coding in PD. Eighteen published articles fulfilled our inclusion criteria[23-40]. A flow diagram of the study selection process is shown in Fig 1. We obtained key additional information from the authors of two studies[32,36]. Of the 18 included articles, 13 reported PPV[23,25-36], four reported sensitivity[37-40] and one reported both[24]. Four articles contained more than one study[23-25,29]. One of these used different methods to evaluate datasets across several countries, so we included its analyses as six separate studies[25]. In total, there were 27 measures of PPV and 14 of sensitivity. Study characteristics are summarised in Table 1 (PPV) and Table 2 (sensitivity).
Fig 1

PRISMA flow diagram.

Table 1

Characteristics of studies reporting positive predictive value, stratified by dataset type.

First author & year of publication | Year of study | Country | Study population composition | Age* | Proportion male | Study size (n) | Routine dataset used | Coding system | Codes used to identify cases | Diagnostic coding position | Reference standard
Hospital-derived datasets:
Butt 2014[23] | 1991–2011 | Canada | Population of Ontario ≥20 yrs | Mean 74 | 60 | Inpatient: 79; Outpatient: 435 | Hospital: inpatient; Hospital: outpatient | ICD-9 (pre-2002); ICD-10 (post-2002) | Parkinsonism: ICD-9: 332.0, 332; ICD-10: G20, G21.0–0.4, G21.8–9, G22, F02.3 | Not specified | Medical record review
Feldman 2012[24] | 1964–2004 | Sweden | Twins across Sweden >50 yrs | Mean 75 | Unclear | PD: 72; Parkinsonism: 75 | Hospital: inpatient | ICD-7 (1961–67); ICD-8 (1968–86); ICD-9 (1987–96); ICD-10 (1997–2009) | PD: ICD-7: 350; ICD-8: 342.00; ICD-9: 332.0; ICD-10: G20. Parkinsonism: ICD-8: 342.08, 342.09; ICD-9: 333.0; ICD-10: G21.4, G21.8, G21.9, G23.1, G23.2, G23.9, G25.9 | Any | Screening interview, medical record review and examination by physician
Gallo [a] 2015[25] | 1994–2010 | Sweden | Hospital patients, EPIC study participants | Mean 46 at recruitment | Unclear | 62 | Hospital: unclear | ICD-9 (pre-1996); ICD-10 (post-1996) | PD: ICD-9: 332; ICD-10: G20, G21 | Not specified | Medical record review
Gallo [b] 2015[25] | 1991–2010 | Sweden | Hospital patients, EPIC study participants | Mean 58 at recruitment | Unclear | 299 | Hospital: inpatient and outpatient | ICD-9; ICD-10 | PD: ICD-9: 332; ICD-10: G20 | Not specified | Medical record review
Kestenbaum 2015[26] | 2009–2014 | USA | Tertiary referral centre patients | Unclear | Unclear | 100 | Hospital: unclear | ICD-9 | PD: 332.0 | Not specified | Medical record review
Swarztrauber 2005[27] | 1998–2002 | USA | Veterans hospital patients | Mean 76 | 98 | 175 | Hospital: inpatient and outpatient | ICD-9-CM | Parkinsonism: 332.0, 332.1, 333.0 | Not specified | Medical record review
Szumski 2009[28] | 2001–2004 | USA | Veterans hospital patients | Mean 76 | 98 | 577 | Hospital: outpatient | ICD-9-CM | PD: 332.0 | Not specified | Medical record review
Wei 2016[29] | Unclear | USA | Hospital patients | Unclear | Unclear | 100 | Hospital: inpatient and outpatient | ICD-9 | PD: 332.0 | Not specified | Medical record review
Wermuth 2015[30] | 1996–2009 | Denmark | Neurological hospital patients | Median 55–64 | 59 | 2625 | Hospital: inpatient and outpatient | ICD-8; ICD-10 | PD: ICD-8: 342; ICD-10: G20 | Primary | Medical record review
White 2007[31] | 1998–2000 | USA | Veterans hospital patients | Median 75 | 96 | 782 | Hospital: inpatient and outpatient | ICD-9-CM | Parkinsonism: 332.0, 332.1 | Any | Medical record review
Primary care-derived datasets:
Hernán 2004[32] | 1995–2001 | UK | GP patients | Unclear | Unclear | 106 | Primary care | Read code | Not specified (investigated PD) | Not applicable | Medical record review
Prescription-derived datasets:
Butt 2014[23] | 1991–2011 | Canada | Population of Ontario ≥65 years | Mean 74 | 60 | 395 | Prescriptions | Not specified | Parkinsonism: Levodopa; MAO-B inhibitors; dopamine agonists; COMT inhibitors | Not applicable | Medical record review
Meara 1999[33] | Not stated | UK | GP patients | Mean 76 | Unclear | PD: 402; Parkinsonism: 402 | Prescriptions (from primary care) | Not specified | PD: Not specified; Parkinsonism: Not specified | Not applicable | History and examination by physician and medical record review
Wei 2016[29] | Unclear | USA | Hospital patients | Unclear | Unclear | 100 | Prescriptions | Not specified | PD: Rotigotine; Entacapone; Selegiline hydrochloride; Pergolide; Rasagiline | Not applicable | Medical record review
Mortality datasets:
Feldman 2012[24] | 1998–2007 | Sweden | Twins across Sweden >50 yrs | Mean 75 | Unclear | PD: 18; Parkinsonism: 18 | Mortality | ICD-10 | PD: G20. Parkinsonism: G21.4, G21.8, G21.9, G23.1, G23.2, G23.9, G25.9 | Any | Screening interview, medical record review and examination by physician
Combined datasets (accuracy measures for constituent datasets unable to be separated):
Bower 1999[34] | 1976–1990 | USA | Population of Olmsted County | Unclear | Unclear | 2472 | Synthesised medical information | H-ICDA | Parkinsonism: 53 H-ICDA diagnostic codes | Not specified | Medical record review
Gallo [c] 2015[25] | 1998–2010 | Spain | EPIC study participants | Mean 50 at recruitment | 49 | 39 | Prescriptions; primary care; mortality; hospital: inpatient | ATC/DDD index; ICD-9 | PD: ICD-9: 332, 332.0, 332.1; ATC/DDD index: N04, N04A, N04B | Not specified | Medical record review
Gallo [d] 2015[25] | Unclear–2010 | Spain | EPIC study participants | Mean 50 at recruitment | 48 | 41 | Primary care; prescriptions; mortality | ICPC; ATC/DDD index; ICD-9 | PD: ICPC: N87; ATC/DDD index: N04, N04A, N04B; ICD-9: 332.x | Not specified | Medical record review
Gallo [e] 2015[25] | 1998–2010 | Spain | EPIC study participants | Mean 49 at recruitment | 32 | 99 | Hospital: inpatient; primary care; prescriptions; mortality | ICD-9; ICPC2; ATC/DDD index; ICD-10 | PD: ICD-9: 332; ICPC2-WICC: N87; ATC/DDD index: N04x; ICD-10: G20 | Not specified | Medical record review
Gallo [f] 2015[25] | 1992–2008 | Italy | EPIC study participants | Mean 50 at recruitment | 57 | 81 | Hospital: inpatient; mortality; prescriptions | ICD-9; ICD-10; ATC/DDD index | PD: ICD-9: 332; ATC/DDD index: N04, N04A, N04B; ICD-10: G20 | Not specified | Medical record review
Savica 2013[35] | 1991–2005 | USA | Population of Olmsted County | Unclear | Unclear | 4957 | Synthesised medical information | H-ICDA; ICD-9 | Parkinsonism: 38 H-ICDA diagnostic codes; ICD-9: 331.9, 332.0, 332.1, 333.0, 333.1, 781.0, 781.3 | Not specified | Medical record review
Thacker 2016[36] | 2005–2015 | USA | Patients from a single medical institution | Unclear | Unclear | 129 | Hospital: inpatient and outpatient; primary care | ICD-9 | PD: 332, 332.0 | Primary | Medical record review

Year of study: the time period during which coded data was collected.

*—any information given regarding the ages of cases or age at recruitment. Study size: the total number of code positive cases (true positives plus false positives). Where both PD and parkinsonism were investigated in one article, study sizes for both are displayed. Study population composition: population cohort from which cases were identified.

ICD codes for Parkinson’s disease—ICD-7 350; ICD-8 342.00; ICD-9(-CM) 332.0; ICD-10 G20.

ICD codes for other parkinsonism: ICD-8: 342.08 (other defined parkinsonism), 342.09 (unspecified parkinsonism); ICD-9(-CM): 332.1 (secondary Parkinson’s disease), 333.0 (other degenerative diseases of the basal ganglia); ICD-10: G21.4 (vascular parkinsonism), G21.8 (other defined secondary parkinsonism), G21.9 (unspecified secondary parkinsonism), G23.1 (progressive supranuclear ophthalmoplegia), G23.2 (striatonigral degeneration), G23.9 (unspecified degenerative disease of basal ganglia), G25.9 (unspecified extrapyramidal and movement disorder). Additional ICD codes: ICD-9: 331.9 (cerebral degeneration), 333.1 (essential and other specified forms of tremor), 781.0 (abnormal involuntary movements), 781.3 (lack of coordination).

†Exact study size unknown, reported as 7% of 1521 (could range from 99–115)–authors contacted, but data unavailable.

Abbreviations: PD, Parkinson’s disease; EPIC, European Prospective Investigation into Cancer and Nutrition study; ICD, International Classification of Diseases; H-ICDA, Hospital Adaptation of ICDA; ATC/DDD index, Anatomical Therapeutic Chemical Classification System with Defined Daily Doses; ICPC, International Classification of Primary Care.

Table 2

Characteristics of studies reporting sensitivity, stratified by dataset type.

First author, year of publication | Year of study | Country | Study population composition | Age (years)* | Proportion male | Study size (n) | Routine dataset used | Coding system | Codes used to identify cases | Diagnostic coding position | Reference standard
Mortality certificate-derived datasets:
Benito-León 2014[37] | 1994–2007 | Spain | Three communities near Madrid | Mean 77 | 56% | 82 | Mortality | ICD-9 (pre-1999); ICD-10 (post-1999) | Not specified (investigated PD) | Primary | Screening (in-person, telephone and mail questionnaire) and neurological examination
Beyer 2001[38] | 1993–1996 | Norway | County (Rogaland) | Mean 79 | Unclear | 84 | Mortality | ICD-9 or ICPC | Not specified (investigated PD) | Primary + Any | Semi-structured interview and a clinical examination
Fall 2003[39] | 1989–1998 | Sweden | Central district of Östergötland | Mean 82 | Unclear | 121 | Mortality | ICD-9 | Not specified (investigated PD) | Primary + Any | Examination and medical record review
Feldman 2012[24] | 1998–2008 | Sweden | Twins across Sweden >50 yrs | Mean 75 | Unclear | PD: 77; Parkinsonism: 127 | Mortality | ICD-10 | PD: G20. Parkinsonism: G21.4, G21.8, G21.9, G23.1, G23.2, G23.9, G25.9 | Any | Screening interview, medical record review and examination
Williams-Gray 2013[40] | 2000–2012 | UK | County (Cambridgeshire) | Mean 70 | Unclear | 63 | Mortality | Not specified | Not specified (investigated PD) | Primary + Any | History and neurological examination
Hospital-derived datasets:
Feldman 2012[24] | 1964–2009 | Sweden | Twins across Sweden >50 yrs | Mean 75 | Unclear | PD: 132; Parkinsonism: 194 | Hospital: inpatient | ICD-7 (1961–67); ICD-8 (1968–86); ICD-9 (1987–96); ICD-10 (1997–2009) | PD: ICD-7: 350; ICD-8: 342.00; ICD-9: 332.0; ICD-10: G20. Parkinsonism: ICD-8: 342.08, 342.09; ICD-9: 333.0; ICD-10: G21.4, G21.8, G21.9, G23.1, G23.2, G23.9, G25.9 | Any | Screening interview, medical record review and examination

Year of study: the time period during which coded data was collected.

*Any information given regarding the ages of cases or age at recruitment. Study size: the total number of true cases according to the reference standard (true positives plus false negatives). Where both PD and parkinsonism were investigated in one article, study sizes for both are displayed. Study population composition: population cohort from which cases were identified.

ICD codes for Parkinson’s disease—ICD-7 350; ICD-8 342.00; ICD-9 332.0; ICD-10 G20.

ICD codes for other Parkinsonism—ICD-8: 342.08 (other defined Parkinsonism), 342.09 (unspecified Parkinsonism); ICD-9: 333.0 (other degenerative diseases of the basal ganglia); ICD-10: G21.4 (vascular Parkinsonism), G21.8 (other defined secondary Parkinsonism), G21.9 (unspecified secondary Parkinsonism), G23.1 (progressive supranuclear ophthalmoplegia), G23.2 (striatonigral degeneration), G23.9 (unspecified degenerative disease of basal ganglia), G25.9 (unspecified extrapyramidal and movement disorder).

Study size varied considerably, from 39 to 4957. All 18 articles were from high-income countries: three from the UK[32,33,40], six from mainland Europe[24,25,30,37-39], eight from the USA[26-29,31,34-36], and one from Canada[23]. There were 12 PPV estimates and two sensitivity estimates from hospital data[23-31], two PPV and 10 sensitivity estimates from mortality data[24,37-40], two PPV estimates from primary care data[32], four PPV estimates from prescription data[23,29,33], and seven PPV estimates and two sensitivity estimates from combined datasets from different sources[24,25,34-36]. There were no sensitivity estimates from primary care or prescription data. PD was evaluated in 13 articles: eight estimated PPV[25,26,28-30,32,33,36], four estimated sensitivity[37-40] and one estimated both[24]. Parkinsonism was evaluated in seven articles, of which six estimated PPV[23,27,31,33-35] and one estimated both PPV and sensitivity[24]. All of the parkinsonism articles combined PD with other causes of parkinsonism.
The reference standard methods could be broadly divided into two categories: patient history and examination (5/5 articles reporting sensitivity) and medical record review (14/14 articles reporting PPV). Three articles used in-person examination and medical record review in combination[24,33,39]. In addition, where entire populations were under study, some studies incorporated a screening method (e.g., telephone interview) to identify potential cases[24,37]. Where reported, codes used to identify PD cases were consistent and appropriate to the ICD version used. However, the range of codes used to identify other parkinsonian conditions varied considerably, reflecting the broad range of pathologies that can lead to parkinsonism. Seven studies did not specify the exact codes used[29,32,33,37-40]. The ICD versions used reflected the time periods over which the studies were conducted: 19 studies used ICD-9 (or ICD-9-CM, a clinical modification used in the USA that is identical to ICD-9 with respect to parkinsonian diagnoses)[23-29,31,35-39], 11 used ICD-10[23-25,30,37], three used ICD-8[24,30], and two used ICD-7[24]. One of the primary care studies used Read-coded data[32]. Four studies, including the three that evaluated prescription data, did not specify the coding system used[23,29,33,40]. The diagnostic coding position assessed also varied: three studies assessed primary diagnoses alone[30,36,37], eight used any diagnostic position[24,31,38-40], and 13 did not specify the coding position[23,25-29,34,35]. Diagnostic position was not applicable to the studies of primary care and prescription data because of the nature of these datasets[23,29,32,33]. Only two articles were judged to be at low risk of bias and of low applicability concern in the QUADAS-2 assessment[23,24] (S1 Table).
Across the risk of bias domains, the most common area of concern was inappropriate or unclear code lists to identify disease cases (10/18), followed by: selection bias (8/18), patient flow (i.e. inappropriate inclusions and exclusions or patients being lost to follow-up) (5/18) and insufficiently rigorous or unclear reference standards (4/18).

Positive predictive value

For PD, there were 17 PPV estimates in total (Fig 2)[24-26,28-30,32,33,36]. These comprised seven estimates from hospital data alone[24-26,28-30], one from mortality data alone[24], two from prescription data alone[29,33], one from primary care data alone[32], one from prescription and primary care data in combination[32], and five from datasets used in combination[25,36]. PPVs ranged from 36% to 90% across all studies, and nine of the 17 estimates were >75%. The single study of Read-coded primary care data alone reported a PPV of 81%, which increased to 90% when a relevant medication code was present in addition to a diagnostic code[32]. The two studies of prescription data alone reported PPVs of 53% and 87%[29,33]. The single, small study of mortality data had a PPV of 67%[24].
Fig 2

Positive predictive values (PPVs) of coded diagnoses.

Study size: total number of code-positive cases (true positives + false positives). *Exact sample size unknown, most conservative estimate used. Box sizes reflect Mantel-Haenszel weight of study (inverse variance, fixed effects).

One of the two articles judged to be at low risk of bias investigated the PPV of hospital admissions data for identifying PD, reporting a PPV of 70.8%[24]. This value fell within the range reported by other studies (55.5–90.3%), raising the possibility that estimates at the extremes of the range may be influenced by bias. Within-study comparisons were available from three studies identifying PD (Table 3)[24,28,29]. Two of these investigated the change in PPV of hospital data for identifying PD when algorithms with additional criteria were used[24,28]. Both showed a moderate increase in PPV if a relevant diagnosis code was recorded more than once, or if it was assigned by a specialist department. One study reported an increase in PPV when only primary-position diagnoses were assessed[24]. Another showed that combining selected medication codes with diagnosis codes increased the PPV from 76% to 86%, although at the expense of reduced case ascertainment[28]. Finally, one study showed that combining a diagnostic code in hospital data with a relevant medication code increased the PPV compared with either dataset alone (94% versus 89% for diagnostic codes alone and 87% for prescriptions alone)[29].
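The trade-off between PPV and case ascertainment can be illustrated with simple arithmetic on the Szumski 2009 rows of Table 3. The sketch below multiplies each reported PPV by the number of cases identified to approximate the number of confirmed cases. These true-positive counts are derived estimates (the published PPVs are rounded), not figures reported by the review, and the helper names are hypothetical. The sketch also assumes that each stricter algorithm selects a subset of the cases found by the ICD-code-only definition.

```python
# PPV (%) and number of code-positive cases for Szumski 2009 (hospital
# outpatient data), copied from Table 3.
SZUMSKI_2009 = [
    ("PD ICD codes only", 76, 579),
    ("Code frequency >=2 at any clinic", 79, 409),
    ("Code assigned in any neurology clinic", 79, 352),
    ("Code assigned in movement disorder speciality clinic", 87, 177),
    ("Code + prescribed antiparkinsonian medication", 86, 408),
]


def approx_true_positives(ppv_percent: float, n_identified: int) -> int:
    """Approximate confirmed cases as PPV x code-positive cases (PPV is rounded)."""
    return round(ppv_percent / 100 * n_identified)


def tradeoff(rows):
    """Pair each algorithm's PPV with the share of baseline true positives it keeps.

    Assumes the first row is the least restrictive definition and that every
    later row selects a subset of its cases.
    """
    baseline = approx_true_positives(rows[0][1], rows[0][2])
    result = []
    for label, ppv, n in rows:
        tp = approx_true_positives(ppv, n)
        result.append((label, ppv, tp, tp / baseline))
    return result


if __name__ == "__main__":
    for label, ppv, tp, kept in tradeoff(SZUMSKI_2009):
        print(f"{label:<55} PPV {ppv}%  ~{tp} true cases  ({kept:.0%} of baseline)")
```

On these approximations, adding the medication requirement raises the PPV from 76% to 86% while keeping roughly 80% of the baseline’s confirmed cases (about 351 of 440), whereas restricting to movement disorder clinics reaches 87% but keeps only about 35% (about 154). Because outpatient data do not capture every case, this retained share is a relative measure, not a sensitivity.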
Table 3

Within-study analyses.

Algorithm development.

Criteria applied | PPV % (95% CI) | Number of cases identified
Parkinson’s Disease
a) Feldman 2012 (hospital inpatient data)
Parkinson’s disease ICD code only | 71 (59–81) | 72
Exclusion of patients with other (non-Parkinson’s disease) parkinsonian codes | 70 (58–81) | 67
Code frequency ≥2 hospital admissions | 76 (61–88) | 42
Code in primary diagnostic position | 83 (70–92) | 53
Code assigned in specialist department (neurological/neurosurgical/geriatric) | 83 (63–95) | 24
b) Szumski 2009 (hospital outpatient data)
Parkinson’s disease ICD codes only | 76 (72–79) | 579
Code frequency ≥2 at any clinic | 79 (76–83) | 409
Code assigned in any neurology clinic | 79 (75–83) | 352
Code assigned in movement disorder speciality clinic | 87 (81–92) | 177
Code + prescribed antiparkinsonian medication | 86 (82–89) | 408
c) Wei 2016
Parkinson’s disease ICD codes only | 89 (81–94) | 100
Prescription only | 87 (78–93) | 100
ICD code and prescription | 94* | Unknown*
Parkinsonism
d) Butt 2014
Hospital inpatient ICD code ever | 87 (79–96) | 63
Hospital outpatient ICD code ever | 55 (49–60) | 297
Prescription ever | 40 (35–44) | 395
Outpatient code frequency ≥2 in one year | 83 (77–89) | 169
Outpatient code frequency ≥2 in one year by a specialist | 87 (81–92) | 134
Outpatient code AND prescription | 85 (79–90) | 174
Prescription AND outpatient code within ±6 months | 87 (82–92) | 166

The effect of additional criteria to identify PD cases on PPV and the number of cases identified.

* Sample size and confidence intervals unknown for this accuracy measure.
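
The stricter criteria in Table 3 can be thought of as rules applied to each person's linked records. The sketch below is a hypothetical illustration loosely modelled on three of the Butt 2014 rows; the record layout, the reading of "in one year" as any 365-day window, and the reading of "±6 months" as 182 days either side are our assumptions, not the study's definitions. It shows only that stricter rules select fewer people; whether PPV rises can be judged only against a reference standard.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class LinkedRecord:
    """Hypothetical per-person summary of linked outpatient and prescription data."""
    person_id: str
    code_dates: list[date] = field(default_factory=list)          # parkinsonism diagnosis codes
    prescription_dates: list[date] = field(default_factory=list)  # antiparkinsonian prescriptions


def code_ever(rec: LinkedRecord) -> bool:
    return len(rec.code_dates) > 0


def code_twice_within_a_year(rec: LinkedRecord) -> bool:
    # Assumption: "in one year" read as any 365-day window. After sorting,
    # the closest pair of codes is always an adjacent pair.
    dates = sorted(rec.code_dates)
    return any(b - a <= timedelta(days=365) for a, b in zip(dates, dates[1:]))


def code_and_prescription_within_6_months(rec: LinkedRecord) -> bool:
    # Assumption: "+/- 6 months" read as 182 days either side of a code.
    window = timedelta(days=182)
    return any(abs(c - p) <= window for c in rec.code_dates for p in rec.prescription_dates)


RULES = {
    "code ever": code_ever,
    "code >=2 within a year": code_twice_within_a_year,
    "code AND prescription within 6 months": code_and_prescription_within_6_months,
}

# Invented example records.
records = [
    LinkedRecord("A", [date(2010, 3, 1), date(2010, 9, 1)], [date(2010, 4, 1)]),
    LinkedRecord("B", [date(2010, 3, 1)], [date(2012, 1, 1)]),
    LinkedRecord("C", [date(2010, 3, 1), date(2012, 3, 1)]),
    LinkedRecord("D", [], [date(2011, 5, 1)]),
    LinkedRecord("E", [date(2011, 1, 10)], [date(2011, 3, 1)]),
]

selected = {name: [r.person_id for r in records if rule(r)] for name, rule in RULES.items()}
for name, ids in selected.items():
    print(f"{name}: {len(ids)} selected {ids}")
```

Each added criterion narrows the selected group, which mirrors the fall in "Number of cases identified" down each block of Table 3.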

For parkinsonism, there were 10 PPV estimates in total (Fig 2)[23,24,27,31,33-35]. These comprised five estimates from hospital data alone[23,24,27,31], two from prescription data alone[23,33], one from mortality data alone[24], and two from datasets used in combination[34,35]. PPVs ranged from 40–94% in the single datasets and from 22–28% in the combined datasets. The two studies of parkinsonism in prescription data produced very different PPV estimates (40% and 74%)[23,33]. One of these studies reported that the PPV of medication data was considerably higher for identifying any parkinsonian disorder than for identifying PD (74% and 53%, respectively)[33]. The two articles at low risk of bias both investigated the use of hospital admissions data to identify parkinsonism cases. They reported PPVs of 76%[23] and 88%[24], consistent with the values reported by other studies judged to be at risk of bias.

Sensitivity

For PD, there were 11 sensitivity estimates in total (Fig 3)[24,37-40]. Nine of these were for mortality data alone and consistently showed that using only codes in the primary position gave low sensitivities of 11–23%, rising to 53–60% when codes in any position were included[24,37-40]. A single study reported a sensitivity of 73% for hospital data, increasing to 83% when hospital and mortality data were combined. There were no sensitivity estimates for primary care or prescription data.
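
To make the primary-position versus any-position contrast concrete, here is a toy sketch computing sensitivity, TP / (TP + FN), and PPV for both selection rules. Every certificate and reference label below is invented; the only real-world detail is that "G20" is the ICD-10 code for Parkinson's disease. The toy numbers illustrate the direction of the trade-off, not the magnitudes reported in the review.

```python
# Hypothetical death-certificate data: codes listed in position order,
# with the underlying (primary) cause first.
certificates = {
    "D1": ["G20", "J18"],
    "D2": ["I21", "G20"],
    "D3": ["J18", "G20"],
    "D4": ["C34"],         # PD case by the reference standard, but no PD code recorded
    "D5": ["G20"],
    "D6": ["I63", "G20"],  # PD coded, but not PD by the reference standard
}
reference_pd = {"D1", "D2", "D3", "D4", "D5"}


def select_cases(code: str, primary_only: bool) -> set[str]:
    """Return decedents whose certificate carries `code` in the chosen position(s)."""
    return {
        person
        for person, codes in certificates.items()
        if (codes[:1] == [code] if primary_only else code in codes)
    }


def ppv(selected: set[str], reference: set[str]) -> float:
    return len(selected & reference) / len(selected)


def sensitivity(selected: set[str], reference: set[str]) -> float:
    return len(selected & reference) / len(reference)


for primary_only in (True, False):
    cases = select_cases("G20", primary_only)
    label = "primary position only" if primary_only else "any position"
    print(f"{label}: {len(cases)} cases, PPV {ppv(cases, reference_pd):.0%}, "
          f"sensitivity {sensitivity(cases, reference_pd):.0%}")
```

In this invented example, restricting to the primary position keeps only cases that are confirmed (higher PPV) but misses most of them (lower sensitivity).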
Fig 3

Sensitivity estimates of coded diagnoses.

Study size: total number of true positives according to reference standard (true positives + false negatives). *Unknown sample size and confidence intervals. Box sizes reflect Mantel-Haenszel weight of study (inverse variance, fixed effects).
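
The captions state that box sizes reflect each study's weight. As a rough sketch of what an inverse-variance weight for a single proportion looks like, the code below uses w = n / (p(1 − p)), the reciprocal of the binomial variance p(1 − p)/n. The three studies and their counts are hypothetical, and the captions also mention Mantel-Haenszel weighting; this sketch covers only the inverse-variance form.

```python
def inverse_variance_weight(events: int, total: int) -> float:
    """Reciprocal of the binomial variance p(1 - p)/n for an observed proportion."""
    p = events / total
    variance = p * (1 - p) / total
    if variance == 0:
        # A proportion of exactly 0 or 1 has zero estimated variance.
        raise ValueError("weight undefined when the proportion is 0 or 1")
    return 1 / variance


# Hypothetical studies: (reference-standard cases detected, total reference-standard cases).
studies = {"Study A": (20, 100), "Study B": (55, 100), "Study C": (12, 60)}
weights = {name: inverse_variance_weight(*counts) for name, counts in studies.items()}
total_weight = sum(weights.values())

for name, w in weights.items():
    print(f"{name}: weight {w:.0f}, relative box size {w / total_weight:.1%}")
```

Study B has the same sample size as Study A but a proportion nearer 50%, so its variance is larger and its box smaller; weight depends on both n and the estimate itself.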

Of the two studies at low risk of bias, one investigated the sensitivity of mortality data and reported a value of 20%. This was similar to the values reported by studies judged to be at risk of bias, suggesting that the potential biases identified did not substantially affect these estimates. For parkinsonism, there were three sensitivity estimates, all from one study[24]. Combining hospital admissions and mortality data gave a higher sensitivity (71%) than either mortality or hospital data alone (43% and 63%, respectively).
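
Why combining sources raises sensitivity can be seen with simple set arithmetic: a person counts as identified if any source records them, so the union can only gain reference-standard cases. The identifiers below are invented, and "X1" is a deliberate false positive; the sketch also shows that a union carries over every source's false positives, so PPV need not improve.

```python
# Invented person identifiers.
reference_cases = {f"P{i}" for i in range(1, 11)}  # 10 reference-standard cases
hospital = {"P1", "P2", "P3", "P4", "P5", "P6", "P7", "X1"}
mortality = {"P1", "P2", "P8", "P9"}


def sensitivity(identified: set[str], reference: set[str]) -> float:
    """TP / (TP + FN): share of reference-standard cases picked up."""
    return len(identified & reference) / len(reference)


def ppv(identified: set[str], reference: set[str]) -> float:
    """TP / (TP + FP): share of identified people who are true cases."""
    return len(identified & reference) / len(identified)


sources = {
    "hospital": hospital,
    "mortality": mortality,
    "hospital OR mortality": hospital | mortality,
}
for name, identified in sources.items():
    print(f"{name}: sensitivity {sensitivity(identified, reference_cases):.0%}, "
          f"PPV {ppv(identified, reference_cases):.1%}")
```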

Discussion

We have demonstrated that existing validation studies show wide variation in the accuracy of routinely collected healthcare data for identifying PD and parkinsonism cases. Despite this, high PPVs are achievable in some circumstances. Sensitivity (range 15–73% for PD) is generally lower than PPV (range 36–90%) in single datasets, but it increases when data sources are combined. When using routinely collected datasets to identify disease cases, there will inevitably be a trade-off between PPV and sensitivity[16]. The extent to which cohorts seek to maximise one accuracy metric over the other will depend on the specific study setting and research question. For example, studies that rely solely on routinely collected data to identify disease cases are likely to need a high PPV, provided that sensitivity is sufficient to ensure statistical power. In contrast, studies that use routinely collected data to identify potential cases before validating them through more detailed in-person assessment or medical record review will need a high sensitivity.

In this review, we found that the sensitivity of mortality data for detecting PD using codes in the primary position alone was very low (range 11–23%); however, it improved markedly (range 56–60%) when codes were selected from any position on the death certificate[24,37-40]. Apart from the single hospital-data study noted above[24], no studies in this review investigated the effect of coding position on PPV, but previous studies of dementia and motor neurone disease have shown that selecting cases whose disease code was in the primary position consistently increased PPV compared with selecting codes from any position[41-44]. However, as with PD, this approach identified fewer cases, thereby reducing sensitivity[17,18].

The pharmacological treatment of PD is largely focussed on improving motor function, and patients are treated with a limited number of drugs.
This has allowed antiparkinsonian drugs to be used as ‘tracers’ in epidemiological studies[45,46]. However, there are potential problems with using prescription data as a proxy for a PD diagnosis. First, this approach may disproportionately miss patients with early-stage disease who do not yet require treatment. Second, a response to a trial of dopaminergic drugs may form part of the diagnostic assessment of possible PD, so some patients prescribed antiparkinsonian medications will not subsequently be diagnosed with PD. Third, antiparkinsonian drugs can be prescribed for indications other than PD (for example, dopamine agonists for restless legs syndrome or endocrine disorders, and treatment of other forms of parkinsonism). Finally, the specific drugs licensed for use in parkinsonian conditions vary between countries and may change over time, so an algorithm incorporating prescription data would need continual revision to match prescribing patterns. Results from our review suggest that prescription data alone has a low PPV for PD case ascertainment[33]; when drug codes are combined with diagnostic codes, PPV increases but fewer cases are identified[28,32]. Furthermore, prescription datasets appear to have a higher PPV for identifying any parkinsonian disorder than for identifying PD specifically[33].

This study has several strengths and limitations. Our review benefits from prospective protocol publication, comprehensive search criteria, and independent duplication of each stage by two authors. Despite this, relevant studies may have been missed, especially where a validation study formed a subsection of a paper with a wider aim. As all eligible studies were included, the results may have been influenced by studies of lower quality. Only two articles were judged to be at low risk of bias and to have low applicability concerns[23,24], and biases in study design are likely to have affected the results.
For example, the study with the lowest PPV[35] used very broad ICD-9 codes such as 781.0 (abnormal involuntary movements) and 781.3 (lack of coordination). Since PD cannot be diagnosed with certainty during life, the reference standards used in the studies are likely to involve some misclassification. Applying stringent diagnostic criteria to reference standard diagnoses, although often necessary for research purposes, may lead to some patients being misclassified as ‘false positives’ when they do in fact have the condition, leading to underestimation of PPV in some studies. When choosing the ideal reference standard for a validation study, there is a trade-off between the robustness of the reference standard and the ability to validate enough cases to produce precise accuracy estimates. For example, in-person neurological examination may give greater diagnostic certainty than medical record review, but it becomes difficult as cohort size increases. Some of the variation in the reported results is therefore likely to reflect differences in how stringently studies applied their reference standards. Many studies reported cases with insufficient information to meet the reference standard, and the handling of these cases varied: some studies excluded them, others classified them as false positives, and some did not specify how they were handled. Excluding such cases may introduce selection bias, whereas counting them as false positives may underestimate PPV. The effect of possible publication bias is difficult to estimate, but disproportionate publication of studies reporting more favourable accuracy measures may lead to overestimation of code performance. In addition, PPV depends on the prevalence of the condition in the study population, but it was not possible to assess the prevalence of PD within each study population.
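
Two of the points above can be made concrete with a little arithmetic. The sketch below uses the standard Bayes relation PPV = Se·π / (Se·π + (1 − Sp)(1 − π)), where Se is sensitivity, Sp specificity and π prevalence, and then compares the two ways of handling cases with insufficient information. The sensitivity of 70%, specificity of 99.9%, prevalences and validation counts are all hypothetical, and holding Se and Sp fixed across populations is itself an assumption.

```python
def ppv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV via Bayes' theorem: Se*p / (Se*p + (1 - Sp)*(1 - p))."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def ppv_with_unresolved(confirmed: int, refuted: int, unresolved: int,
                        unresolved_as_false_positive: bool) -> float:
    """PPV when some code-positive cases lack enough information to classify."""
    if unresolved_as_false_positive:
        return confirmed / (confirmed + refuted + unresolved)
    return confirmed / (confirmed + refuted)  # unresolved cases excluded


# Hypothetical coding accuracy, held fixed while prevalence changes.
for prevalence in (0.01, 0.002):
    print(f"prevalence {prevalence:.1%}: PPV {ppv_from_prevalence(0.70, 0.999, prevalence):.1%}")

# Hypothetical validation sample of 100 code-positive cases: 70 confirmed, 20 refuted, 10 unresolved.
print(f"unresolved excluded: {ppv_with_unresolved(70, 20, 10, False):.1%}")
print(f"unresolved as false positives: {ppv_with_unresolved(70, 20, 10, True):.1%}")
```

With identical coding accuracy, the lower-prevalence population yields a noticeably lower PPV, and the choice of how to treat unresolved cases alone moves the estimate by several percentage points.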
Our review highlights several areas requiring further research. Given that PD is largely managed in outpatient or community settings, primary care data may be an effective means of identifying cases. Although studies have suggested that PD diagnoses made in primary care are less accurate than those made in a specialist setting[47,48], primary care records combine notes made by primary care clinicians with prescription records and correspondence from secondary care. Codes in primary care records should therefore include diagnoses made by specialists, increasing their accuracy. We found only one small study of primary care data, which reported a promising PPV of 81%, improving to 90% with the inclusion of medication codes[32]. No studies investigated the sensitivity of primary care data, so further research into the accuracy of primary care data is needed. Two studies investigated using algorithmic combinations of codes from different sources to improve PPV[24,28]. These evaluated the additional benefit of criteria such as including only codes that appeared more than once, selecting codes in the primary position only, combining diagnostic codes with prescription data, and including only diagnoses made in specialist clinics. These methods increased PPV, but at a cost to the number of cases identified. The development of algorithms that maximise PPV while maintaining reasonable sensitivity (e.g., by combining multiple complementary datasets) merits further evaluation. To our knowledge, no studies have evaluated the accuracy of routinely collected healthcare data specifically for identifying atypical parkinsonian syndromes such as progressive supranuclear palsy (PSP) and multiple system atrophy (MSA). Further work is needed to understand whether these datasets are a valuable resource for studying these less common diseases.
In conclusion, our review summarises existing knowledge of the accuracy of routinely collected healthcare data for identifying PD and parkinsonism, and highlights approaches to increase accuracy and areas where further research is required. Given the wide range of observed results, prospective cohorts should perform their own validation studies where evidence is lacking for their specific setting.

Supporting information

PRISMA checklist (DOC).

Search strategy (DOCX).

QUADAS-2 assessment (DOCX).

QUADAS-2 summary results (DOCX).