
Linking administrative data sets of inpatient infectious diseases diagnoses in far North Queensland: a cohort profile.

Damon P Eisen1,2, Emma S McBryde3, Luke Vasanthakumar1, Matthew Murray4, Miriam Harings1, Oyelola Adegboye5.   

Abstract

PURPOSE: To design a linked hospital database using administrative and clinical information to describe associations that predict infectious diseases outcomes, including long-term mortality. PARTICIPANTS: A retrospective cohort of Townsville Hospital inpatients discharged with an International Classification of Diseases and Related Health Problems 10th Revision Australian Modification code for an infectious disease between 1 January 2006 and 31 December 2016 was assembled. This used linked anonymised data from hospital administrative sources, diagnostic pathology, pharmacy dispensing, public health and the National Death Registry. A Created Study ID was used as the central identifier to provide associations between the cohort patients and the subsets of granular data, which were processed into a relational database. A web-based interface was constructed to allow data extraction and evaluation to be performed using editable Structured Query Language. FINDINGS TO DATE: The database has linked information on 41 367 patients with 378 487 admissions and 1 869 239 diagnostic/procedure codes. Scripts used to create the database contents generated over 24 000 000 database rows from the supplied data. Nearly 15% of the cohort was identified as Aboriginal or Torres Strait Islander. Invasive staphylococcal, pneumococcal and Group A streptococcal infections and influenza were common in this cohort. The most common comorbidities were smoking (43.95%), diabetes (24.73%), chronic renal disease (17.93%), cancer (16.45%) and chronic pulmonary disease (12.42%). Mortality over the 11-year period was 20%. FUTURE PLANS: This complex relational database, which reuses hospital information, describes a cohort of inpatients with infectious diseases from a single tropical Australian hospital. In future analyses, we plan to explore risks, clinical outcomes, healthcare costs and antimicrobial side effects in site-specific and organism-specific infections. 
© Author(s) (or their employer(s)) 2020. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.


Keywords:  data-linkage; epidemiology; hospital; infectious diseases; relational database


Year:  2020        PMID: 32193270      PMCID: PMC7202725          DOI: 10.1136/bmjopen-2019-034845

Source DB:  PubMed          Journal:  BMJ Open        ISSN: 2044-6055            Impact factor:   2.692


The linked database will serve as a basis for future studies, unique to tropical Australia, of the incidence, risk factors and clinical outcomes of patients with hospital admissions involving infectious diseases. The incorporation of pathology results in the cohort will allow precise characterisation of many infectious diseases. Because the patient cohort was based on data sets from a single hospital, findings might not be generalisable to the Australian population. The validity of cohort studies relies on the accuracy of clinical coding; therefore, some important clinical information may be underrepresented.

Introduction

Deriving a broad and detailed understanding of the epidemiology of infectious diseases is crucial, as they are a common cause of hospital admission and a frequent cause of hospital complications. In 2016–2017, 7.2 per 1000 of Australia’s population were hospitalised with a primary diagnosis of an infectious disease.1 The rate in Australia’s Indigenous population was double this. Among the principal causes of hospitalisation, pneumonia ranked fourth, cellulitis ninth and ‘other sepsis’ 16th. Regrettably, 103 000 patient episodes (1.2% of all hospital separations) involved a hospital-acquired infection. Urinary tract infection, pneumonia and blood stream infection are the third to fifth most common hospital-acquired complications. These infections contribute to the marked increase in the average length of stay (17 vs 4.4 days)1 and may increase mortality.2 Patterns of mortality for various illnesses, chronic and acute, are documented by the Australian Institute of Health and Welfare. Infectious and parasitic diseases (narrowly defined) are relatively infrequent single causes of mortality (<3%).3 More commonly, however, they contribute to multiple causes of death in patients with chronic conditions. For instance, pneumonia and influenza are particularly common causes of death in patients with dementia. There is currently an opportunity to reuse large amounts of data collected for administrative and routine clinical purposes to derive a more detailed picture of the incidence of diseases in Australian hospitals.4 Data linkage is a powerful tool for the analysis of various disease cohorts. It adds value by re-using previously acquired patient information, which represents a rich research resource. We have developed a database that will be used in the future to analyse the incidence, risk factors and clinical outcomes of patients with hospital admissions involving infectious disease.

Cohort description

Setting

The Townsville Hospital is the tertiary referral centre for North Queensland, providing specialist care for 670 000 people. Townsville is located at 19.26° S and has a ‘dry tropics’ climate with a mean annual rainfall of 1100 mm.

Cohort selection

A cohort of Townsville Hospital inpatients was identified based on International Classification of Diseases and Related Health Problems 10th Revision Australian Modification (ICD-10-AM) discharge codes for an infectious disease. The cohort spanned the 11-year period from 1 January 2006 to 31 December 2016. Information from the episode of care that led to cohort inclusion and from all previous and subsequent inpatient admissions was provided. The ICD-10-AM codes primarily used to select the patient cohort were those for infectious and parasitic diseases (A00–B99) (online supplementary table S1). However, for completeness, selected infection-related codes were also included from the following chapters: diseases of the nervous system (G*), describing intracranial infection; diseases of the eye, ear and mastoid process (H*), describing intraocular and ear infection; diseases of the circulatory system (I*), describing cardiac infections; diseases of the respiratory system (J*), describing upper and lower respiratory tract infections; diseases of the digestive system (K*), describing intra-abdominal infections; diseases of the skin and subcutaneous tissue (L*), describing skin and soft tissue infections; diseases of the musculoskeletal system and connective tissue (M*), describing infections of the bony skeleton and muscles; diseases of the genitourinary system (N*), describing urinary tract infections; and symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified (R*), describing fever of unknown origin and shock, among others.
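The selection rule can be sketched in Python. This is a minimal illustration and not the extraction logic that Queensland Health used. The record layout `(pu_id, admission_id, discharge_date, icd_code)`, the function names and the small set of extra non-A/B codes are all hypothetical. Codes are assumed to be normalised already to the dotted 'A06.4' form. The sketch shows the key point: a single qualifying discharge inside the study window decides inclusion, and all of that patient's admissions are then retained.

```python
from collections import defaultdict
from datetime import date

# Study window for the discharge that triggers cohort inclusion.
WINDOW_START = date(2006, 1, 1)
WINDOW_END = date(2016, 12, 31)

# Hypothetical, illustrative subset of infection-related codes outside A00-B99;
# the study's actual list is in its online supplementary table S1.
EXTRA_INFECTION_CODES = {"G00.9", "J18.9", "L03.1", "N39.0"}


def is_qualifying_code(code: str) -> bool:
    """True for any A00-B99 code (first character A or B) or one of the extra codes."""
    return code[:1] in ("A", "B") or code in EXTRA_INFECTION_CODES


def select_cohort(diagnoses):
    """diagnoses: iterable of (pu_id, admission_id, discharge_date, icd_code).

    A patient is included if any admission discharged within the window carries a
    qualifying code; all of that patient's admissions (earlier and later) are kept.
    Returns {pu_id: sorted list of admission ids}.
    """
    admissions = defaultdict(set)
    included = set()
    for pu_id, admission_id, discharge_date, code in diagnoses:
        admissions[pu_id].add(admission_id)
        if WINDOW_START <= discharge_date <= WINDOW_END and is_qualifying_code(code):
            included.add(pu_id)
    return {pu_id: sorted(admissions[pu_id]) for pu_id in sorted(included)}
```

In this sketch, a patient whose only infectious-disease code falls before 2006 is excluded. A patient with one qualifying discharge in the window brings their earlier, non-infectious admissions into the cohort as well.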

Databases

The following key data relating to the selected cohort were provided with the approval of Queensland Government Data Custodians:
Queensland Health Admitted Patient Data Collection (QHAPDC): patient demographics, Indigenous status, principal and other diagnoses (ICD-10-AM codes), procedure codes (Australian Classification of Health Interventions), length of stay and hospital separation. Admitted patient clinical coding is regulated by the National Australian Coding Standards, and QHAPDC data quality is managed via systematic internal audit, the State Government Queensland Audit Office and periodic external audits.
Death registry: date and primary and secondary causes of death over the 11-year study period.
Emergency data collection: triage category, principal and other diagnoses.
Pathology: general microbiology, infective serology and infective PCR testing results; haematology (full blood examination, coagulation); biochemistry (urea and electrolytes, liver function tests, C-reactive protein).
Antimicrobial dispensing: ipharmacy (central pharmacy dispensing) and Pyxis (ward dispensing); dose, date and price of selected anti-infective drugs dispensed.
Notifiable Conditions System: type and site of infection.

Data linkage

Extracted patient information was identifiable by the Medical Records Number. This was used by the Health Statistics Branch of Queensland Health to perform data-linkage processes described in the Queensland Data Linkage Framework. Anonymised data, identified by a unique Created Study ID, were provided to the research team.

Database construction

The data were supplied variously as comma- or tab-delimited text or as spreadsheet documents, and were processed into a relational database. The Created Study ID (PU_ID) was used as the central identifier to provide associations between the cohort patients and the subsets of granular data. A web-based interface was constructed to allow data extraction and evaluation to be performed using either editable Structured Query Language or a selection of preset queries. The scripts and analysis interface were written in PHP/MySQL using a text editor.
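To show what using the PU_ID as the central identifier means in practice, here is a minimal sketch of such a schema. It uses Python's built-in `sqlite3` module as a stand-in for the MySQL database the authors built. The table and column names (`patient`, `admission`, `diagnosis`, `admit_dt` and so on) and the helper function are hypothetical and do not reproduce the study's actual structure. The query helper illustrates the kind of grouped, per-patient extraction that an editable SQL interface allows.

```python
import sqlite3

# Hypothetical minimal schema: every child record can be traced back to the Created Study ID.
SCHEMA = """
CREATE TABLE patient (
    pu_id TEXT PRIMARY KEY,
    birth_year INTEGER,
    indigenous_status TEXT
);
CREATE TABLE admission (
    admission_id INTEGER PRIMARY KEY,
    pu_id TEXT NOT NULL REFERENCES patient(pu_id),
    admit_dt TEXT NOT NULL,
    separation_dt TEXT NOT NULL
);
CREATE TABLE diagnosis (
    admission_id INTEGER NOT NULL REFERENCES admission(admission_id),
    icd_code TEXT NOT NULL,
    is_principal INTEGER NOT NULL
);
CREATE INDEX idx_admission_pu ON admission(pu_id);
CREATE INDEX idx_diagnosis_adm ON diagnosis(admission_id);
"""


def build(conn):
    """Create the illustrative tables and indexes."""
    conn.executescript(SCHEMA)


def admissions_with_code_prefix(conn, prefix):
    """Per patient: number of admissions with any diagnosis code starting with prefix."""
    sql = """
        SELECT a.pu_id, COUNT(DISTINCT a.admission_id) AS n_admissions
        FROM admission AS a
        JOIN diagnosis AS d ON d.admission_id = a.admission_id
        WHERE d.icd_code LIKE ? || '%'
        GROUP BY a.pu_id
        ORDER BY a.pu_id
    """
    return conn.execute(sql, (prefix,)).fetchall()
```

Indexing the identifier columns that the joins use is a common choice at the scale described here, with tens of millions of rows.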

Data analysis

Patient data extracts for analysis were imported into SAS V.9.4. Descriptive summaries are presented as frequencies and percentages for categorical variables, and means, quartiles and SDs for continuous variables. Charlson Comorbidity Index (CCI)5 6 was used to rank patient illness severity based on the number and importance of comorbid diseases (online supplementary table S2).
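The CCI step can be sketched as follows. The weights are those of Charlson's original index for the 17 conditions listed in table 2, and no age weighting is applied. Several elements are assumptions of this sketch: the condition labels, the function names, and the rule that a severe form supersedes its milder counterpart (for liver disease, diabetes and malignancy). That rule is a common convention in CCI implementations, but the source does not specify it. The severity bands are the ones used in table 2.

```python
# Charlson et al. (1987) weights for the conditions in table 2 (no age weighting).
# The dictionary keys are hypothetical labels chosen for this sketch.
CHARLSON_WEIGHTS = {
    "myocardial_infarction": 1,
    "heart_failure": 1,
    "peripheral_vascular_disease": 1,
    "cerebrovascular_disease": 1,
    "dementia": 1,
    "chronic_pulmonary_disease": 1,
    "rheumatic_disease": 1,
    "peptic_ulcer_disease": 1,
    "mild_liver_disease": 1,
    "diabetes_without_complication": 1,
    "diabetes_with_complication": 2,
    "hemiplegia_or_paraplegia": 2,
    "renal_disease": 2,
    "malignancy": 2,
    "moderate_severe_liver_disease": 3,
    "metastatic_solid_tumour": 6,
    "aids_hiv": 6,
}

# Assumed hierarchy (a common convention, not stated in the source): when the severe
# form is present, its milder counterpart is not counted as well.
SUPERSEDES = {
    "moderate_severe_liver_disease": "mild_liver_disease",
    "diabetes_with_complication": "diabetes_without_complication",
    "metastatic_solid_tumour": "malignancy",
}


def charlson_score(conditions):
    """Sum the weights of the distinct recognised conditions after applying the hierarchy."""
    present = {c for c in conditions if c in CHARLSON_WEIGHTS}
    for severe, milder in SUPERSEDES.items():
        if severe in present:
            present.discard(milder)
    return sum(CHARLSON_WEIGHTS[c] for c in present)


def charlson_band(score):
    """Severity bands used in table 2."""
    if score == 0:
        return "None"
    if score <= 2:
        return "Mild"
    if score <= 4:
        return "Moderate"
    return "Severe"
```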

Patient and public involvement statement

Patients or members of the public were not involved in the development and design of the research. The anonymised data extraction does not require patient recruitment.

Results

Cohort profile and database characteristics

The database consisted of linked information from 41 367 patients with 378 487 admissions and 1 869 239 diagnostic or procedure codes. The ICD-10-AM codes for infectious diseases that were used to select patients for inclusion in the cohort are listed in online supplementary table S1. A summary of the data and the datafields is included in online supplementary table S2. The individual datafields are listed in online supplementary table S3. The ICD-10-AM codes used to identify comorbidities are listed in online supplementary table S4. A database structure was designed to best accommodate the contents of the supplied data and the available identifiers within it. Its relational structure is shown in figure 1. The structure was designed so that grouped patient information from any combination of the component sources can be retrieved as a single data set.
Figure 1

Representation of relational database constructed showing links between fields from incorporated administrative, clinical and death registry information. (see online supplementary table S1 for detailed description of fields).

The database contents were created using a variety of purpose-built scripts to process, reshape and clean the data. These scripts generated over 24 000 000 database rows from the supplied data. The Created Study ID (PU_ID) was used as the central identifier to provide associations between the cohort patients and the data subsets.

Some assumptions were made during the processing of data. If a pathology result was entered within the date and time range of an admission, it was included as part of that admission, even though no admission identifier was available in the pathology data set. Much of the collected data was entered as free text, and preset values were inconsistently provided across different entry systems, resulting in variations in the expression of the same values. Scripts were written to standardise these results, extracting quantifiable values where possible.

For example, the birth date of each person was not reliably supplied, so the maximum detail was extracted from the various data sources. For the same PU_ID, some sources recorded the age at a given admission date, and did so inconsistently; others recorded the birth month and day; and others incorporated the full birth date. The scripts analysed and prioritised each of these and consolidated all available information for each of the 41 367 people. The year of birth was successfully generated for every person. Additionally, the ICD-10-AM codes were not consistently entered; for example, ‘A064’ was entered where the correct format is ‘A06.4’. Each code was analysed, broken down into its components and entered into the database. For 1130 of the 8274 deaths, principal and other causes of death were listed as free text rather than ICD-10-AM codes, and the causes of these deaths were coded manually. 
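Two of the cleaning rules described above can be sketched in Python. The first reformats undotted ICD-10-AM codes such as 'A064' into the 'A06.4' form. The second attaches a pathology result to the admission whose date and time range contains it. The function names, the assumed code shape (a letter, two digits and up to two further digits) and the tuple layout of admissions are illustrative assumptions. The authors' own scripts were written in PHP and are not reproduced here.

```python
import re
from datetime import datetime

# Assumed shape of an ICD-10-AM code: letter + two digits, optional dot, up to two more digits.
_ICD_PATTERN = re.compile(r"([A-Z]\d{2})\.?(\d{0,2})")


def normalise_icd10am(raw):
    """Return the code in dotted form (e.g. 'A064' -> 'A06.4'), or None if it does not parse."""
    code = raw.strip().upper()
    match = _ICD_PATTERN.fullmatch(code)
    if match is None:
        return None
    category, subdivision = match.groups()
    return f"{category}.{subdivision}" if subdivision else category


def assign_to_admission(result_time: datetime, admissions):
    """Return the id of the admission whose [admit, separation] window contains result_time.

    admissions: iterable of (admission_id, admit_dt, separation_dt) datetimes for ONE
    patient (PU_ID). Returns None if the result falls outside every admission.
    If windows overlap, the earliest-starting admission wins (an assumption of this sketch).
    """
    for admission_id, admit_dt, separation_dt in sorted(admissions, key=lambda a: a[1]):
        if admit_dt <= result_time <= separation_dt:
            return admission_id
    return None
```

Results that match no admission are returned as `None` rather than forced into the nearest admission. This keeps outpatient or pre-admission tests visible for separate handling.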
Summary statistics are presented to give a basic description of the cohort (table 1). The distribution of age at first admission was skewed towards older subjects. Similarly, the total number of admissions was markedly skewed towards higher values. This reflects the substantial number of haemodialysis patients, who had a median of six admissions (IQR 2–41) over the 11-year duration of the cohort study. A large proportion of the patients identified as Indigenous (14.88%). Of interest, 4.5% of patients in this cohort were admitted to the Townsville Hospital from correctional facilities, and Indigenous peoples are overrepresented among these patients compared with the cohort as a whole. The overall 11-year all-cause mortality was 20%. A high proportion of patients smoked (44%). Other major modifiable risk factors included alcohol abuse, obesity and malnutrition (table 1).
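The descriptive summaries in table 1 (mean, median, SD and quartiles) can be produced for any continuous variable with Python's standard `statistics` module. This is a sketch, not the authors' SAS code, and the function name is hypothetical. Comparing the mean with the median gives a quick check of skew direction. A mean well above the median, as for total admissions, points to a long right tail of patients with many admissions.

```python
import statistics


def describe(values):
    """n, mean, median, SD and quartiles for a continuous variable (sketch).

    Quartiles use statistics.quantiles(method="inclusive"). SAS applies its own default
    percentile definition, so Q1/Q3 here may differ slightly from SAS output.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "q1": q1,
        "q3": q3,
    }
```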
Table 1

Cohort characteristics (n=41 367)

Characteristics | No | Mean | Median | SD | Q1 | Q3
Age (years) at first admission | 41 367 | 43.15 | 49 | 24.44 | 26 | 68
Total admissions | 378 487 | 9.15 | 2 | 60.74 | 1 | 5

*Aboriginal and Torres Strait Islander.

This patient cohort had a moderately low burden of comorbidity, with a mean CCI score of 1.86 (SD 2.72) and a median of 0 (IQR 0–3). About 15% had a CCI of 5 or above. The major comorbidities were diabetes, cancer and renal disease. Other common comorbidities were chronic pulmonary disease, cerebrovascular disease and myocardial infarction. Multiple comorbidities were present in 67% of patients (table 2).
Table 2

Major comorbidities and Charlson Comorbidity Index

Major comorbidities | n | %
Myocardial infarction | 3042 | 7.35
Peripheral vascular disease | 1862 | 4.50
Cerebrovascular disease | 3294 | 7.96
Heart failure | 1754 | 4.24
Dementia | 790 | 1.91
Chronic pulmonary disease | 5140 | 12.42
Rheumatic disease | 475 | 1.15
Peptic ulcer disease | 568 | 1.37
Mild liver disease | 1992 | 4.82
Moderate or severe liver disease | 612 | 1.48
Diabetes without chronic complication | 5131 | 12.40
Diabetes with chronic complication | 5102 | 12.33
Hemiplegia or paraplegia | 1907 | 4.61
Renal disease | 7419 | 17.93
Any malignancy, including lymphoma and leukaemia, except malignant neoplasm of skin | 5602 | 13.54
Metastatic solid tumour | 1203 | 2.91
AIDS/HIV | 114 | 0.26
Charlson Comorbidity Index (CCI) | |
 None: CCI score (0) | 21 215 | 51.28
 Mild: CCI score (1–2) | 8270 | 19.99
 Moderate: CCI score (3–4) | 5492 | 13.28
 Severe: CCI score (5+) | 6390 | 15.45
 Median (IQR) | 0 (0–3) |
 Mean (SD) | 1.86 (2.72) |
The geographic location of patient domicile, as determined by postcode at the time of inpatient registration, and the number of patients per 100 000 residents in each Local Government Area are shown in figure 2. The majority of cohort patients resided in the Townsville Local Government Area.
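The rate mapped in figure 2 is a simple ratio: the number of cohort patients resident in an area, divided by that area's resident population and scaled to 100 000. The sketch below assumes that each patient has already been assigned to exactly one area, such as an LGA derived from postcode. Postcodes can straddle LGA boundaries, so a real mapping would need a concordance. The function name and the population figures in the example are hypothetical.

```python
from collections import Counter


def patients_per_100k(patient_areas, population):
    """patient_areas: one area label per cohort patient.
    population: {area: resident population}.

    Returns {area: cohort patients per 100 000 residents}. Areas with no resident
    population are skipped, and patients in areas missing from `population` are ignored.
    """
    counts = Counter(patient_areas)
    return {
        area: counts.get(area, 0) * 100_000 / residents
        for area, residents in population.items()
        if residents > 0
    }
```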
Figure 2

Heat map of cohort patients per 100 000 shown by postcode of domicile according to hospital registration at entry into cohort.

Table 3 lists common infectious diseases diagnoses, along with others of note in the tropical setting of Townsville Hospital. These diagnoses represent aggregated codes that describe infection due to the same pathogen or at the same site; multiple codes often describe infection of the same organ. For common conditions such as Staphylococcus aureus (A41), urinary tract infection (N39.0) and influenza and pneumonia (J09–J18), many diagnoses are coded as ‘other’. Precise study of these and other pathogen-specific or organ-specific infections will require disaggregation of codes and incorporation of the available pathology results.
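Aggregating codes into disease groups, as in table 3, amounts to mapping each normalised code to zero or more groups and counting each admission once per group. The rules below are hypothetical examples and are not the study's groupings. They use the code ranges mentioned in the text (influenza and pneumonia J09–J18, urinary tract infection N39.0), plus two tropical infections (Q fever A78, Ross River disease B33.1). Comparing three-character categories as strings works here because every ICD-10-AM category is a letter followed by two digits.

```python
from collections import Counter


def category(code: str) -> str:
    """Three-character ICD-10-AM category of a normalised code, e.g. 'J18.9' -> 'J18'."""
    return code[:3]


# Hypothetical aggregation rules for illustration only.
GROUP_RULES = {
    "Influenza": lambda c: "J09" <= category(c) <= "J11",
    "Pneumonia": lambda c: "J12" <= category(c) <= "J18",
    "Urinary tract infection (site not specified)": lambda c: c == "N39.0",
    "Q fever": lambda c: category(c) == "A78",
    "Ross River disease": lambda c: c == "B33.1",
}


def count_cases(diagnoses):
    """diagnoses: iterable of (admission_id, normalised ICD-10-AM code).
    Counts each admission at most once per group, however many of its codes match."""
    hits = set()
    for admission_id, code in diagnoses:
        for group, rule in GROUP_RULES.items():
            if rule(code):
                hits.add((group, admission_id))
    return Counter(group for group, _ in hits)
```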
Table 3

Total cases of diseases due to selected microbial pathogens

Diseases | N
Staphylococcus aureus sepsis | 6802
 Skin and soft tissue infection | 3182
 Osteomyelitis | 670
 Arthritis | 215
 Phlebitis and thrombophlebitis | 250
 Infective endocarditis | 172
Streptococcus pyogenes infection | 1197
 Skin and soft tissue infection | 693
Streptococcus pneumoniae sepsis | 515
 Pneumonia | 435
Urinary tract infection |
 Pyelonephritis | 1391
 Cystitis | 314
 Urethritis | 22
 Prostatitis | 118
 Abscess | 52
 Other | 9083
Pneumonia |
 Viral | 769
 Bacterial | 2853
 Other | 4151
Influenza | 1738
Meningitis |
 Viral | 240
 Bacterial | 123
Tropical infection |
 Melioidosis | 84
 Dengue | 88
 Ross River | 48
 Q fever | 139

Discussion

This longitudinal cohort study describes patients with an infectious disease diagnosis discharged from the largest tertiary referral hospital in the tropical region of Australia. The infectious diseases included in this cohort represent a comprehensive range of conditions prevalent in Northern Australia as well as in Australian communities in general. Among the patterns of infectious diseases found in this cohort, S. aureus was the most common pathogen identified, followed by influenza and Group A streptococcus. Skin and soft tissue was the most common site of infection, followed by the respiratory tract. Analysis of patient factors associated with mortality is underway. These data will allow comparison with other mortality data from Australian studies of infectious diseases. All-cause mortality rates have been described for Australian cohorts of patients with selected, highly morbid infections such as S. aureus bacteraemia (28%, 2–5 year follow-up),7 community-acquired pneumonia (60.4%, mean follow-up 6.1 years)8 and infective endocarditis (14.7%, 1–5 year follow-up).9 These studies all demonstrated increased all-cause mortality in the infectious diseases cohorts compared with controls. This cohort will allow a wide range of future analyses of the epidemiology of severe infection in patients of the largest tertiary referral hospital in Northern Australia. Its size and complexity make it a valuable resource. The variety of data incorporated allows nuanced study of inpatients discharged with an infectious diagnosis. For example, linkage of microbiological, haematological and biochemical data provides the opportunity to correlate numerous laboratory parameters with disease outcomes. Emergency department data will facilitate assessment of the number of hospital presentations made prior to a diagnosis such as cryptococcal meningitis. 
In a recent study based on a cohort of inpatients with pneumonia extracted from this data linkage, we found an immediate increase in the risk of pneumonia associated with exposure to moderately low temperatures in late winter and early summer.10 There has been a sustained increase in the number of cohort studies using linked administrative hospital data sets, including in Australia.11 However, infectious diseases studies are in the minority compared with cardiovascular, health services, cancer and maternal health research. Australian cohort studies that use data linkage to describe infectious diseases mostly rely on ICD-10-AM diagnostic codes and death registry information. Some also incorporate notifiable diseases data,12 but overall, studies incorporating pathology data are few.13 14 Regrettably, among Australian jurisdictions, pathology data are available for data linkage only in Western Australia and Queensland, owing to their statewide diagnostic laboratories.4 Data-linkage studies incorporating pathology data have tested the precision of infectious diseases diagnosis in comparison with public health communicable diseases notification systems15 and hospital discharge coding.13 Both studies demonstrated underascertainment of childhood respiratory tract diseases. Australian infectious diseases cohort studies have involved organ-specific infections such as respiratory viral infections,13 specific infections such as Q fever12 and S. aureus bacteraemia,14 and specific patient groups such as asplenic16 and haematology–oncology patients.17 The value of Australian patient cohorts for infectious diseases research is further shown by the multiple studies deriving from the 45 and Up Study of ageing,18 the Triple I Western Australian birth cohort15 and the Victorian Post-Splenectomy Registry.19 There are inherent limitations to retrospective databases defined by ICD-10-AM codes: some important clinical information is underrepresented. 
This is exemplified in this cohort study, where only 3.95% of patients were coded as being obese. By contrast, among the general Australian population, as measured in 2017–2018, 31% of adults and 8.6% of children and adolescents were obese.20 This inpatient underestimate may arise because ICD-10-AM coding for obesity is only allocated where active assessment for obesity is made by a dietitian. Inpatients at the Townsville Hospital were more frequently diagnosed with malnutrition (11.13%), reflecting documentation of clinical interventions. The administrative databases used to construct this linked database predated the use of an electronic medical record at Townsville Hospital. Machine learning is being used in research settings to analyse free text in clinical notes and diagnostic imaging reports.21 However, owing to the absence of free-text data, we are unable to apply this methodology to our database. The absence of this clinical information may diminish the ability to determine precise case definitions and important comorbidities such as obesity. Despite these potential limitations, ICD-10-AM codes for infectious diseases have been shown to correlate closely with clinical diagnoses determined after medical chart review in Australian research, for example, in two studies of community-acquired pneumonia.22 23 Linked administrative data were shown to reliably ascertain incident colorectal and lung cancer diagnoses when compared with the New South Wales Cancer Registry.24 Other Australian researchers have studied the accuracy of ICD-10-AM codes for diagnoses of childhood influenza and pertussis.25 While demonstrating high specificity and positive predictive value, the authors concluded that the addition of laboratory data increases the precision of retrospective, population-level diagnosis of paediatric respiratory infection. 
The incorporation of pathology results in the cohort described in this database will allow precise characterisation of the infectious diseases cohort we have assembled. For example, the large volume of microbiology data will allow for analysis of key areas such as antimicrobial resistant infections and their influence on clinical outcomes and provide greater precision for diagnosis (eg, site of infection in sepsis).

Conclusions

Numerous analyses of risks for, and outcomes of, disease-specific and organism-specific infections, healthcare costs and antimicrobial side effects will be undertaken in the future using these data. These studies will incorporate measures such as the Socio-Economic Indexes for Areas26 to assess the impact of socioeconomic disadvantage on outcomes of infectious diseases in hospitalised patients. Because hospitalisation data are available for the period before the admission that led to the patient’s inclusion in the cohort, there will be an opportunity to assess presentations and investigation findings that predated diagnosis. Similarly, the extensive information from subsequent hospitalisations will allow detailed analysis of long-term health effects after severe infectious diseases. Linked pathology data may also retrospectively improve the definition of severe infectious diseases such as invasive group A streptococcal infection, through a systematic search for positive cultures from sterile sites.
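A case-finding rule of the kind described above might look like the sketch below: flag patients with Streptococcus pyogenes isolated from a normally sterile site. The specimen and organism vocabularies, the function name and the record layout are hypothetical. A real implementation would map the laboratory's own specimen codes, and it might also require the culture to fall within an admission window.

```python
# Hypothetical label sets; real laboratory feeds use their own specimen and organism vocabularies.
STERILE_SITES = {
    "blood", "cerebrospinal fluid", "csf", "joint fluid", "synovial fluid",
    "pleural fluid", "peritoneal fluid", "pericardial fluid", "bone",
}
GAS_ALIASES = ("streptococcus pyogenes", "s. pyogenes", "group a streptococcus")


def invasive_gas_patients(positive_cultures):
    """positive_cultures: iterable of (pu_id, specimen_type, organism) free-text strings.

    Returns sorted PU_IDs with S. pyogenes isolated from a specimen type in STERILE_SITES.
    Matching is case-insensitive and ignores surrounding whitespace.
    """
    hits = set()
    for pu_id, specimen, organism in positive_cultures:
        specimen_key = specimen.strip().lower()
        organism_key = organism.strip().lower()
        if specimen_key in STERILE_SITES and any(alias in organism_key for alias in GAS_ALIASES):
            hits.add(pu_id)
    return sorted(hits)
```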

Strengths and limitations of this study

The main strength of this cohort is its large size and its unique description of inpatients diagnosed with infectious diseases at an Australian tropical zone hospital. The intricate relational database has provided a resource that can be easily searched. In future analyses, the linkage of numerous data sources to provide a granular description of patient disease and treatment will enable the use of a variety of statistical methods. Similarly, the availability of pathology and pharmacy antimicrobial dispensing data allows precise case definition and analysis of treatment response. The main study limitations are that it is based on data sets from a single hospital, so future findings may not be generalisable to the wider Australian population, and that the validity of cohort studies relies on the accuracy of clinical coding. Despite these limitations, this database will be a rich source of information for future cohort studies of the epidemiology of infectious diseases in the catchment area of the only tertiary hospital in North Queensland.

1.  Cost-effectiveness of a post-splenectomy registry for prevention of sepsis in the asplenic.

Authors:  Ian Woolley; Penelope Jones; Denis Spelman; Lisa Gold
Journal:  Aust N Z J Public Health       Date:  2006-12       Impact factor: 2.939

2.  Growth of linked hospital data use in Australia: a systematic review.

Authors:  Michelle Tew; Kim M Dalziel; Dennis J Petrie; Philip M Clarke
Journal:  Aust Health Rev       Date:  2017-08       Impact factor: 1.990

3.  Long-term mortality and causes of death associated with Staphylococcus aureus bacteremia. A matched cohort study.

Authors:  N Gotland; M L Uhre; N Mejer; R Skov; A Petersen; A R Larsen; T Benfield
Journal:  J Infect       Date:  2016-07-12       Impact factor: 6.072

4.  A new method of classifying prognostic comorbidity in longitudinal studies: development and validation.

Authors:  M E Charlson; P Pompei; K L Ales; C R MacKenzie
Journal:  J Chronic Dis       Date:  1987

5.  Optimising the use of linked administrative data for infectious diseases research in Australia.

Authors:  Hannah C Moore; Christopher C Blyth
Journal:  Public Health Res Pract       Date:  2018-06-14

6.  Hospitalized community-acquired pneumonia in the elderly: an Australian case-cohort study.

Authors:  S A Skull; R M Andrews; G B Byrnes; D A Campbell; H A Kelly; G V Brown; T M Nolan
Journal:  Epidemiol Infect       Date:  2008-06-18       Impact factor: 2.451

7.  The increased risks of death and extra lengths of hospital and ICU stay from hospital-acquired bloodstream infections: a case-control study.

Authors:  Adrian G Barnett; Katie Page; Megan Campbell; Elizabeth Martin; Rebecca Rashleigh-Rolls; Kate Halton; David L Paterson; Lisa Hall; Nerina Jimmieson; Katherine White; Nicholas Graves
Journal:  BMJ Open       Date:  2013-10-31       Impact factor: 2.692

8.  Identifying incident colorectal and lung cancer cases in health service utilisation databases in Australia: a validation study.

Authors:  David Goldsbury; Marianne Weber; Sarsha Yap; Emily Banks; Dianne L O'Connell; Karen Canfell
Journal:  BMC Med Inform Decis Mak       Date:  2017-02-27       Impact factor: 2.796

9.  Long-term mortality of hospitalized pneumonia in the EPIC-Norfolk cohort.

Authors:  P K Myint; K R Hawkins; A B Clark; R N Luben; N J Wareham; K-T Khaw; A M Wilson
Journal:  Epidemiol Infect       Date:  2015-08-24       Impact factor: 2.451

10.  Record linkage study of the pathogen-specific burden of respiratory viruses in children.

Authors:  Faye J Lim; Christopher C Blyth; Parveen Fathima; Nicholas de Klerk; Hannah C Moore
Journal:  Influenza Other Respir Viruses       Date:  2017-10-30       Impact factor: 4.380

