Estimating disease prevalence from drug utilization data using the Random Forest algorithm.

Laurentius C J Slobbe1,2, Koen Füssenich1,3, Albert Wong1, Hendriek C Boshuizen1,4, Markus M J Nielen5, Johan J Polder1,2, Talitha L Feenstra1,3, Hans A M van Oers1,2.   

Abstract

BACKGROUND: Aggregated claims data on medication are often used as a proxy for the prevalence of diseases, especially chronic diseases. However, the linkage between medication and diagnosis tends to be theory-based and not very precise. Modelling disease probability at an individual level using individual-level data may yield more accurate results.
METHODS: Individual probabilities of having a certain chronic disease were estimated using the Random Forest (RF) algorithm. A training set was created from a general practitioners database of 276 723 cases that included diagnosis and claims data on medication. Model performance for 29 chronic diseases was evaluated using Receiver-Operator Curves, by measuring the Area Under the Curve (AUC).
RESULTS: The diseases for which model performance was best were Parkinson's disease (AUC = .89, 95% CI = .77-1.00), diabetes (AUC = .87, 95% CI = .85-.90), osteoporosis (AUC = .87, 95% CI = .81-.92) and heart failure (AUC = .81, 95% CI = .74-.88). Five other diseases had an AUC >.75: asthma, chronic enteritis, COPD, epilepsy and HIV/AIDS. For 16 of 17 diseases tested, the medication categories used in theory-based algorithms were also identified by our method; however, the RF models included a broader range of medications as important predictors.
CONCLUSION: Data on medication use can be a useful predictor when estimating the prevalence of several chronic diseases. To improve the estimates, for a broader range of chronic diseases, research should use better training data, include more details concerning dosages and duration of prescriptions, and add related predictors like hospitalizations.
© The Author(s) 2019. Published by Oxford University Press on behalf of the European Public Health Association.

Year:  2019        PMID: 30608539      PMCID: PMC6660107          DOI: 10.1093/eurpub/cky270

Source DB:  PubMed          Journal:  Eur J Public Health        ISSN: 1101-1262            Impact factor:   3.367


Introduction

Information on disease prevalence is important for assessing the health needs of populations. Several sources can deliver population disease prevalence estimates, such as surveys, dedicated epidemiologic studies using diagnostics, or administrative data sources. Drug use data, especially on prescription drugs, has also frequently been used to estimate disease prevalence. In many countries insurers or providers maintain extensive prescription databases, allowing easy access to national drug use data. Drug use data has several advantages over other sources. Surveys are costly to execute on a large scale. Hospital discharge registers are large, but involve hospital-related events only. In addition, in the Netherlands the GP serves as a gatekeeper, implying that patients, except in emergencies, can only visit a medical specialist with a referral from the GP. This means that the GP sees both patients who see only the GP and those who are referred to specialist care. Hospital data is therefore more likely than GP data to underestimate the prevalence. GP registers containing diagnosis codes are not readily available in all countries. Furthermore, GPs may have different coding habits, hindering comparisons between GPs. While drug use data is often recorded without a diagnosis, some studies base disease prevalence estimates on direct links of specific drug use to the presence of certain diseases. The links are based on literature or medical guidelines. This procedure is problematic for two reasons. First, many drugs are used for the treatment of multiple diseases; assuming that all patients who take a specific drug have a specific disease will then lead to overestimation. Second, some patients with a disease are not prescribed the specific drug, and this will lead to underestimation of prevalence. To overcome these two problems, it is better to estimate the probability of having a specific disease given all the different medications a person uses.
Avoiding any a priori assumption on the relationship between the drugs and diagnoses, machine learning algorithms can be used to estimate this relation from data. In this paper we apply the Random Forest (RF) algorithm, as this method yielded the best results in comparison with others. This algorithm requires a training set with both diagnosis data and drug use. This diagnosis data could also be used directly to estimate disease prevalence, particularly when it is possible to assume that the set containing diagnosis data is representative of the population of interest. However, using the diagnosis data in combination with drug use as proposed alters the assumption. Rather than the diagnosis data having to be representative of the population of interest, the relationship between diagnosis and drug use should be similar to that in the population of interest. This might be a more reasonable assumption in many cases, as medical professionals are influenced by standardized prescription guidelines. Countries which do have a prescription registration, but lack population surveys on disease prevalence, as is often the case, can use the relation derived in comparable countries to obtain prevalence estimates. Existing applications of RF analysis to the problem of disease prevalence estimates have some limitations. Chaudhry used RF to predict the population prevalence of diabetes and dementia from administrative data in GP and hospital records. However, his choice of predictors was informed by a priori knowledge. Khalilia et al. predicted the presence of eight diseases with RF from hospital in-patient data, but did not make any population prevalence estimates. In contrast, we apply the RF approach to a broad range of 29 diseases. The RF algorithm allows us to select important predictors from the full range of possible drug use predictors.
Afterwards, we have a list of predictors for comparison with existing theory-based lists of predictors, e.g. the Dutch Pharmacotherapeutic Compass. The objective of this paper is therefore to examine for which diseases the prevalence can be estimated using the RF algorithm and, if so, to see which drug groups should be used.

Methods

Random Forest

Estimating the probability that an individual has a certain disease can be considered a classification problem. RF is a non-parametric method to address classification problems. For implementation the R package ‘randomForest’ was used.

Data

Drug use data of the entire Dutch population is available from the National Health Care Institute (ZiN). The ZiN claims database covers all outpatient prescriptions reimbursed under the Dutch mandatory Health Insurance scheme. Drugs were classified into 204 pharmaceutical groups according to the four-position ATC code. To these groups, age and gender were added as predictors. The dataset contained 47 million individual prescription records in 2010, covering a population of 16.7 million, of which 70% had at least one prescription. A training set with disease information was obtained from the primary care database of the Netherlands Institute for Health Services Research (NIVEL). As every citizen is required to have a GP, with the exception of those living in institutions, the dataset is likely to cover the whole Dutch population, except for the 80+ population, a significant part of which lived in institutions in 2010. All patient contacts were labelled with a diagnostic code (ICPC). A person was defined as having a disease if he/she had at least one contact with a GP for this disease over a period of 3 years. All GP patients with full data available over 2008–2010 were selected. This resulted in a training set of 276 723 individuals. The selection of 29 diseases was based on a list provided by O’Halloran et al. See Supplementary file S1 for details. We combined the available data (drug utilization, age and gender, and ICPC codes) at Statistics Netherlands within the System of Social Statistical Datasets (SSD). The SSD allows data from different administrative registers to be combined for research purposes using an anonymous patient identifier.
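As an illustration of the data preparation described above, the following minimal Python sketch truncates full ATC codes to their four-position group and builds one predictor row per person. It is not the authors' code (the study used R within the SSD environment). The input layout, the function names and the choice of a 0/1 flag per group (rather than, say, a prescription count) are assumptions made for the example.

```python
from collections import defaultdict


def atc4_group(atc_code: str) -> str:
    """Return the four-position ATC group, e.g. 'A10BA02' -> 'A10B'."""
    code = atc_code.strip().upper()
    if len(code) < 4:
        raise ValueError(f"ATC code too short for a four-position group: {atc_code!r}")
    return code[:4]


def build_predictors(prescriptions, persons):
    """Build one row per person: age, gender and a 0/1 flag per ATC4 group.

    prescriptions: iterable of (person_id, atc_code) pairs (hypothetical layout)
    persons: dict mapping person_id -> (age, gender)
    """
    groups_by_person = defaultdict(set)
    for person_id, atc_code in prescriptions:
        groups_by_person[person_id].add(atc4_group(atc_code))

    # Every ATC4 group seen in the data becomes one predictor column.
    all_groups = sorted({g for groups in groups_by_person.values() for g in groups})

    rows = {}
    for person_id, (age, gender) in persons.items():
        used = groups_by_person.get(person_id, set())
        row = {"age": age, "gender": gender}
        row.update({group: int(group in used) for group in all_groups})
        rows[person_id] = row
    return rows
```

Persons without any prescription simply receive zeros for every group in this sketch; the paper handles that subgroup separately, as described under Implementation of RF.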

Implementation of RF

Usually, all observations in a training set and all predictors are combined in one RF-analysis. However, within the SSD system, computing power is limited, and analysis with our dataset (276 723 records with 206 variables) proved to be difficult. We therefore used a two-step approach. First, for each chronic disease, persons with the disease were randomly selected, up to a maximum of 5000 patients. An equal number of persons without the disease was then randomly selected and added to this set. The RF algorithm was applied to each of these smaller sets. The variable importance measure, defined as the average decrease in accuracy when a predictor is left out of the analysis, was evaluated. For each disease, the 10 drug groups with the highest variable importance were selected. Selecting 10 drug groups ensured that the most important predictors were included for all diseases, while limiting computing time. Second, a new dataset was created for each disease based on the full training set, but with only age, gender and the drug groups selected in the first step as predictors (276 723 records with 12 variables for each disease), and we applied RF a second time. For each disease this second RF-model was then applied to obtain the probability of having this disease for each individual in the prescriptions database, hence for the 11.6 million Dutch inhabitants who were reimbursed a prescription drug in 2010. Prevalence was also estimated for the remaining roughly 5 million Dutch individuals without any prescription (the population of 16.7 million minus the 11.6 million with a prescription). For each of the 29 diseases they were assigned a probability equal to the age- and gender-specific probability observed in the training set for persons diagnosed with the disease but not receiving any prescription.
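The first step of this two-step approach can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the dictionary inputs and the fixed random seed are hypothetical, and the importance scores are assumed to come from whatever RF implementation is used in step one. Age and gender are left out of the ranking here because the text says they are always kept as predictors.

```python
import random


def balanced_sample(labels, max_cases=5000, seed=0):
    """Pick up to max_cases diseased persons plus an equal number without the disease.

    labels: dict mapping person_id -> bool (True = has the disease)
    Returns a list of selected person ids (cases first, then controls).
    """
    rng = random.Random(seed)
    cases = sorted(pid for pid, ill in labels.items() if ill)
    controls = sorted(pid for pid, ill in labels.items() if not ill)
    # Assumption: if fewer controls than cases exist, shrink both groups.
    n = min(len(cases), max_cases, len(controls))
    return rng.sample(cases, n) + rng.sample(controls, n)


def top_drug_groups(importance, k=10, always_keep=("age", "gender")):
    """Return the k drug groups with the highest variable importance.

    importance: dict mapping predictor name -> importance score from step one
    Predictors in always_keep are not ranked because they are retained anyway.
    """
    drug_scores = {name: score for name, score in importance.items()
                   if name not in always_keep}
    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(drug_scores, key=lambda name: (-drug_scores[name], name))
    return ranked[:k]
```

The second step would then refit the model on the full training set using only age, gender and the groups returned by `top_drug_groups`.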

Outcome measures

For each disease, the most important drugs according to the variable importance were compared with theoretical drug classifications included in relevant guidelines. For 13 of the 29 chronic diseases pharmaceutical groups used in the Dutch insurance system were available. In addition, for four other diseases the drugs found were compared with Dutch treatment guidelines: tuberculosis, MS, chronic back or neck disorder, and gastric or duodenal ulcer. To measure the performance of the final RF-models, the area under the Receiver-Operator Curve (AUC) was measured for the training set for each disease separately. An AUC-value above .7 is generally considered useful. To prevent overfitting, 10-fold cross validation was applied. The AUC and a 95% confidence interval around the AUC-value were obtained using the R-package ‘cvAUC’. If the lower boundary of this interval was above .5, we considered the model to perform better than a random prediction. The predicted population prevalence by age and gender for the Netherlands was graphically compared with a prevalence estimate based on direct extrapolation of the training set prevalence. Correlations were computed as well for the six diseases with lower confidence bound (95%) of the AUC >.70. The age range considered was 30–80 years, since the prevalence below 30 is very low for most chronic diseases and the 80+ population was not well covered in our training set. For a binary classification of each individual, a cut-off needs to be chosen. This was done by setting an age and disease-specific cut-off value. All persons with a probability higher than the cut-off were classified as ‘ill’. The cut-off was chosen to minimize the deviation between the observed and the predicted prevalence in the training set for each age, gender and disorder.
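Two of the outcome measures above lend themselves to a small worked sketch in Python. The first function computes the AUC through its rank interpretation (the probability that a diseased person receives a higher predicted probability than a non-diseased person); the second picks a cut-off so that the number of persons classified as ‘ill’ within one stratum is as close as possible to the observed number. Both are illustrative only: the study itself used the R package ‘cvAUC’ with 10-fold cross-validation, and the function names and the tie handling below are assumptions.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: predicted probabilities; labels: truthy = diseased.
    Ties between a diseased and a non-diseased person count as 0.5.
    Quadratic in the number of persons, so suitable for small examples only.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one diseased and one non-diseased person")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def prevalence_matching_cutoff(scores, observed_cases):
    """Choose the cut-off that best reproduces the observed number of cases.

    Persons with a score strictly above the cut-off are classified as ill.
    With tied scores an exact match may be impossible; the closest count wins.
    """
    candidates = [float("-inf")] + sorted(set(scores))

    def deviation(cutoff):
        predicted_cases = sum(1 for s in scores if s > cutoff)
        return abs(predicted_cases - observed_cases)

    return min(candidates, key=deviation)
```

In the paper the cut-off is chosen per age, gender and disease, so `prevalence_matching_cutoff` would be called once per stratum with that stratum's scores and observed case count.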

Results

Table 1 gives descriptives for the training set. The average annual number of different pharmaceutical drugs taken by patients in the training set was 2.9, which is very comparable with the utilization in the total Dutch population in the same year (2.8). Table 1 also shows that the number of ATC groups utilized by an individual patient rises proportionally with the number of chronic diseases present.
Table 1

Pharmaceutical utilization in dataset

| Stratum | Persons with at least one recorded episode for each chronic disease in 2008–2010 | Total number of pharmaceutical groups utilized in 2010 | Average number of pharmaceutical groups utilized per person |
|---|---|---|---|
| Persons without disease | 184 826 | 328 385 | 1.8 |
| Persons with 1 chronic disease | 60 065 | 235 032 | 3.9 |
| Persons with 2 chronic diseases | 20 090 | 125 335 | 6.2 |
| Persons with 3 chronic diseases | 7609 | 63 064 | 8.2 |
| Persons with 4 chronic diseases | 2697 | 27 190 | 9.9 |
| Persons with 5 or more chronic diseases | 1436 | 17 219 | 11.6 |
| Total | 276 723 | 796 225 | 2.9 |

Percentage with at least one chronic disease: 33.2%
Percentage study population with multiple diseases: 11.5%

Legend: Training set population has been divided into six strata, based on the number of chronic diseases present. First column presents stratum. Second column gives population size. Third column gives total number of pharmaceutical groups utilized. Pharmaceutical groups have been defined in terms of an ATC 4 position code: A01A, A02A, etc. Last column gives average utilization in stratum.
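The two summary percentages and the overall average in Table 1 follow directly from the stratum counts. The short Python sketch below reproduces that arithmetic from the published counts; the variable names are ours and the stratum keys are shorthand for the row labels.

```python
# Stratum -> (persons, total pharmaceutical groups utilized), copied from Table 1.
table1 = {
    "0 diseases": (184_826, 328_385),
    "1 disease": (60_065, 235_032),
    "2 diseases": (20_090, 125_335),
    "3 diseases": (7_609, 63_064),
    "4 diseases": (2_697, 27_190),
    "5+ diseases": (1_436, 17_219),
}

total_persons = sum(persons for persons, _ in table1.values())
total_groups = sum(groups for _, groups in table1.values())

# Everyone outside the "0 diseases" stratum has at least one chronic disease.
with_any_disease = total_persons - table1["0 diseases"][0]
# Multimorbidity: strata with two or more chronic diseases.
with_multiple = sum(table1[key][0]
                    for key in ("2 diseases", "3 diseases", "4 diseases", "5+ diseases"))

pct_any = 100 * with_any_disease / total_persons
pct_multiple = 100 * with_multiple / total_persons
avg_groups_per_person = total_groups / total_persons
```

Rounded to one decimal, these give the 33.2%, 11.5% and 2.9 reported in the table.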

Table 2 lists the AUC values produced by our analysis, sorted by average AUC. For 17 diseases the lower boundary of the 95% AUC confidence interval was >.5. For 10 diseases the average AUC was .7 or higher, but for only six the lower boundary of the AUC 95% confidence interval was ≥.7: Parkinson’s disease, diabetes mellitus, osteoporosis, heart failure, asthma and chronic obstructive pulmonary disease (COPD).
Table 2

Model outcome AUC with confidence interval, ordered by mean AUC

| Disease | AUC (95% conf. interval) | Prevalence in training set per 10 000 persons |
|---|---|---|
| Parkinson’s disease | .89 (.77–1.00) | 15 |
| Diabetes mellitus | .87 (.85–.90) | 421 |
| Osteoporosis | .87 (.81–.92) | 103 |
| Heart failure | .81 (.74–.88) | 82 |
| Chronic obstructive pulmonary disease | .79 (.75–.83) | 209 |
| Chronic enteritis/colitis ulcerosa | .79 (.68–.90) | 31 |
| HIV/AIDS | .78 (.39–1.00) | 4 |
| Asthma | .77 (.74–.80) | 424 |
| Epilepsy | .77 (.66–.87) | 41 |
| Coronary heart disease | .70 (.66–.74) | 255 |
| Visual disorder | .69 (.64–.73) | 191 |
| Schizophrenia | .69 (.48–.89) | 10 |
| Rheumatoid arthritis | .68 (.60–.76) | 66 |
| Dementia | .67 (.54–.80) | 28 |
| Congenital neurological anomaly | .67 (.01–1.00) | 3 |
| Multiple sclerosis | .66 (.42–.90) | 9 |
| Cancer | .60 (.56–.64) | 264 |
| Chronic alcohol abuse | .59 (.49–.69) | 45 |
| Depressive disorder | .58 (.54–.63) | 253 |
| Stroke (including TIA) | .57 (.52–.63) | 137 |
| Congenital cardiovascular anomaly | .57 (.37–.77) | 7 |
| Chronic back or neck disorder | .56 (.53–.60) | 432 |
| Osteoarthritis | .56 (.52–.60) | 282 |
| Anxiety disorder, neurosis, PTSS | .56 (.50–.61) | 154 |
| Mental retardation | .55 (.36–.74) | 13 |
| Hearing disorder | .52 (.44–.61) | 62 |
| Anorexia | .52 (.33–.71) | 8 |
| Gastric or duodenal ulcer | .50 (.39–.62) | 25 |
| Tuberculosis | .50 (.06–.94) | 2 |

Legend: First column gives name of chronic disease. Second column gives model outcome of RF-analysis as AUC with 95% confidence interval, in order of decreasing AUC. Third column states prevalence of chronic disease or condition in the training set (n = 276 723).
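The thresholds used to read Table 2 (a lower confidence bound above .5 as better than random, an AUC of .7 or more as useful) can be applied mechanically to the published rows. The Python sketch below does so; the tuple layout and variable names are ours, and the numbers are copied from the table.

```python
# (disease, AUC, CI lower bound, CI upper bound, prevalence per 10 000), from Table 2.
TABLE2 = [
    ("Parkinson's disease", .89, .77, 1.00, 15),
    ("Diabetes mellitus", .87, .85, .90, 421),
    ("Osteoporosis", .87, .81, .92, 103),
    ("Heart failure", .81, .74, .88, 82),
    ("Chronic obstructive pulmonary disease", .79, .75, .83, 209),
    ("Chronic enteritis/colitis ulcerosa", .79, .68, .90, 31),
    ("HIV/AIDS", .78, .39, 1.00, 4),
    ("Asthma", .77, .74, .80, 424),
    ("Epilepsy", .77, .66, .87, 41),
    ("Coronary heart disease", .70, .66, .74, 255),
    ("Visual disorder", .69, .64, .73, 191),
    ("Schizophrenia", .69, .48, .89, 10),
    ("Rheumatoid arthritis", .68, .60, .76, 66),
    ("Dementia", .67, .54, .80, 28),
    ("Congenital neurological anomaly", .67, .01, 1.00, 3),
    ("Multiple sclerosis", .66, .42, .90, 9),
    ("Cancer", .60, .56, .64, 264),
    ("Chronic alcohol abuse", .59, .49, .69, 45),
    ("Depressive disorder", .58, .54, .63, 253),
    ("Stroke (including TIA)", .57, .52, .63, 137),
    ("Congenital cardiovascular anomaly", .57, .37, .77, 7),
    ("Chronic back or neck disorder", .56, .53, .60, 432),
    ("Osteoarthritis", .56, .52, .60, 282),
    ("Anxiety disorder, neurosis, PTSS", .56, .50, .61, 154),
    ("Mental retardation", .55, .36, .74, 13),
    ("Hearing disorder", .52, .44, .61, 62),
    ("Anorexia", .52, .33, .71, 8),
    ("Gastric or duodenal ulcer", .50, .39, .62, 25),
    ("Tuberculosis", .50, .06, .94, 2),
]


def count(rows, keep):
    """Number of rows for which the predicate `keep` is true."""
    return sum(1 for row in rows if keep(row))


better_than_random = count(TABLE2, lambda r: r[2] > .5)   # lower bound above .5
useful_mean_auc = count(TABLE2, lambda r: r[1] >= .7)     # AUC of .7 or higher
robustly_useful = count(TABLE2, lambda r: r[2] >= .7)     # lower bound at least .7

# Split by training-set prevalence (per 10 000 persons).
common = [r for r in TABLE2 if r[4] > 100]
rare = [r for r in TABLE2 if r[4] < 100]
common_better_than_random = count(common, lambda r: r[2] > .5)
rare_poor = count(rare, lambda r: r[2] < .5)
```

With the published values this yields 17, 10 and 6 diseases for the three thresholds, 11 of the 12 common diseases performing better than random, and 11 of the 17 rarer diseases with a lower bound below .5, matching the counts stated in the text.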

There is some association between the frequency of the disease and the AUC. For almost all 12 diseases with a prevalence in the training set higher than 100 per 10 000 persons, the prediction is better than a random assignment. The only exception is anxiety disorder (154 cases per 10 000 persons), with a very poor performance and an AUC of .56 (95% CI = .50–.61). For 11 out of 17 diseases with a frequency below 100 per 10 000 persons, performance is poor, i.e. the lower boundary of the AUC 95% confidence interval was below .5. A notable exception was Parkinson’s disease, which despite a low frequency (15 per 10 000 persons) seems to be very predictable from drug utilization. Table 3 ranks the predictors in each final RF-model by importance. The shaded areas denote drugs that are also mentioned as indicator drugs for these diseases in the theoretical drug classifications we compared with. Only for cancer did we find no similarities. For all other diseases, the ATC codes mentioned by insurers and guidelines are also strong predictors for the corresponding diseases in our RF-models. However, our models show a number of additional predictors for most disorders. Supplementary file S2 gives more information.
Table 3

Predictors of chronic diseases in Random Forest analysis

ATC4 groups with strongest relation with disease in RF-analysis; [1] = strongest relation, [10] = weakest relation

| Disease | [1] | [2] | [3] | [4] | [5] | [6] | [7] | [8] | [9] | [10] |
|---|---|---|---|---|---|---|---|---|---|---|
| Parkinson's disease (a) | N04B | N04A | Birthyear | L04A | A07E | C10A | C09B | N05A | C03C | N06D |
| Diabetes mellitus (a) | A10A | A10B | C10A | B01A | H04A | C09A | C09B | C08C | C03C | C07A |
| Osteoporosis | M05B | A12A | A11C | L04A | C10A | B01A | C09D | D01A | C09C | D06A |
| Heart failure | C03C | C03D | A12B | C01A | C08D | C08C | C01D | C03A | C10A | C07B |
| Chronic obstructive pulmonary disease (a) | R03A | R03B | R06A | A01A | R01A | N06B | J01F | R05D | A06A | J01C |
| Chronic enteritis/colitis ulcerosa | A07E | L04A | L01B | B03B | A06A | J01M | A11C | A07D | N02A | M01A |
| HIV/AIDS (a) | J05A | J01E | J04A | J01F | N01B | J07B | D06B | A02B | J01C | J01A |
| Asthma (a) | R03A | R03B | R03C | Birthyear | H02A | A07A | S01G | R01A | R06A | R05D |
| Epilepsy (a) | N03A | N05B | Birthyear | A03F | N05A | B01A | N06A | D04A | D11A | N05C |
| Coronary heart disease (a) | C01D | C08D | C03C | C01B | C03A | C09A | D06A | C01E | C09C | B03A |
| Visual disorder | S01E | S01B | Birthyear | S01C | A10B | S01F | D02A | S01A | A10A | S01X |
| Schizophrenia (a) | N05A | N05B | N05C | N04A | N06A | N06B | Birthyear | N03A | A06A | N07C |
| Rheumatoid arthritis (a) | L04A | P01B | A07E | B03B | N02A | L01B | H02A | D02B | M05B | D06A |
| Dementia (a) | N06D | Birthyear | N05A | A12A | C03C | N03A | M05B | Y | D03D | C09D |
| Congenital neurological anomaly | M03B | G04B | N03A | J01X | D07X | N05A | J01E | A12A | D01A | N05B |
| Multiple sclerosis (b) | L03A | M03B | G04B | N03A | N06A | N04B | B03B | S01A | J01X | C03C |
| Cancer (a) | Birthyear | L02B | H03A | Y | D06A | A04A | Gender | L02A | G03C | A12A |
| Chronic alcohol abuse (a) | N07B | N05A | Birthyear | N05B | G04C | A02B | N06A | M04A | A10B | N05C |
| Depressive disorder (a) | N06A | N05A | N05B | N06B | N05C | N07B | N03A | A03F | A11C | G04B |
| Stroke (including TIA) | B01A | V03A | C01D | C01B | C07A | Birthyear | C08C | C01A | S01C | S01E |
| Congenital cardiovascular anomaly | B01A | C07A | J01C | N03A | C09A | D06A | R03B | Y | D02A | S02C |
| Chronic back or neck disorder (b) | N02A | M01A | A02B | A06A | N02B | C05A | S02C | N03A | H02A | R05D |
| Osteoarthritis | Birthyear | B01A | Gender | M04A | C10A | N02B | C10B | C03A | S01E | N02A |
| Anxiety disorder, neurosis, PTSS | N06A | N05B | N05A | C07A | N01B | N05C | A03F | A06A | N03A | D05A |
| Mental retardation | N05A | N03A | D02A | D06A | A06A | N05B | Y | D10A | S01F | N01B |
| Hearing disorder | Birthyear | L02A | B02A | S01X | D05A | G04C | H02A | A10B | C08D | C07B |
| Anorexia | Gender | G03A | A06A | A01A | Y | J01X | G03H | R05D | G01A | A12B |
| Gastric or duodenal ulcer (b) | A02B | D05A | G04B | A07A | A03A | A03F | M01A | A11C | D06B | A06A |
| Tuberculosis (b) | J04A | D02A | D07X | C01B | J01A | C08C | C03C | S01C | S02C | C03E |

Legend: First column gives name of chronic disease. The next 10 columns list the predictors used in the final RF-model, in order of decreasing importance. To facilitate comparison, diseases are presented in the same order as in table 2.

(a) For these diseases, comparison is possible with pharmaceutical groups listed in risk adjustment compulsory insurance. The shaded groups are also used for the detection of these diseases by Dutch insurers.

(b) These diseases have been compared with ATC-groups mentioned in the relevant Dutch treatment guidelines. The shaded groups are included in these guidelines.

The actual prevalence in the training set and the calculated prevalence based on applying the final RF-models have been compared for the six diseases with a lower bound of the AUC 95% confidence interval >.7. Except for asthma, correlations are above .9. Asthma shows correlations of .43 for males and .66 for females, indicating poor performance. Looking at the graphs for osteoporosis, a large discrepancy exists between predicted and observed prevalence around the age of 70. Figure 1 gives an example (COPD, male). A full set of figures is found in Supplementary file S3.
Figure 1

Example of comparison between Dutch population prevalence for ages 30–80 estimated from model applied to drug utilization data and estimation based on training set for COPD, male
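The correlations reported for this comparison relate two prevalence-by-age series: one predicted by the model and one extrapolated from the training set. The paper does not state which correlation coefficient was used; the Python sketch below assumes the Pearson coefficient and uses hypothetical argument names. Each list would hold one prevalence value per age between 30 and 80 for a given disease and gender.

```python
from math import sqrt


def pearson(predicted, observed):
    """Pearson correlation between two equally long series of prevalences."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_p = sum(predicted) / len(predicted)
    mean_o = sum(observed) / len(observed)
    # Sums of cross-products and squared deviations around the means.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_p * var_o)
```

A high correlation only shows that the two curves have a similar shape over age; it does not by itself show that the predicted and observed levels agree, which is why the graphical comparison remains informative.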


Discussion

For a broad range of 29 diseases, RF was used to predict disease prevalence based on medication use. Predictive performance was acceptable for 6 out of 29 diseases and would result in reliable estimates of population prevalence. Furthermore, we find that theory-based indicator drugs were included in the range of drugs identified by the RF model. This seems to be independent of the performance of the models, which indicates that the RF algorithm can also be used to identify suitable predictors, even in those cases where the predictive performance is low. Especially for diabetes, heart failure and COPD we observe a high correlation between estimated and observed population prevalence. Our outcomes can be compared with a few other studies. Chaudhry predicted the presence of diabetes with an AUC of .95 and dementia with an AUC of .875, higher than the .87 and .67 we found. However, for dementia he used dementia-coded doctor visits as predictors, while we use this as our definition of disease. Khalilia et al. used data on hospital stays as predictors, on a very large set (8 million records). A training set was generated by bootstrapping. The average AUC they report (.88) is much higher than those we found. For the two diseases which could be directly compared (diabetes and osteoporosis) they find almost the same AUC (.879 and .870 respectively) as we found, .87 for both. Compared with these two previous studies, we included a relatively broad range of diseases and added the comparison with theory-based models. While the method seems useful for some diseases, the predictive performance is still low for most diseases. This could have multiple causes. First, for some diseases, there is no standard pattern of drugs included in all treatment options. In addition, drugs might be prescribed for multiple diseases. For instance, the two strongest predictors for asthma and COPD are the same (R03A and R03B, table 3).
As a result, misclassification of asthma and COPD patients is likely to occur, which has not been further investigated in this study. Furthermore, patients and GPs may deal with diseases in different ways. Based on patient characteristics a GP will sometimes advise lifestyle changes instead of drugs, but will in other instances treat similar cases immediately with drugs. In addition, the patient may have treatment preferences. The relationship between diagnosis and drugs can also change over time. Innovation or policy changes can strongly influence prescription behaviour, making regular recalibration of the algorithms necessary. Second, the predictive power is likely limited due to weaknesses of the current data. In the current training set, only 3 years of diagnoses are used. While many patients with a chronic disease visit a GP more than once every 3 years, some patients who visit less frequently will not appear as diseased in the training set. Furthermore, some diseases might not be treated primarily by a GP, but directly in the hospital, also resulting in missing diagnoses in the training set. As the training set serves as a ‘gold standard’, any diagnosis errors in the training set will carry over into the final predictions. Investing in a smaller set of persons for which disease diagnosis is even more reliable, e.g. through the use of cohort studies, may provide a training set with better performance. The disadvantage of such a cohort, and the advantage of our current approach, is that for rare diseases the relationships between disease and drugs would have to be derived from only a very limited number of disease cases. Besides errors in disease diagnosis, drug use measures could also be improved. Drug use often varies between years. Grouping multiple years of drug use could improve results. Also, more complete drug utilization data could be obtained by including inpatient drugs.
For some diseases, utilizing more detailed pharmaceutical predictors, such as ATC4 or ATC5 groups, would improve results. Even though improvements might be needed to obtain reliable prevalence estimates for most diseases, important similarities were found for 16 out of 17 diseases for which theory-based predictors were available in existing guidelines. This means that even though the predictive power of the algorithm on the current data is insufficient, it is still possible to identify relevant drug groups. Compared with purely theory-based models, the RF algorithms have the important advantage of coming with confidence intervals and information about model performance. From this similarity we also infer that Dutch general practitioners broadly follow existing pharmaceutical guidelines. Cancer was the only disease for which the drugs found using the RF algorithm differ from theory. This could be the result of grouping all cancers together, and of many drugs used in cancer treatment not being covered by the dataset because they were prescribed in a hospital setting. We do not want to suggest that prediction models can entirely replace current GP registers or population surveys. On the contrary, without these registries the models cannot be built or validated. However, even in countries like the Netherlands, which are covered by both population surveys and GP networks, the method is of practical value, as it allows for analysis of subgroups, such as regions or strata by socio-economic status. The primary care database used as training set has been enlarged in recent years, but at this moment still covers only 10% of the population and the Dutch GPs. Using drug use data will allow for better prevalence estimates for the 90% not covered.
Because the full population is covered in the prescription data we use, and the model provides estimates of the probability of having a disease at the individual level, other useful applications would be pre-selecting subjects for medical trials, or making case-mix corrections, e.g. for comparing hospital performance. To conclude, combining diagnosis data and drug use with the RF algorithm can be a useful tool to predict population prevalence. Applications include situations where the diagnosis data is not necessarily representative of the population of interest, but the relation found between diagnosis and drug use is representative. Furthermore, it can be used to select relevant drug use groups in almost all cases.
References (10 of 22 listed)

Review 1.  Defining chronic conditions for primary care with ICPC-2.

Authors:  Julie O'Halloran; Graeme C Miller; Helena Britt
Journal:  Fam Pract       Date:  2004-08       Impact factor: 2.267

2.  The prevalence of selected physical activities and their relation with coronary heart disease risk factors in elderly men: the Zutphen Study, 1985.

Authors:  C J Caspersen; B P Bloemberg; W H Saris; R K Merritt; D Kromhout
Journal:  Am J Epidemiol       Date:  1991-06-01       Impact factor: 4.897

3.  The cost burden of diabetes mellitus: the evidence from Germany--the CoDiM study.

Authors:  I Köster; L von Ferber; P Ihle; I Schubert; H Hauner
Journal:  Diabetologia       Date:  2006-05-11       Impact factor: 10.122

4.  A chronic disease score from automated pharmacy data.

Authors:  M Von Korff; E H Wagner; K Saunders
Journal:  J Clin Epidemiol       Date:  1992-02       Impact factor: 6.437

5.  National prevalence of gout derived from administrative health data in Aotearoa New Zealand.

Authors:  Doone Winnard; Craig Wright; William J Taylor; Gary Jackson; Leanne Te Karu; Peter J Gow; Bruce Arroll; Simon Thornley; Barry Gribben; Nicola Dalbeth
Journal:  Rheumatology (Oxford)       Date:  2012-01-16       Impact factor: 7.580

6.  Estimating disease prevalence using a population-based administrative healthcare database.

Authors:  Ann-Britt E Wiréhn; H Mikael Karlsson; John M Carstensen
Journal:  Scand J Public Health       Date:  2007       Impact factor: 3.021

7.  Hospital discharge records under-report the prevalence of diabetes in inpatients.

Authors:  Florentino Carral; Gabriel Olveira; Manuel Aguilar; José Ortego; Inmaculada Gavilán; Inmaculada Doménech; Luis Escobar
Journal:  Diabetes Res Clin Pract       Date:  2003-02       Impact factor: 5.602

8.  An algorithm to identify patients with treated type 2 diabetes using medico-administrative data.

Authors:  Laurence M Renard; Valery Bocquet; Gwenaelle Vidal-Trecan; Marie-Lise Lair; Sophie Couffignal; Claudine Blum-Boisgard
Journal:  BMC Med Inform Decis Mak       Date:  2011-04-14       Impact factor: 2.796

9.  Can we use the pharmacy data to estimate the prevalence of chronic conditions? a comparison of multiple data sources.

Authors:  Francesco Chini; Patrizio Pezzotti; Letizia Orzella; Piero Borgia; Gabriella Guasticchi
Journal:  BMC Public Health       Date:  2011-09-05       Impact factor: 3.295

10.  Predicting disease risks from highly imbalanced data using random forest.

Authors:  Mohammed Khalilia; Sounak Chakraborty; Mihail Popescu
Journal:  BMC Med Inform Decis Mak       Date:  2011-07-29       Impact factor: 2.796
