Development of predictive risk models for major adverse cardiovascular events among patients with type 2 diabetes mellitus using health insurance claims data.

James B Young¹, Marjolaine Gauthier-Loiselle², Robert A Bailey³, Ameur M Manceur⁴, Patrick Lefebvre⁴, Morris Greenberg⁵, Marie-Hélène Lafeuille⁴, Mei Sheng Duh⁵, Brahim Bookhart³, Carol H Wysham⁶.

Abstract

BACKGROUND: There exist several predictive risk models for cardiovascular disease (CVD), including some developed specifically for patients with type 2 diabetes mellitus (T2DM). However, the models developed for a diabetic population are based on information derived from medical records or laboratory results, which are not typically available to entities like payers or quality of care organizations. The objective of this study is to develop and validate models predicting the risk of cardiovascular events in patients with T2DM based on medical insurance claims data.
METHODS: Patients with T2DM aged 50 years or older were identified from the Optum™ Integrated Real World Evidence Electronic Health Records and Claims de-identified database (10/01/2006-09/30/2016). Risk factors were assessed over a 12-month baseline period and cardiovascular events were monitored from the end of the baseline period until end of data availability, continuous enrollment, or death. Risk models were developed using logistic regressions separately for patients with and without prior CVD, and for each outcome: (1) major adverse cardiovascular events (MACE; i.e., non-fatal myocardial infarction, non-fatal stroke, CVD-related death); (2) any MACE, hospitalization for unstable angina, or hospitalization for congestive heart failure; (3) CVD-related death. Models were developed and validated on 70% and 30% of the sample, respectively. Model performance was assessed using C-statistics.
RESULTS: A total of 181,619 patients were identified, including 136,544 (75.2%) without prior CVD and 45,075 (24.8%) with a history of CVD. Age, diabetes-related hospitalizations, prior CVD diagnoses and chronic pulmonary disease were the most important predictors across all models. C-statistics ranged from 0.70 to 0.81, indicating that the models performed well. The additional inclusion of risk factors derived from pharmacy claims (e.g., use of antihypertensive, and use of antihyperglycemic) or from medical records and laboratory measures (e.g., hemoglobin A1c, urine albumin to creatinine ratio) only marginally improved the performance of the models.
CONCLUSION: The claims-based models developed could reliably predict the risk of cardiovascular events in T2DM patients, without requiring pharmacy claims or laboratory measures. These models could be relevant for providers and payers and help implement approaches to prevent cardiovascular events in high-risk diabetic patients.

Keywords: Cardiovascular disease; Health insurance claims; Risk prediction; Type 2 diabetes mellitus

Year: 2018 PMID: 30143045 PMCID: PMC6109303 DOI: 10.1186/s12933-018-0759-z

Source DB: PubMed Journal: Cardiovasc Diabetol ISSN: 1475-2840 Impact factor: 9.951

Background

Type 2 diabetes may cause complications of microvascular origin, including nephropathy, neuropathy, and retinopathy, or macrovascular origin, including peripheral artery disease and cardiovascular disease (CVD) [1, 2]. Although diabetes clinical practice guidelines are intended to reflect consensus and evidence-based best medical practices, different entities have some conflicting recommendations, and providing high-quality and detailed guidelines for specific patient subgroups remains challenging [3]. For example, relative to non-diabetic patients, patients with type 2 diabetes have a two- to threefold higher risk of suffering from a CVD event, including a higher risk of myocardial infarction (MI), stroke, unstable angina, and congestive heart failure [4-7], and a higher rate of CVD-related death [8]. Therefore, certain patients with type 2 diabetes could benefit from specialized care that both improves glycemic control and mitigates the risk of CVD. Thus, having reliable tools that make use of readily available data to predict the risk of cardiovascular events among patients with type 2 diabetes may allow healthcare resources to be directed towards patients at high risk, and help healthcare providers meet new quality-of-care standards. In fact, in 2016, the National Committee for Quality Assurance (NCQA) implemented a new Healthcare Effectiveness Data and Information Set (HEDIS) performance measure based on the rates of hospitalization for potentially preventable complications [9]. More specifically, this measure, which is used by over 90% of health plans in the US [9], targets, among other complications, diabetes short- and long-term complications, including CVD events leading to hospitalization [10]. This means that higher rates of adverse cardiovascular events among patients with type 2 diabetes may negatively affect the NCQA ratings of healthcare providers. 
Moreover, given the high costs incurred by patients with both CVD and diabetes [11], using such a tool efficiently may translate into significant cost savings. Several of the predictive CVD risk models that have been developed for the general population include diabetes as a risk factor, with models derived from the Framingham Heart Study being among the most well-known [12-14]. Scores based on the Framingham risk models assign weights to risk factors in order to predict cardiovascular events separately for men and women. Risk factors identified for CVD include older age, smoking status, treated and untreated systolic blood pressure, total cholesterol and high-density-lipoprotein cholesterol levels, and diabetes [12-14]. However, the Framingham risk models were not developed for patients with diabetes, and were shown to systematically underestimate CVD risk in this population [15]. In fact, the characteristics of patients enrolled in the Framingham study may differ from real-world populations with diabetes in several ways, including the proportion of minorities, socioeconomic determinants of health, and comorbidity burden [16]. Thus, other risk models have been developed for this population, but all of them rely on data from medical records [17-23]. For example, risk models derived from the United Kingdom Prospective Diabetes Study (UKPDS) identified several risk factors that cannot be used as quantitative predictors using health insurance claims, such as duration of type 2 diabetes, glycated hemoglobin (HbA1c) levels, systolic blood pressure, and cholesterol/high-density lipoprotein ratio [21, 23]. Similarly, the ADVANCE study identified age at diabetes diagnosis, known duration of diabetes, pulse pressure, treated hypertension, HbA1c, urinary albumin/creatinine ratio, and non-HDL cholesterol among risk factors for CVD events; these risk factors cannot be assessed using health insurance claims [22]. 
Consequently, these models cannot be used to predict CVD risk by entities, like payers, that do not have access to information derived from medical records or laboratory results. As the face of healthcare provision changes and population management evolves, entities such as public and private payers are moving toward a capitated system of reimbursement, with payments made based on value rather than volume of care. It is thus important for both payers and providers to be able to assess the risks in a given population. Therefore, a CVD risk assessment tool based solely on accessible medical data such as health insurance claims would be relevant for payers to help identify patients with type 2 diabetes at high risk of CVD events. In fact, rationally allocating resources towards these patients by, for example, including CVD risk models in a tool made available to healthcare providers may result in reduced morbidity and mortality as well as cost savings. Thus, this study aimed to develop new predictive models and assess their performance in predicting the risk of cardiovascular events in patients with type 2 diabetes based solely on information available in medical health insurance claims. More specifically, models were developed for patients without prior CVD events (hereinafter referred to as the primary prevention population) and for patients with prior CVD events (hereinafter referred to as the secondary prevention population).

Methods

Study design

A retrospective observational study design was used to model the risk of CVD events in patients with type 2 diabetes (Additional file 1). The index date was defined as a randomly selected date among those with a diagnosis of type 2 diabetes (International Classification of Diseases, 9th Revision, Clinical Modification [ICD-9-CM]: 250.x0 and 250.x2, International Classification of Diseases, 10th Revision, Clinical Modification [ICD-10-CM]: E11.xxx) followed by ≥ 13 months of continuous healthcare plan enrollment. The random selection enabled us to capture a representative sample of patients from a real-world setting with various disease duration. Risk factors for cardiovascular events were assessed during the baseline period, defined as the first 12 months following the index date. Cardiovascular events were monitored during the subsequent at-risk period, which was required to last ≥ 1 month and spanned from the end of the baseline period until the earliest among (i) end of data availability, (ii) end of continuous healthcare plan enrollment, or (iii) death. For each study outcome, the at-risk period was censored at the first occurrence of a given study outcome (see study outcomes section for more details).
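The period definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the names `add_months`, `StudyPeriods`, and `derive_periods` are hypothetical, and months are treated as calendar months clamped to the month end, a convention the paper does not specify.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day to the month end."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


@dataclass
class StudyPeriods:
    baseline_start: date  # index date
    at_risk_start: date   # end of the 12-month baseline period
    at_risk_end: date     # earliest censoring date


def derive_periods(index_date: date, enrollment_end: date, data_end: date,
                   death_date: Optional[date] = None) -> Optional[StudyPeriods]:
    """Return the baseline and at-risk periods, or None if the patient has
    less than 1 month at risk (i.e., less than 13 months after the index date)."""
    at_risk_start = add_months(index_date, 12)
    candidates = [enrollment_end, data_end]
    if death_date is not None:
        candidates.append(death_date)
    at_risk_end = min(candidates)  # earliest of (i), (ii), (iii)
    if at_risk_end < add_months(at_risk_start, 1):
        return None
    return StudyPeriods(index_date, at_risk_start, at_risk_end)
```

Outcome-specific censoring at the first event would then shorten `at_risk_end` separately for each study outcome.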

Data source

The Optum™ Integrated Real-World Evidence Electronic Health Records and Claims database (Optum database), which combines de-identified electronic medical records and insurance claims, was used to develop and validate the risk models (October 1, 2006–September 30, 2016). This database comprises information on demographics, medical history, and diagnoses for all types of medical encounters (i.e., intensive care unit, emergency department [ED], ward, etc.), in-hospital procedures and medication administrations, prescriptions, laboratory results, and date of death. The database is de-identified and fully compliant with the patient confidentiality requirements of the Health Insurance Portability and Accountability Act (HIPAA).

Study population

Patients ≥ 50 years with ≥ 1 recorded diagnosis for type 2 diabetes (i.e., ICD-9-CM: 250.x0, and 250.x2; ICD-10-CM: E11.xxx) were included in the study. Patients were required to have ≥ 13 months of continuous eligibility in their healthcare plan after the index date. Patients were excluded if they had ≥ 1 recorded diagnosis for type 1 or gestational diabetes mellitus (i.e., ICD-9-CM: 250.x1, 250.x3, and 648.8x; ICD-10-CM: E10.xxx, O24.4xx, and O99.81x). Moreover, given the growing evidence suggesting that these medications may mitigate cardiovascular risk, to avoid potential confounding, patients were further excluded if they had ≥ 1 prescription fill for a sodium glucose co-transporter 2 (SGLT2) inhibitor or a glucagon-like peptide-1 (GLP-1) receptor agonist at any time during the study period [24-27]. The study population was further stratified into the primary and secondary prevention populations based on whether patients had ≥ 1 diagnosis for any cardiovascular events of interest (see below) in any setting (i.e., inpatient [IP], ED, or outpatient) prior to the at-risk period.
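As a rough illustration of the diagnosis-based inclusion and exclusion rules, the code patterns quoted above can be expressed as regular expressions. The sketch assumes that "x" stands for any single character position, that codes are stored with their decimal point, and that any code beginning with E11 (or E10, O24.4, O99.81) qualifies; the function name `is_eligible` and its arguments are hypothetical, and the continuous-enrollment requirement is not checked here.

```python
import re
from typing import Iterable

# Type 2 diabetes: ICD-9-CM 250.x0 / 250.x2, ICD-10-CM E11.xxx
T2DM = re.compile(r"^(250\.\d[02]|E11(\.\w+)?)$")

# Type 1 or gestational diabetes: ICD-9-CM 250.x1, 250.x3, 648.8x;
# ICD-10-CM E10.xxx, O24.4xx, O99.81x
EXCLUDED_DM = re.compile(
    r"^(250\.\d[13]|648\.8\d|E10(\.\w+)?|O24\.4\w*|O99\.81\w*)$"
)


def is_eligible(age: int, dx_codes: Iterable[str],
                has_sglt2_or_glp1_fill: bool) -> bool:
    """Apply the age, diagnosis, and medication criteria to one patient."""
    codes = [c.strip().upper() for c in dx_codes]
    has_t2dm = any(T2DM.match(c) for c in codes)
    has_excluded_dx = any(EXCLUDED_DM.match(c) for c in codes)
    return (age >= 50 and has_t2dm and not has_excluded_dx
            and not has_sglt2_or_glp1_fill)
```

For example, a 62-year-old with codes 250.00 and 401.9 and no SGLT2 inhibitor or GLP-1 receptor agonist fill would pass, whereas adding a single E10.9 record would exclude the patient.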

Study outcomes

Study outcomes included (1) any major adverse cardiovascular event (MACE), which comprised non-fatal MI, non-fatal stroke, and CVD-related death (defined below), (2) any MACE, hospitalization for unstable angina, or hospitalization for congestive heart failure (hereinafter referred to as MACE-plus), and (3) CVD-related death, defined as a death occurring within 30 days after a diagnosis for MI, stroke, unstable angina, heart failure, sudden cardiac arrest, cardiogenic shock, other cerebrovascular events, or other cardiovascular events recorded in a medical claim in any setting (see Additional file 2 for ICD codes). Of note, because it was not possible to determine whether diagnoses for MI or stroke recorded in outpatient settings were actual cardiovascular events or follow-up visits for which the diagnosis was recorded for billing purposes, only diagnoses recorded in ED or IP settings were considered in the risk models; diagnoses could be recorded in any position.

Statistical analyses

Distinct predictive risk models were developed for the primary and secondary prevention populations for each of the three study outcomes. A split sample approach was used: The primary and secondary prevention populations were each randomly split into a training (70% of the sample) and a validation (30% of the sample) set. The training sets were used to develop the predictive models, and the validation sets were used to assess the predictive accuracy of the models. For the prediction of study outcomes, potential risk factors were derived from the published literature and included age, gender, race, ethnicity, year, region, insurance type, prior cardiovascular events, time since first observed type 2 diabetes diagnosis, number of diabetes-related medical visits, Charlson comorbidity index (CCI) [28], adapted diabetes complications severity index (aDCSI) [29], and recorded diagnosis for selected comorbidities such as hypertension, hyperlipidemia, infections, mental disorders, chronic pulmonary disease, and obesity. Univariate associations between potential risk factors and outcomes were assessed; in order to develop more parsimonious models, risk factors were excluded if the standardized difference between patients with and without a given outcome was below 0.10, or if they were present in less than 0.5% of the sample. Pooled logistic regression models were developed to relate each candidate risk factor to outcomes at pre-specified time points during the at-risk period. A logistic regression model was selected because it can estimate the probability of an event occurring in an interval of time [30]. More specifically, for each patient, the at-risk period was stratified into windows of 6 months during which the outcomes were assessed. 
For example, the follow-up of a patient who had MACE 15 months after the beginning of the at-risk period was censored at the occurrence of this outcome and stratified in three windows in the regression model: (1) 0–6 months without MACE, (2) 6–12 months without MACE, and (3) 12–18 months with a MACE. For all windows, risk factors were evaluated at baseline, and indicator variables for each time interval were included in the regression models. The risk factors included in the final risk models were chosen using a stepwise variable selection approach based on Akaike’s Information Criterion, in conjunction with tenfold cross-validation methods within the training set. Further specifications of risk factors were tested and variance inflation factor analysis was used to assess the presence of multicollinearity between risk factors, which resulted in the final models. The performance of the final risk models was evaluated based on discrimination (i.e., C-statistics) in the training and validation sets [31]. The C-statistic is a measure of the predictive accuracy of a logistic regression, which varies between 0.5 (random discrimination) and 1.0 (perfect discrimination). It corresponds to the area under the receiver operating characteristic (ROC) curve [32]. In order to provide a more comprehensive view of the performance of models based on information derived from medical claims, other models that included risk factors derived from medical claims, pharmacy claims, and medical records and laboratory results were developed.
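The person-period layout behind the pooled logistic regression can be sketched as follows. The function name `expand_windows` is hypothetical, and two conventions are assumptions rather than details given in the text: an event falling exactly on a window boundary is assigned to the window that ends there, and a censored patient contributes a final, possibly partial, window.

```python
import math
from typing import Dict, List, Optional


def expand_windows(followup_months: float,
                   event_month: Optional[float] = None,
                   window: int = 6) -> List[Dict[str, float]]:
    """Split one patient's at-risk period into consecutive windows.

    Each row is one observation in the pooled logistic regression; baseline
    risk factors would be repeated on every row, and `window` would enter
    the model as an indicator variable for the time interval.
    """
    # Follow-up is censored at the outcome when one occurs.
    end = event_month if event_month is not None else followup_months
    n_windows = math.ceil(end / window)
    rows = []
    for k in range(n_windows):
        start, stop = k * window, (k + 1) * window
        event = int(event_month is not None and start < event_month <= stop)
        rows.append({"window": k + 1, "start": start, "end": stop, "event": event})
    return rows


# The patient described above: MACE 15 months into the at-risk period.
for row in expand_windows(followup_months=30, event_month=15):
    print(row)
```

Run on the example patient, this yields three rows (0–6, 6–12, and 12–18 months), with the event indicator set only on the third.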

Results

A total of 181,619 patients with type 2 diabetes were included in the study; 136,544 (75.2%) in the primary prevention population and 45,075 (24.8%) in the secondary prevention population (Fig. 1). Among patients in the training set and in the primary prevention population, the proportions of patients with MACE, MACE-plus, and CVD-related death during the at-risk period were 4.7%, 6.5%, and 1.8%, respectively (Additional file 3). In the secondary prevention population, the same proportions were 16.5%, 24.9%, and 8.2%, respectively (Additional file 3). The median duration of the at-risk period following the index date in the training set of the primary prevention population was 12 months (range 1–109 months), with 5.4% of patients having a follow-up longer than 60 months. The median duration of the at-risk period in the training set of the secondary prevention population was 11 months (range 1–108 months), with 3.9% of patients having a follow-up longer than 60 months.

Fig. 1

Sample selection

Patients with a CVD event during the at-risk period were older and had higher aDCSI scores compared to patients without CVD events for both the primary and the secondary prevention populations (primary prevention population: mean age = 72.7 vs. 66.4 years, mean aDCSI = 1.9 vs. 1.1, respectively; secondary prevention population: mean age = 75.0 vs. 71.4 years, mean aDCSI = 4.1 vs. 3.2, respectively; Additional file 3). Most patients (> 75%) had a recorded diagnosis for hypertension and/or hyperlipidemia in both the primary and secondary prevention populations. Moreover, compared to patients without CVD events, patients with a CVD event during the at-risk period were more likely to have a recorded diagnosis for select baseline comorbidities, such as infections (primary prevention population: 53.8% vs. 48.8%; secondary prevention population: 69.1% vs. 61.5%, respectively), chronic pulmonary disease (primary prevention population: 22.7% vs. 15.6%; secondary prevention population: 44.5% vs. 31.4%, respectively), and peripheral vascular disorders (primary prevention population: 19.0% vs. 9.3%; secondary prevention population: 34.3% vs. 26.1%, respectively) (Additional file 3).

Risk models

For the primary prevention population, a total of 12–17 risk factors were included in the models, and most of them were significantly associated with the study outcomes (Table 1). Across all study outcomes, age was the risk factor with the largest impact on the risk of having an event (Table 1). Other risk factors consistently associated with a significantly higher risk of cardiovascular events were recorded diagnosis for other CVD-related conditions (i.e., conditions used to define CVD-related death), diabetes-related hospitalization, higher aDCSI score, recorded diagnosis for chronic pulmonary disease, cancer, fluid and electrolyte disorder, or coagulopathy, and having the baseline period prior to 2011 (Table 1). In addition, hypertension was associated with a higher risk of MACE-plus, while deficiency anemia and pulmonary circulation disorders were associated with a higher risk of CVD-related death (Table 1). Being commercially insured was associated with a lower risk of CVD events for all outcomes, being a female was associated with a lower risk of MACE and CVD-related death, and being Hispanic or Asian was associated with a lower risk of CVD-related death (Table 1).

Table 1

Risk models for MACE in the primary prevention population

	MACE	MACE-plus	CVD-related death
Validation
C-statistic, training set	0.72	0.71	0.81
C-statistic, validation set	0.72	0.72	0.81

CVD cardiovascular disease, MACE major adverse cardiovascular events, OR odds ratio, CI confidence interval

* Indicates statistical significance at the 5% level

ᵃ A glycated hemoglobin threshold (i.e., < 8%) was added to Healthcare Effectiveness Data and Information Set (HEDIS) measure for Comprehensive Diabetes Care in 2009 and to the Diabetes Recognition Program of the National Committee for Quality Assurance in 2010, which may have impacted CVD risk in patients with diabetes (See [43])

For the secondary prevention population, 15–20 risk factors were included in the models, and most of them were significantly associated with the study outcomes (Table 2). As for the primary prevention population, older age was the risk factor with the largest impact on the risk of CVD (Table 2). Diabetes-related hospitalization, higher aDCSI score, recorded diagnosis for chronic pulmonary disease or fluid and electrolyte disorders, and having the baseline period prior to 2011 were consistently associated with a significantly higher risk of CVD events (Table 2). In addition, payer type, time since last recorded CVD diagnosis, prior recorded diagnosis for congestive heart failure or iron-deficiency anemia, and ethnicity were identified as predictors of CVD events for all outcomes (Table 2). Prior MI, stroke, and other CVD-related conditions were associated with a higher risk of MACE and MACE-plus, but not of CVD-related death (Table 2). Other risk factors identified for only certain outcomes included race, region, insurance type, recorded diagnosis for mental disorders, obesity, cancer, peripheral vascular disorders, erectile dysfunction, coagulopathy, and pulmonary circulation disorders (Table 2). 
Interestingly, while being a female was associated with a lower risk of MACE and CVD-related death in the primary prevention population, gender was not associated with an improved predictive accuracy in the secondary prevention population, and thus, was not included as a risk factor in these models (Table 2). Conversely, obesity was not selected as a risk factor in the primary prevention population, whereas it was associated with a lower risk of MACE and CVD-related death in the secondary prevention population.

Table 2

Risk models for MACE in the secondary prevention population

	MACE	MACE-plus	CVD-related death
Validation
C-statistic, training set	0.70	0.72	0.78
C-statistic, validation set	0.70	0.72	0.77

CVD cardiovascular disease, MACE major adverse cardiovascular events, OR odds ratio, CI confidence interval

* Indicates statistical significance at the 5% level

ᵃ A glycated hemoglobin threshold (i.e., < 8%) was added to Healthcare Effectiveness Data and Information Set (HEDIS) measure for Comprehensive Diabetes Care in 2009 and to the Diabetes Recognition Program of the National Committee for Quality Assurance in 2010, which may have impacted CVD risk in patients with diabetes (See [43])

The risk models performed well in predicting MACE, MACE-plus, and CVD-related death with C-statistics ranging between 0.70 and 0.81 when considering both the training and validation sets (Tables 1 and 2, Fig. 2). Notably, the highest predictive accuracy was observed for models predicting CVD-related death (Tables 1 and 2; Fig. 2). In addition, the models were well calibrated, with differences between the median predicted risk and median observed risk that did not exceed 0.1% for each of the study outcomes in both the primary and secondary prevention populations (data not shown).
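To make the reported C-statistics concrete, the following sketch computes one directly from its definition as a concordance probability: the share of (event, non-event) pairs in which the observation with the event received the higher predicted risk, with ties counted as one half. The function name `c_statistic` is hypothetical, and the paper does not state how observations were paired within the pooled person-period data, so this is an illustration of the measure rather than of the authors' procedure.

```python
from typing import Sequence


def c_statistic(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve, computed as pairwise concordance."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0   # event ranked above non-event
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(pos) * len(neg))


# Toy data: 3 of the 4 (event, non-event) pairs are ranked correctly.
print(c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The double loop is quadratic in sample size, so for a cohort of this scale a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.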

Fig. 2

Receiver operating characteristic curves for the risk models. a Primary prevention population, testing set. b Secondary prevention population, testing set

In addition, to further assess the potential impact of using information exclusively derived from medical claims data on performance, predictive models that also included risk factors obtained from pharmacy claims, as well as from medical records and laboratory results were developed. These models included up to 11 additional risk factors, but only showed limited improvements in terms of predictive accuracy, with C-statistics increasing by no more than 0.01 in the training and validation sets for both the primary and secondary prevention populations (data not shown).

Examples

Notably, the risk models can be used to assess CVD risk at different time windows separated by intervals of 6 months over a maximum of 5 years. For instance, the average patient in the primary prevention population (a 67-year-old female with an aDCSI score of 1 and recorded diagnoses for hypertension and hyperlipidemia) had a predicted risk of MACE of 1.4% after 1 year, 2.7% after 2 years, and 6.8% after 5 years. The predicted 5-year risks of MACE-plus and CVD-related death were 10.6% and 1.7%, respectively (Table 3: Case 1). For the secondary prevention population, the average patient was a 73-year-old male with congestive heart failure diagnosed ≥ 12 months earlier, other CVD-related conditions, an aDCSI score of 3, and recorded diagnoses for hypertension, hyperlipidemia, and infection within the last year. The predicted risk of MACE for that patient was 5.8% after 1 year, 10.5% after 2 years, and 21.8% after 5 years. The predicted 5-year risks of MACE-plus and CVD-related death were 35.2% and 9.9%, respectively (Table 3: Case 2).
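One standard way to obtain risks at successive 6-month horizons from a pooled logistic model is to convert each window's predicted probability into a conditional hazard and accumulate them as 1 minus the product of (1 minus hazard). The paper does not spell out its calculation, so the sketch below is an assumption about the mechanics; the names `interval_hazard` and `cumulative_risk` are hypothetical, and the inputs in the usage line are made-up numbers, not coefficients from Tables 1 or 2.

```python
import math
from typing import List, Sequence


def interval_hazard(linear_predictor: float, interval_effect: float) -> float:
    """Probability of an event in one window, given no event before it."""
    return 1.0 / (1.0 + math.exp(-(linear_predictor + interval_effect)))


def cumulative_risk(linear_predictor: float,
                    interval_effects: Sequence[float]) -> List[float]:
    """Cumulative risk at the end of each successive window.

    linear_predictor: intercept plus the weighted baseline risk factors.
    interval_effects: one coefficient per 6-month window indicator.
    """
    survival = 1.0
    risks = []
    for effect in interval_effects:
        survival *= 1.0 - interval_hazard(linear_predictor, effect)
        risks.append(1.0 - survival)
    return risks


# Hypothetical inputs only: ten windows (5 years) with a flat interval effect.
print([round(r, 4) for r in cumulative_risk(-5.0, [0.0] * 10)])
```

Under this construction the 1-year risk is the second element of the returned list and the 5-year risk is the tenth.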

Table 3

Predicted risk for the average patient in primary and secondary prevention population

Patient characteristics	Case 1	Case 2
Age	67	73
Gender	Female	Male
Type of insurance	Medicare, other	Medicare, other
Race/ethnicity	White, non-Hispanic	White, non-Hispanic
Region	Mid-West	South
Prior CVD diagnoses	None	CHF (≥ 12 months ago), Other CVD-related condition
aDCSI	1	3
Diabetes-related hospitalizations	No	No
Other recorded diagnoses	Hypertension, hyperlipidemia	Hypertension, hyperlipidemia, infection

aDCSI adapted diabetes complications severity index, CHF congestive heart failure, CVD cardiovascular disease, MACE major adverse cardiovascular event, CI confidence interval

ᵃ Predicted risk for a hypothetical patient based on listed characteristics


Discussion

This study developed and validated models that predict the risk of adverse cardiovascular events in patients with type 2 diabetes using information derived exclusively from health insurance claims. The main risk factors identified in the primary prevention population included age, diabetes-related hospitalizations, and recorded diagnosis for coagulopathy and chronic pulmonary disease. In the secondary prevention population, age, prior CVD diagnoses, diabetes-related hospitalizations, and recorded diagnosis for chronic pulmonary disease had the most important impact on the risk of having a CVD event. Overall, the models reliably predicted the cardiovascular events for the primary and secondary prevention populations, as illustrated by the C-statistics ranging between 0.70 and 0.81. The finding that age was one of the most important risk factors in predicting cardiovascular events is consistent with findings in previous studies that primarily focused on a diabetes population, such as the UKPDS risk engine [21, 23] and studies that focused on a general population, such as the Framingham Heart Study [12]. However, a major difference between the models developed in the current study and previous ones is that the latter included risk factors derived from laboratory results and medical records [12, 17, 18, 20–22], which are often not available to national quality of care organizations and payers. In contrast, the current study used only information that is readily available from medical claims data. Nonetheless, claims-based information can be used as a proxy for risk factors derived from laboratory results and medical records. For example, blood pressure measurements were not available in claims data, but hypertension, identified based on a recorded diagnosis in a medical claim, was included in the models. 
Similarly, recorded diagnosis for hyperlipidemia was used as a proxy for high-density lipoprotein cholesterol and low-density lipoprotein cholesterol levels, although it was not included in any models. Yet, certain risk factors identified in the Framingham and UKPDS models tend to be underreported in medical claims, and thus, may have limited predictive accuracy in claims-based models. For example, although diagnosis codes for smoking do exist, this condition is typically underreported in medical claims. Therefore, smoking was not included in any of the claims-based models. However, because our study was limited to risk factors available in insurance claims data, certain risk factors identified in other studies were not available for selection in the models. In particular, several studies pointed to a link between glycemic markers and CVD [33-37], but given that HbA1c measures are not available in insurance claims data, this potential risk factor could not be included in the models. This study also found that obesity was associated with a lower risk of MACE and CVD-related death in the secondary prevention population. Several previous studies found obesity to be associated with better survival in patients with chronic or cardiac diseases, hence the term “obesity paradox” to describe this counterintuitive phenomenon [38]. Several explanations have been proposed, including the advantages of fat reserves during illness, biases or confounding in observational studies (e.g., more intensive management), or weight loss due to illness in the reference group [39]. However, due to the observational nature of the current study, no causal relationship can be inferred. 
Regardless of the aforementioned differences in the risk factors identified in the current study versus previously published models, the models developed here performed well in predicting the risk of cardiovascular events in a population with two well-defined risk factors, namely patients with type 2 diabetes and above 50 years of age. Overall, the predictive accuracies of the models presented in the current study are comparable to those of previously published models. For example, the Framingham risk score, which included diabetes as a predictor, yielded C-statistics of 0.76 and 0.79 for men and women in the general population, respectively [12]. However, when evaluated in an older diabetic cohort and in patients without prior CVD, the Framingham risk score had a C-statistic of 0.65 [19]. The performances of the claims-based models presented here were also comparable to those of previously developed risk models specific to the diabetic population, such as the UKPDS risk engine [21, 40]. Although C-statistics were not reported in the UKPDS original publications, subsequent validations in other diabetic cohorts yielded C-statistics ranging from 0.61 to 0.73 [19, 41]. The ADVANCE model, developed in a population of diabetic patients at risk of cardiovascular events similar to the secondary prevention population in this study, also presented comparable C-statistics of 0.69-0.70 [22]. Moreover, several other multivariate risk models were published and reported C-statistics ranging between 0.64 and 0.70 [17, 18, 20]. A comprehensive external validation study would be needed to evaluate the performance of the different models on the same cohort of patients [42]. The Framingham and UKPDS models were not developed and tested for patients with a prior history of CVD (i.e., the secondary prevention population), meaning that their predictive accuracy may be lower in this subpopulation [12, 21]. 
Therefore, another advantage of the models developed in the current study over several previous ones is their ability to predict CVD risk in patients with prior history of CVD, who represented almost a quarter of the sample population. More generally, the reliability of this claims-based approach is perhaps best illustrated by the limited incremental predictive accuracy conferred by the additional inclusion of variables derived from medical records or laboratory results. In light of the HEDIS performance measure that targets hospitalization for potentially preventable complications, rationally allocating healthcare resources to patients with type 2 diabetes at higher risk of cardiovascular complications may help healthcare providers meet quality of care standards, and lead to reductions in morbidity and mortality as well as cost savings. With growing evidence suggesting that certain types of diabetes treatments (such as SGLT2 inhibitors or GLP-1 receptor agonists) may mitigate cardiovascular risk in addition to improving glycemic control, the potential dual purpose of these diabetes medications could be considered, despite their higher cost, to optimize treatment decisions in patients with type 2 diabetes at high risk of CVD [24-27]. Patients receiving these game-changing treatments were excluded from the present study due to the potential for indication bias: the use of SGLT2 inhibitors or GLP-1 receptor agonists could effectively reduce the risk of CVD, but may appear as risk factors associated with a higher risk of CVD if these agents are preferentially prescribed to higher-risk patients. Such counterintuitive phenomena are common in observational studies. Another potential clinical application of the models developed here would be to identify patients at high risk of CVD events within a certain time window in order to provide preventive care.
The threshold used for this high-risk group could be rationally determined using the risk that maximizes the sum of the model sensitivity and specificity. For example, using this method, the high-risk threshold in the primary prevention population would be 2.5%, 3.5%, and 1.0% for MACE, MACE-plus, and CVD-related death, respectively (sensitivity ranging from 67 to 73%, and specificity ranging from 67 to 76%). In the sample population used in the current study, applying these thresholds would result in approximately one out of three patients being classified as at high risk of having MACE or MACE-plus within a 1-year window, and one out of four patients as at high risk of CVD-related death. In the secondary prevention population, the corresponding thresholds would be 12.5%, 18.0%, and 5.0% for MACE, MACE-plus, and CVD-related death, respectively (sensitivity ranging from 62 to 78%, and specificity ranging from 66 to 72%), resulting in approximately one out of three patients being classified as at high risk of having any cardiovascular event within a 1-year window.
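The threshold rule described above (choosing the risk cut-off that maximizes sensitivity plus specificity, often called the Youden index) can be sketched as follows. This is an illustrative outline under stated assumptions rather than the study's implementation: the function name `youden_threshold`, the convention that a patient is high-risk when the predicted risk is at or above the cut-off, and the toy data are all hypothetical.

```python
def youden_threshold(risks, events):
    """Return (threshold, sensitivity, specificity) for the cut-off that
    maximizes sensitivity + specificity.

    A patient is classified as high-risk when predicted risk >= threshold.
    If several cut-offs tie, the lowest one is returned.
    """
    n_events = sum(events)
    n_non_events = len(events) - n_events
    if n_events == 0 or n_non_events == 0:
        raise ValueError("need at least one event and one non-event")

    best = None
    for cutoff in sorted(set(risks)):  # each observed risk is a candidate
        true_pos = sum(1 for r, e in zip(risks, events) if e == 1 and r >= cutoff)
        true_neg = sum(1 for r, e in zip(risks, events) if e == 0 and r < cutoff)
        sensitivity = true_pos / n_events
        specificity = true_neg / n_non_events
        if best is None or sensitivity + specificity > best[1] + best[2]:
            best = (cutoff, sensitivity, specificity)
    return best


# Hypothetical predicted 1-year risks and observed outcomes (illustration only)
toy_risks = [0.005, 0.010, 0.015, 0.020, 0.025, 0.030, 0.040, 0.060]
toy_events = [0, 0, 0, 1, 0, 1, 0, 1]
threshold, sens, spec = youden_threshold(toy_risks, toy_events)
print(f"threshold={threshold:.3f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Giving equal weight to sensitivity and specificity is one design choice among several; a payer who considers a missed high-risk patient costlier than an unnecessary intervention could weight sensitivity more heavily, which would lower the selected threshold.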

Limitations

The current study is subject to a few limitations. First, the identification of study outcomes was based on definitional algorithms using health insurance claims data that have not been fully validated, which could lead to the misclassification of outcomes. Second, patients may have experienced cardiovascular events prior to the start of data availability, and may therefore have been misclassified into the primary prevention population. Third, a recorded diagnosis code on a medical claim is not an attestation that the patient has the condition, because the code may represent a rule-out diagnosis or may be recorded incorrectly. Fourth, risk predictions beyond 60 months post-index should be interpreted with caution as a limited number of patients had an at-risk period of such duration. Moreover, risk predictions over longer periods may be confounded by changes in therapeutic strategies. Despite these limitations, healthcare claims are a valuable resource to develop such models. Indeed, the large sample sizes typically available in claims databases help prevent over-fitting the models to a specific data set, thereby increasing their external validity, as illustrated by the negligible decrease in predictive accuracy observed within the validation set compared to the training set. Future studies are needed to externally validate the models in a distinct population or database. Finally, it should be noted that the risk models developed here aimed to identify patients at risk of CVD events; no causal inference can be drawn from these models, which are based on observational data.

Conclusions

In summary, this study developed risk models that could reliably identify patients with type 2 diabetes at risk of MACE, MACE-plus, and CVD-related death based on information available in health insurance claims. Ultimately, stakeholders, such as quality of care organizations and payers, may use these models to identify diabetic patients at high risk of cardiovascular events and potentially improve their clinical management, thereby preventing a significant part of the disease burden and associated costs.

Additional file 1. Study design.
Additional file 2. Definition of outcomes and risk factors.
Additional file 3. Risk factors among patients with and without any major adverse cardiovascular events during the at-risk period (training set).

38 in total

1. Association of Cardiometabolic Multimorbidity With Mortality.

Authors: Emanuele Di Angelantonio; Stephen Kaptoge; David Wormser; Peter Willeit; Adam S Butterworth; Narinder Bansal; Linda M O'Keeffe; Pei Gao; Angela M Wood; Stephen Burgess; Daniel F Freitag; Lisa Pennells; Sanne A Peters; Carole L Hart; Lise Lund Håheim; Richard F Gillum; Børge G Nordestgaard; Bruce M Psaty; Bu B Yeap; Matthew W Knuiman; Paul J Nietert; Jussi Kauhanen; Jukka T Salonen; Lewis H Kuller; Leon A Simons; Yvonne T van der Schouw; Elizabeth Barrett-Connor; Randi Selmer; Carlos J Crespo; Beatriz Rodriguez; W M Monique Verschuren; Veikko Salomaa; Kurt Svärdsudd; Pim van der Harst; Cecilia Björkelund; Lars Wilhelmsen; Robert B Wallace; Hermann Brenner; Philippe Amouyel; Elizabeth L M Barr; Hiroyasu Iso; Altan Onat; Maurizio Trevisan; Ralph B D'Agostino; Cyrus Cooper; Maryam Kavousi; Lennart Welin; Ronan Roussel; Frank B Hu; Shinichi Sato; Karina W Davidson; Barbara V Howard; Maarten J G Leening; Maarten Leening; Annika Rosengren; Marcus Dörr; Dorly J H Deeg; Stefan Kiechl; Coen D A Stehouwer; Aulikki Nissinen; Simona Giampaoli; Chiara Donfrancesco; Daan Kromhout; Jackie F Price; Annette Peters; Tom W Meade; Edoardo Casiglia; Debbie A Lawlor; John Gallacher; Dorothea Nagel; Oscar H Franco; Gerd Assmann; Gilles R Dagenais; J Wouter Jukema; Johan Sundström; Mark Woodward; Eric J Brunner; Kay-Tee Khaw; Nicholas J Wareham; Eric A Whitsel; Inger Njølstad; Bo Hedblad; Sylvia Wassertheil-Smoller; Gunnar Engström; Wayne D Rosamond; Elizabeth Selvin; Naveed Sattar; Simon G Thompson; John Danesh
Journal: JAMA Date: 2015-07-07 Impact factor: 56.272

2. Updating and validating the Charlson comorbidity index and score for risk adjustment in hospital discharge abstracts using data from 6 countries.

Authors: Hude Quan; Bing Li; Chantal M Couris; Kiyohide Fushimi; Patrick Graham; Phil Hider; Jean-Marie Januel; Vijaya Sundararajan
Journal: Am J Epidemiol Date: 2011-02-17 Impact factor: 4.897

3. Comparison of baseline and repeated measure covariate techniques in the Framingham Heart Study.

Authors: L A Cupples; R B D'Agostino; K Anderson; W B Kannel
Journal: Stat Med Date: 1988 Jan-Feb Impact factor: 2.373

4. Diabetes mellitus, fasting blood glucose concentration, and risk of vascular disease: a collaborative meta-analysis of 102 prospective studies.

Authors: N Sarwar; P Gao; S R Kondapally Seshasai; R Gobin; S Kaptoge; E Di Angelantonio; E Ingelsson; D A Lawlor; E Selvin; M Stampfer; C D A Stehouwer; S Lewington; L Pennells; A Thompson; N Sattar; I R White; K K Ray; J Danesh
Journal: Lancet Date: 2010-06-26 Impact factor: 202.731

5. New Zealand Diabetes Cohort Study cardiovascular risk score for people with Type 2 diabetes: validation in the PREDICT cohort.

Authors: Tom Robinson; C Raina Elley; Sue Wells; Elizabeth Robinson; Tim Kenealy; Romana Pylypchuk; Dale Bramley; Bruce Arroll; Sue Crengle; Tania Riddell; Shanthi Ameratunga; Patricia Metcalf; Paul L Drury
Journal: J Prim Health Care Date: 2012-09-01

6. The obesity paradox in acute coronary syndrome: a meta-analysis.

Authors: Jacek Niedziela; Bartosz Hudzik; Natalia Niedziela; Mariusz Gąsior; Marek Gierlotka; Jarosław Wasilewski; Krzysztof Myrda; Andrzej Lekston; Lech Poloński; Piotr Rozentryt
Journal: Eur J Epidemiol Date: 2014-10-30 Impact factor: 8.082

7. The influence of baseline risk on the relation between HbA1c and risk for new cardiovascular events and mortality in patients with type 2 diabetes and symptomatic cardiovascular disease.

Authors: Sophie H Bots; Yolanda van der Graaf; Hendrik M W Nathoe; Gert Jan de Borst; Jaap L Kappelle; Frank L J Visseren; Jan Westerink
Journal: Cardiovasc Diabetol Date: 2016-07-19 Impact factor: 9.951

8. Cardiovascular disease in type 2 diabetes from population to man to mechanisms: the Kelly West Award Lecture 2008.

Authors: Markku Laakso
Journal: Diabetes Care Date: 2010-02 Impact factor: 19.112

9. Risk prediction of cardiovascular disease in type 2 diabetes: a risk equation from the Swedish National Diabetes Register.

Authors: Jan Cederholm; Katarina Eeg-Olofsson; Björn Eliasson; Björn Zethelius; Peter M Nilsson; Soffia Gudbjörnsdottir
Journal: Diabetes Care Date: 2008-06-30 Impact factor: 17.152

10. Liraglutide and Cardiovascular Outcomes in Type 2 Diabetes.

Authors: Steven P Marso; Gilbert H Daniels; Kirstine Brown-Frandsen; Peter Kristensen; Johannes F E Mann; Michael A Nauck; Steven E Nissen; Stuart Pocock; Neil R Poulter; Lasse S Ravn; William M Steinberg; Mette Stockner; Bernard Zinman; Richard M Bergenstal; John B Buse
Journal: N Engl J Med Date: 2016-06-13 Impact factor: 176.079
