Juana Pinar-Sanchez, Pablo Bermejo López, Julián Solís García Del Pozo, Jose Redondo-Ruiz, Laura Navarro Casado, Fernando Andres-Pretel, María Luisa Celorrio Bustillo, Mercedes Esparcia Moreno, Santiago García Ruiz, Jose Javier Solera Santos, Beatriz Navarro Bravo.
Abstract
The diagnosis of alcohol use disorder (AUD) remains a difficult challenge, and some patients may not be adequately diagnosed. This study aims to identify an optimum combination of laboratory markers to detect alcohol consumption, using data science. An analytical observational study was conducted with 337 subjects (253 men and 83 women; mean age 44 years, standard deviation (SD) 10.61). The first group included 204 participants being treated in the Addictive Behaviors Unit (ABU) in Albacete (Spain). They met the diagnostic criteria for AUD specified in the Diagnostic and Statistical Manual of Mental Disorders, fifth edition (DSM-5). The second group included 133 blood donors (people with no risk of AUD), recruited as a cross-sectional sample. All participants were also divided into two groups according to the WHO classification for risk of alcohol consumption in Spain, with risky consumption defined as more than 28 standard drink units (SDUs) for men or more than 17 SDUs for women. Medical history and laboratory markers were selected from our hospital's database. A correlation between alterations in laboratory markers and the amount of alcohol consumed was established. We then created three predictive models (logistic regression, classification tree, and Bayesian network) to detect risk of alcohol consumption, using laboratory markers as predictive features. Two tools were used for variable selection and for building and validating the predictive models: the scikit-learn library for Python and the Weka application. The logistic regression model provided a maximum AUD prediction accuracy of 85.07%. The classification tree provided a lower accuracy of 79.4% but was easier to interpret. Finally, the Naive Bayes network had an accuracy of 87.46%. The combination of several common biochemical markers and the use of data science can enhance detection of AUD, helping to prevent future medical complications derived from AUD.
Keywords: alcohol-related disorders; data science; laboratory diagnosis; machine learning; screening
Year: 2022 PMID: 35407669 PMCID: PMC8999878 DOI: 10.3390/jcm11072061
Source DB: PubMed Journal: J Clin Med ISSN: 2077-0383 Impact factor: 4.241
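The risk grouping described in the abstract reduces to a sex-specific threshold on standard drink units. A minimal sketch follows; the function name, the dictionary, and the string encoding of sex are illustrative assumptions, not part of the study.

```python
# Thresholds quoted in the abstract: more than 28 SDUs for men,
# more than 17 SDUs for women. Names below are hypothetical.
RISK_THRESHOLD_SDU = {"man": 28, "woman": 17}


def is_risky_consumption(sex: str, sdus: float) -> bool:
    """Return True when intake strictly exceeds the sex-specific threshold."""
    return sdus > RISK_THRESHOLD_SDU[sex]
```

The comparison is strict because the abstract says "more than", so an intake exactly at the threshold is not classified as risky.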
Figure 1. Flow chart of the study.
Figure 2. Comparison of marital status for both groups (ABU and blood donors).
Figure 3. Comparison of the study level for both groups (ABU and blood donors).
Logistic regression coefficients for all the variables used, with their 95% confidence intervals and p-values. Statistically significant coefficients are marked with *.
| Variable | Coefficient | 95% CI (lower) | 95% CI (upper) | p-value |
|---|---|---|---|---|
| Constant | 0.826 | 0.084 | 1.569 | 0.029 |
| Study level | −0.424 | −0.973 | 0.125 | 0.130 |
| Age | −0.191 | −0.746 | 0.364 | 0.499 |
| Albumin | 0.497 | −1.215 | 2.209 | 0.570 |
| Uric acid | −0.057 | −0.635 | 0.521 | 0.846 |
| Basophils | −1.907 | −4.863 | 1.050 | 0.206 |
| Basophils % | 1.898 | −0.766 | 4.562 | 0.163 |
| Total bilirubin | 0.283 | −0.921 | 1.487 | 0.645 |
| Calcium | 0.284 | −0.493 | 1.061 | 0.474 |
| MCHC 1 | −8.259 | −15.893 | −0.625 | 0.034 * |
| Chlorine | 1.029 | 0.320 | 1.737 | 0.004 * |
| Cholesterol | 0.826 | −0.614 | 2.267 | 0.261 |
| Creatinine | −0.608 | −1.322 | 0.106 | 0.095 |
| Eosinophils | −12.068 | −23.030 | −1.106 | 0.031 * |
| Eosinophils % | 11.639 | −3.425 | 26.702 | 0.130 |
| Red blood cells | −2.775 | −10.355 | 4.804 | 0.473 |
| Alkaline phosphatase | 1.125 | 0.244 | 2.006 | 0.012 * |
| Ferritin | 1.872 | 0.268 | 3.477 | 0.022 * |
| Gamma glutamyl transferase | −0.044 | −2.222 | 2.134 | 0.968 |
| Globulins | 0.533 | −1.112 | 2.177 | 0.526 |
| Glucose | −0.483 | −1.082 | 0.115 | 0.114 |
| Aspartate aminotransferase | 0.423 | −1.778 | 2.624 | 0.706 |
| Alanine aminotransferase | 0.503 | −0.554 | 1.559 | 0.351 |
| Haemoglobin | −8.424 | −21.414 | 4.566 | 0.204 |
| Mean corpuscular haemoglobin | 23.651 | 7.540 | 39.762 | 0.004 * |
| Hematocrit | 11.208 | −2.673 | 25.089 | 0.114 |
| HDL- cholesterol | −1.033 | −2.066 | 0.001 | 0.050 |
| Red blood cells distribution width | 0.818 | 0.158 | 1.477 | 0.015 * |
| Platelet distribution width | 0.491 | −0.176 | 1.158 | 0.149 |
| Potassium | 0.636 | 0.014 | 1.259 | 0.045 * |
| Lactate dehydrogenase | 0.412 | −0.193 | 1.016 | 0.182 |
| LDL-cholesterol | 0.310 | −0.869 | 1.489 | 0.607 |
| White blood cells | 205.737 | 15.672 | 395.801 | 0.034 * |
| Lymphocytes | −59.007 | −113.951 | −4.063 | 0.035 * |
| % Lymphocytes | 50.220 | −20.111 | 120.550 | 0.162 |
| % Large Unstained Cells | 17.875 | −9.300 | 45.049 | 0.197 |
| Large Unstained Cells | −20.052 | −42.138 | 2.034 | 0.075 |
| Monocytes | −14.328 | −26.135 | −2.521 | 0.017 * |
| % Monocytes | 12.3017 | −2.676 | 27.280 | 0.107 |
| Myeloperoxidase index | −0.180 | −0.757 | 0.397 | 0.541 |
| Neutrophils | −185.847 | −358.363 | −13.331 | 0.035 * |
| % Neutrophils | 60.0477 | −22.608 | 142.704 | 0.154 |
| Phosphorus | −0.422 | −1.032 | 0.188 | 0.175 |
| Platelets | −0.114 | −0.790 | 0.563 | 0.742 |
| Total Proteins | −1.113 | −2.745 | 0.519 | 0.181 |
| Triglycerides | −0.603 | −1.554 | 0.348 | 0.214 |
| Transferrin | 0.143 | −0.582 | 0.869 | 0.698 |
| Urea | −1.238 | −2.050 | −0.426 | 0.003 * |
| Mean Corpuscular Volume | −22.442 | −36.694 | −8.190 | 0.002 * |
| Mean Platelet Volume | −0.463 | −1.174 | 0.249 | 0.202 |
| Sodium | −1.553 | −2.430 | −0.677 | 0.001 * |
| SEX_woman | −0.278 | −1.021 | 0.464 | 0.462 |
1 MCHC: mean corpuscular haemoglobin concentration. * Statistically significant.
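Each row of the table above can be read as an odds ratio, and the reported p-value can be approximately recovered from the confidence interval. The sketch below assumes a symmetric Wald interval on the coefficient scale (normal approximation); the source does not state how the intervals were computed, and the function name is hypothetical. The odds ratio applies per one-unit increase on whatever scale the predictor was entered into the model, which the source does not specify.

```python
import math


def wald_summary(coef, ci_low, ci_high, z_crit=1.959964):
    """Summarise one coefficient, assuming a symmetric 95% Wald interval."""
    # Half the interval width equals z_crit standard errors.
    se = (ci_high - ci_low) / (2 * z_crit)
    z = coef / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return {
        "odds_ratio": math.exp(coef),
        "or_low": math.exp(ci_low),
        "or_high": math.exp(ci_high),
        "se": se,
        "z": z,
        "p": p,
    }


# Chlorine row from the table: coefficient 1.029, CI 0.320 to 1.737.
chlorine = wald_summary(1.029, 0.320, 1.737)
```

For the chlorine row, the recovered p-value is close to the tabulated 0.004, which suggests the normal approximation is a reasonable reading of that row.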
Logistic regression coefficients for the ‘risk’ model with automatic variable selection (IWSSr), with their 95% confidence intervals and p-values.
| Variable | Coefficient | 95% CI (lower) | 95% CI (upper) | p-value |
|---|---|---|---|---|
| Constant | 0.7079 | 0.263 | 1.153 | 0.002 |
| Mean corpuscular haemoglobin | 1.3679 | 0.944 | 1.791 | <0.001 |
| Gamma glutamyl transferase | 2.78 | 1.112 | 4.448 | 0.001 |
| Red blood cells distribution width | 1.0657 | 0.663 | 1.469 | <0.001 |
| Creatinine | −0.7371 | −1.089 | −0.385 | <0.001 |
| Total bilirubin | 0.4942 | −0.173 | 1.161 | 0.146 |
| Mean Platelet Volume | −0.1804 | −0.479 | 0.118 | 0.236 |
| Large Unstained Cells | 0.4084 | −0.938 | 1.755 | 0.552 |
| HDL-cholesterol | −0.676 | −1.093 | −0.259 | 0.002 |
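The coefficients in the table above define a linear score that the logistic function maps to a probability of risk. The following is a minimal sketch, assuming the inputs are expressed on the same scale the model was fitted on (the source does not state the scaling); the dictionary keys and function name are hypothetical.

```python
import math

# Intercept and coefficients copied from the table; key names are illustrative.
INTERCEPT = 0.7079
COEFS = {
    "mch": 1.3679,              # Mean corpuscular haemoglobin
    "ggt": 2.78,                # Gamma glutamyl transferase
    "rdw": 1.0657,              # Red blood cells distribution width
    "creatinine": -0.7371,
    "total_bilirubin": 0.4942,
    "mpv": -0.1804,             # Mean Platelet Volume
    "luc": 0.4084,              # Large Unstained Cells
    "hdl": -0.676,              # HDL-cholesterol
}


def risk_probability(features):
    """Logistic model: sigmoid(intercept + sum of coefficient * value)."""
    linear = INTERCEPT + sum(
        COEFS[name] * features.get(name, 0.0) for name in COEFS
    )
    return 1.0 / (1.0 + math.exp(-linear))


# With every feature at zero, only the intercept contributes.
baseline = risk_probability({})
```

Positive coefficients (for example mean corpuscular haemoglobin) push the predicted probability up as the feature increases, while negative ones (creatinine, HDL-cholesterol) pull it down.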
Figure 4. Classification tree. The square brackets in the figure are not bibliographic references. The root node splits on ferritin at a value of 136.5; depending on whether the value is greater or less than this threshold, the tree continues with total proteins (TP) or gamma-glutamyl transferase (GGT), and then, depending on their values, with further variables, until it reaches the leaves at the bottom, which predict whether or not the participant is at risk with respect to risky alcohol consumption.
Figure 5. Prediction of alcohol use by 9 biomarkers and the study level using machine learning, obtained by a Bayesian network. BAS = Basophils, CREA = Creatinine, ALP = Alkaline phosphatase, GGT = Gamma glutamyl transferase, MCH = Mean corpuscular haemoglobin, HCT = Hematocrit, RDW = Red blood cells distribution width, LDH = Lactate Dehydrogenase.
Probability table for risk and the discretized variables in the Bayesian network.
| Variable | Value | Risk | No risk |
|---|---|---|---|
| Gamma glutamyl transferase | ≤46.5 | 0.572 | 0.929 |
| | >46.5 | 0.428 | 0.071 |
| Mean corpuscular haemoglobin | <31.3 | 0.25 | 0.8 |
| | ≥31.3 | 0.75 | 0.2 |
| Educational level | 1 | 0.020 | 0.003 |
| | 2 | 0.202 | 0.033 |
| | 3 | 0.526 | 0.282 |
| | 4 | 0.122 | 0.245 |
| | 5 | 0.082 | 0.312 |
| | 6 | 0.048 | 0.124 |
| Basophils | ≤0.01 | 0.009 | 0.107 |
| | (0.01–0.015] | 0.077 | 0.027 |
| | (0.015–0.02] | 0.003 | 0.216 |
| | >0.02 | 0.911 | 0.649 |
| Creatinine | ≤1.025 | 0.871 | 0.598 |
| | >1.025 | 0.129 | 0.402 |
| Alkaline phosphatase | ≤34 | 0.003 | 0.131 |
| | (34–84.5] | 0.771 | 0.810 |
| | >54.5 | 0.226 | 0.058 |
| Hematocrit | ≤47.09 | 0.618 | 0.850 |
| | >47.09 | 0.382 | 0.150 |
| Red blood cells distribution width | ≤13.05 | 0.141 | 0.469 |
| | >13.05 | 0.859 | 0.531 |
| Lactate Dehydrogenase | ≤256.5 | 0.836 | 0.985 |
| | >256.5 | 0.164 | 0.015 |
| Urea | ≤23.5 | 0.415 | 0.070 |
| | (23.5–40.5] | 0.519 | 0.566 |
| | >40.5 | 0.066 | 0.364 |
Please note that ‘(’ marks an open (exclusive) bound and ‘]’ marks a closed (inclusive) bound. For instance, (0.01–0.015] means any value greater than 0.01 and less than or equal to 0.015.
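The interval notation and the probability table can be combined in a short worked example. The sketch below maps a value to its right-closed bin and then applies Bayes' rule under a naive (conditionally independent) assumption. The prior of 0.5 is a hypothetical placeholder, since the class prior is not given here, and the network in Figure 5 may encode dependencies that this simplification ignores. All function and variable names are illustrative.

```python
from bisect import bisect_left


def bin_index(value, edges):
    """Index of the right-closed bin (a, b] that contains value.

    With edges [0.01, 0.015, 0.02], the bins are: <=0.01, (0.01, 0.015],
    (0.015, 0.02], and >0.02, numbered 0 to 3.
    """
    return bisect_left(edges, value)


BASOPHIL_EDGES = [0.01, 0.015, 0.02]

# Table entries as (P(bin | risk), P(bin | no risk)).
GGT_HIGH = (0.428, 0.071)  # Gamma glutamyl transferase > 46.5
MCH_HIGH = (0.75, 0.2)     # Mean corpuscular haemoglobin >= 31.3


def posterior_risk(likelihoods, prior_risk=0.5):
    """Posterior P(risk | evidence), treating findings as independent."""
    p_risk, p_no_risk = prior_risk, 1.0 - prior_risk
    for given_risk, given_no_risk in likelihoods:
        p_risk *= given_risk
        p_no_risk *= given_no_risk
    return p_risk / (p_risk + p_no_risk)


# Hypothetical participant with both GGT and MCH in the upper bin.
posterior = posterior_risk([GGT_HIGH, MCH_HIGH])
```

Using `bisect_left` makes a value equal to a bin edge fall into the lower bin, which matches the closed upper bound ‘]’ in the table.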
Figure 6. Predictive strength of each parameter selected in the Bayesian network.