João Albuquerque, Ana Margarida Medeiros, Ana Catarina Alves, Mafalda Bourbon, Marília Antunes.
Abstract
Familial Hypercholesterolemia (FH) is an inherited disorder of lipid metabolism, characterized by increased low-density lipoprotein cholesterol (LDLc) levels. The main purpose of the current work was to explore classification methods, based on several biochemical and biological indicators, as alternatives to the traditional clinical criteria for FH diagnosis. Logistic regression (LR), decision tree (DT), random forest (RF) and naive Bayes (NB) algorithms were developed for this purpose, and thresholds were optimized by maximizing the Youden index (YI). All models presented similar accuracy (Acc), specificity (Spec) and positive predictive values (PPV). Sensitivity (Sens) and G-mean values were significantly higher in the LR and RF models than in the DT. When compared to the Simon Broome (SB) biochemical criteria for FH diagnosis, all models presented significantly higher Acc, Spec and G-mean values (p < 0.01), and lower negative predictive value (NPV, p < 0.05). Moreover, the LR and RF models presented Sens values comparable to those of the SB criteria. Adjusting the cut-off point by maximizing YI significantly increased Sens values, with no significant loss in Acc. The results suggest that such classification algorithms can be a viable alternative for widespread screening. An online application has been developed to assess the performance of the LR model in a wider population.
Year: 2022 PMID: 35064162 PMCID: PMC8782861 DOI: 10.1038/s41598-022-05063-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Diagram of overall study design. VUS variant of unknown significance, HoFH homozygous familial hypercholesterolemia, CV cross validation.
Biological and biochemical characteristics of FH and non-FH subjects.
| | FH | NA % | non-FH | NA % | p |
|---|---|---|---|---|---|
| n (%) | 104 (36.4) | | 182 (63.4) | | |
| Male: n (%) | 54 (51.9) | 0 | 70 (38.5) | 0 | 0.04 |
| Age (years): mean (sd) | 9.36 (3.83) | 0 | 9.9 (3.62) | 0 | 0.19 |
| 2–7 years: n (%) | 33 (31.7) | | 40 (22.0) | | 0.16 |
| 8–12 years: n (%) | 52 (50.0) | | 98 (53.8) | | |
| 13–17 years: n (%) | 19 (18.3) | | 44 (24.2) | | |
| | 0.5 (1.2) | 6.7 | 0.76 (1.33) | 10.4 | 0.1 |
| Overweight: n (%) | 33 (31.7) | | 78 (42.9) | | 0.08 |
| TC: mean (sd) | 272.0 (46.0) | 0 | 230.0 (33.0) | 0 | < 0.01 |
| LDLc: mean (sd) | 203.6 (44.0) | 0 | 153.4 (27.7) | 0 | < 0.01 |
| HDLc: mean (sd) | 52.0 (12.5) | 0 | 59.9 (15.6) | 0 | < 0.01 |
| TG: mean (sd) | 73.2 (32.8) | 0 | 91.8 (43.4) | 0 | < 0.01 |
| ApoAI: mean (sd) | 134.7 (22.3) | 2.9 | 155.1 (27.8) | 2.7 | < 0.01 |
| ApoB: mean (sd) | 133.0 (28.0) | 2.9 | 101.0 (25.0) | 1.6 | < 0.01 |
| Lp(a): mean (sd) | 38.1 (40.6) | 10.6 | 56.1 (65.7) | 5.5 | 0.17 |
| | 21 (20.2) | | 74 (40.7) | | < 0.01 |
FH familial hypercholesterolemia, NA not available, BMI body mass index, TC total cholesterol, LDLc low density lipoprotein cholesterol, HDLc high density lipoprotein cholesterol, TG triglycerides, Apo apolipoprotein, Lp(a) Lipoprotein(a), sd standard deviation.
Final model fit for LR model.
| | Estimate | SE | Wald | p | OR | 95% CI |
|---|---|---|---|---|---|---|
| (Intercept) | −0.45 | 1.52 | −0.30 | 0.77 | 0.64 | (0.03–12.51) |
| LDLc | 0.05 | 0.01 | 7.10 | < 0.01 | 1.05 | (1.03–1.06) |
| TG | −0.02 | 0.01 | −3.72 | < 0.01 | 0.98 | (0.97–0.99) |
| ApoAI | −0.04 | 0.01 | −4.67 | < 0.01 | 0.96 | (0.95–0.98) |
| | −0.89 | 0.40 | −2.24 | 0.02 | 0.41 | (0.18–0.88) |
| Male sex | 0.74 | 0.37 | 2.01 | 0.04 | 2.09 | (1.03–4.33) |
| | −0.73 | 0.39 | −1.86 | 0.06 | 0.48 | (0.22–1.03) |
SE standard error, OR odds ratio, CI confidence interval, LDLc low density lipoprotein cholesterol, TG triglycerides, Apo apolipoprotein, Lp(a) Lipoprotein(a).
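The columns of the LR table are linked by standard logistic regression identities: the Wald statistic is the estimate divided by its standard error, and the odds ratio is the exponential of the estimate. The Python sketch below illustrates these relations for one row. The function name `odds_ratio_with_ci` is hypothetical, and the Wald-type interval with a 1.96 critical value is an assumption; the source does not state how the reported confidence intervals were computed.

```python
import math


def odds_ratio_with_ci(estimate, se, z_crit=1.96):
    """Return (Wald z, OR, CI lower, CI upper) for one coefficient.

    Assumes a Wald-type interval, exp(estimate +/- z_crit * se);
    this is an illustrative choice, not necessarily the authors' method.
    """
    wald = estimate / se
    odds_ratio = math.exp(estimate)
    lower = math.exp(estimate - z_crit * se)
    upper = math.exp(estimate + z_crit * se)
    return wald, odds_ratio, lower, upper


# Rounded estimate and SE taken from the "Male sex" row of the table.
wald, odds_ratio, lower, upper = odds_ratio_with_ci(0.74, 0.37)
print(f"Wald = {wald:.2f}, OR = {odds_ratio:.2f}, 95% CI = ({lower:.2f}-{upper:.2f})")
```

Because the table reports estimates and standard errors rounded to two decimals, the values recomputed this way are close to the tabulated ones but need not match them exactly.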
Figure 2. Decision tree model. Each node shows the biochemical indicator used to split the sample, the respective cut-off value, and how the original sample is divided throughout the tree. FH familial hypercholesterolemia, LDLc low density lipoprotein cholesterol, in mg/dL, TG triglycerides, in mg/dL, ApoAI apolipoprotein AI, in mg/dL, pos positive cases.
Figure 3. Tenfold cross validation results for the several operating characteristics (OC), for all classification algorithms. The dashed line represents the value obtained using Simon Broome (SB) biochemical criteria. LR logistic regression, DT decision tree, RF random forest, NB naive Bayes.
Pairwise comparison tests among the classification methods, and between these and the SB criteria, for the several operating characteristics (OC).
| LR–DT | ||||||
| RF–DT | ||||||
| SB–LR | ||||||
| SB–DT | ||||||
| SB–RF | ||||||
| SB–NB |
Acc accuracy, Sens sensitivity, Spec specificity, PPV positive predictive value, NPV negative predictive value, LR logistic regression, DT decision tree, RF random forest, NB naive Bayes. Pairwise comparisons that are not reported did not present any significant difference at p < 0.05.
*Still significant for p < 0.05 after applying Bonferroni correction for multiple comparisons.
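The operating characteristics compared above can all be derived from the four cells of a confusion matrix. The following Python sketch uses the usual textbook definitions, including G-mean as the geometric mean of sensitivity and specificity; the function name `operating_characteristics` and the counts in the example are hypothetical and are not taken from the study.

```python
import math


def operating_characteristics(tp, fp, tn, fn):
    """Compute common operating characteristics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one case in each
    actual class and each predicted class).
    """
    sens = tp / (tp + fn)  # true positive rate
    spec = tn / (tn + fp)  # true negative rate
    return {
        "Acc": (tp + tn) / (tp + fp + tn + fn),
        "Sens": sens,
        "Spec": spec,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "G-mean": math.sqrt(sens * spec),
        "YI": sens + spec - 1,  # Youden index
    }


# Hypothetical counts, for illustration only.
oc = operating_characteristics(tp=80, fp=10, tn=90, fn=20)
for name, value in oc.items():
    print(f"{name}: {value:.3f}")
```

G-mean and the Youden index both combine sensitivity and specificity, so they penalize a classifier that does well on only one of the two classes, which accuracy alone does not do when the classes are imbalanced.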
Comparison of mean operating characteristics values obtained using the default cut-off value c = 0.5 or the cut-off obtained by maximizing YI, for each classification method.
| | LR: c = 0.5 | LR: max YI | LR: p | RF: c = 0.5 | RF: max YI | RF: p | NB: c = 0.5 | NB: max YI | NB: p |
|---|---|---|---|---|---|---|---|---|---|
| | 0.84 | 0.84 | 0.83 | 0.84 | 0.83 | 0.67 | 0.84 | 0.83 | 1.0 |
| | 0.81 | 0.84 | 0.20 | 0.80 | 0.83 | 0.13 | 0.80 | 0.82 | 0.11 |
| | 0.75 | 0.84 | 0.04 | 0.71 | 0.86 | 0.01* | 0.70 | 0.79 | 0.02 |
| | 0.90 | 0.85 | 0.06 | 0.91 | 0.81 | 0.01* | 0.92 | 0.86 | 0.02 |
| | 0.82 | 0.79 | 0.06 | 0.85 | 0.72 | 0.02 | 0.84 | 0.77 | 0.02 |
| | 0.86 | 0.90 | 0.04 | 0.84 | 0.91 | 0.01* | 0.84 | 0.88 | 0.02 |
YI Youden index, Acc accuracy, Sens sensitivity, Spec specificity, PPV positive predictive value, NPV negative predictive value, LR logistic regression, RF random forest, NB naive Bayes.
*Still significant for p < 0.05 after applying Bonferroni correction for multiple comparisons.
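Choosing a cut-off by maximizing the Youden index amounts to scanning candidate thresholds and keeping the one with the largest Sens + Spec − 1. The Python sketch below is a minimal illustration of that idea, not the authors' implementation; the function name `youden_cutoff`, the labels and the predicted probabilities are all hypothetical. A case is classified as positive when its score is greater than or equal to the cut-off.

```python
def youden_cutoff(y_true, scores):
    """Return (cut-off, YI) maximizing Sens + Spec - 1.

    Candidate cut-offs are the distinct observed scores; ties are
    resolved in favor of the smallest cut-off. Labels are 0 or 1 and
    both classes must be present.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    best_cutoff, best_yi = None, -1.0
    for c in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= c)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < c)
        yi = tp / positives + tn / negatives - 1
        if yi > best_yi:
            best_cutoff, best_yi = c, yi
    return best_cutoff, best_yi


# Hypothetical labels and predicted probabilities, for illustration only.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.55, 0.70, 0.80, 0.90]
cutoff, yi = youden_cutoff(y_true, scores)
print(f"cut-off = {cutoff}, YI = {yi:.2f}")
```

In this toy data the selected cut-off lies below 0.5, so one positive case that the default threshold would miss is recovered at the cost of one additional false positive, which is the kind of trade between sensitivity and specificity that the table summarizes.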