Alvaro D Orjuela-Cañón, Andrés L Jutinico, Carlos Awad, Erika Vergara, Angélica Palencia.
Abstract
The use of machine learning (ML) for diagnosis support has advanced in the field of health. This paper presents the results of studying ML techniques in a tuberculosis diagnosis loop in a scenario of limited resources. Data from a tuberculosis (TB) therapy program at a health institution in a main city of a developing country are analyzed using five ML models. Logistic regression, classification trees, random forest, support vector machines, and artificial neural networks are trained under physician supervision, following the physicians' typical daily work. The models are trained on seven main variables collected when patients arrive at the facility. Additionally, the variables used to train the models are analyzed, and the models' advantages and limitations are discussed in the context of automated ML techniques. The results show that artificial neural networks obtain the best results in terms of accuracy, sensitivity, and area under the receiver operating characteristic curve. These results represent an improvement over smear microscopy, which is a commonly used technique to detect TB in special cases. The findings demonstrate that ML in the TB diagnosis loop can be reinforced with available data to serve as an alternative, data-driven diagnosis tool in places where the health infrastructure is limited.
Keywords: diagnosis support systems; machine learning; machine learning in the loop; relevance analysis; tuberculosis diagnosis
Year: 2022 PMID: 35958865 PMCID: PMC9362992 DOI: 10.3389/fpubh.2022.876949
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
Figure 1. Schematic of using ML in TB diagnosis. During TB diagnosis, ML tools are employed to support the decision on whether to begin antituberculosis therapy.
Variables collected.
| Variable | Values |
|---|---|
| Sex | Male |
| | Female |
| Age | Numeric: 0–100 |
| Type of population | Homeless |
| | Native |
| | Exile |
| | Immigrant |
| | Prison |
| | Violence Victim |
| | Other |
| City location | Antonio Nariño |
| | Barrios Unidos |
| | Bosa |
| | Chapinero |
| | Ciudad Bolívar |
| | Engativá |
| | Fontibón |
| | Kennedy |
| | La Candelaria |
| | Los Mártires |
| | Puente Aranda |
| | Rafael Uribe Uribe |
| | San Cristóbal |
| | Santa Fe |
| | Suba |
| | Teusaquillo |
| | Tunjuelito |
| | Usaquén |
| | Usme |
| | Out of Bogotá City |
| | Unknown |
| HIV/AIDS status | Yes |
| | No |
| | Unknown |
| Antiretroviral treatment status | Yes |
| | No |
| | Unknown |
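Most of the variables above are categorical, so they need a numeric encoding before any of the five models can use them. The source does not say which encoding was used; the following is a minimal sketch that assumes one-hot encoding for the categorical fields and a simple rescaling of age. The field names, the `encode_record` helper, and the example patient are hypothetical, and only three of the categorical variables are included to keep it short.

```python
# Levels copied from the table above; the dictionary keys are hypothetical field names.
CATEGORIES = {
    "sex": ["Male", "Female"],
    "hiv_aids_status": ["Yes", "No", "Unknown"],
    "antiretroviral_treatment_status": ["Yes", "No", "Unknown"],
}


def encode_record(record):
    """Turn one patient record into a flat list of numeric features."""
    # Age is rescaled using the 0-100 range given in the table.
    features = [record["age"] / 100.0]
    # Each categorical field becomes one 0/1 indicator per level (one-hot encoding).
    for field, levels in CATEGORIES.items():
        features.extend(1.0 if record[field] == level else 0.0 for level in levels)
    return features


# Made-up patient used only to exercise the function.
example = {
    "age": 42,
    "sex": "Female",
    "hiv_aids_status": "No",
    "antiretroviral_treatment_status": "Unknown",
}
vector = encode_record(example)
print(vector)
```

Type of population and city location could be added to `CATEGORIES` in the same way, at the cost of a much wider feature vector.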
Sets used for cross-validation.
| Set | Year | | | Total |
|---|---|---|---|---|
| 1 | 2017 | 34 | 9 | 43 |
| 2 | 2018 | 52 | 22 | 74 |
| 3 | 2019 | 55 | 10 | 65 |
| Total | | 141 | 41 | 182 |
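The yearly sets above suggest a cross-validation scheme in which each year is held out once as the test subset while the other two years are used for training. The sketch below illustrates that reading; the rotation itself is an assumption, and the `leave_one_year_out` helper and the placeholder records are hypothetical. Only the per-year totals (43, 74, 65) come from the table.

```python
from collections import defaultdict


def leave_one_year_out(records, year_key="year"):
    """Yield (held_out_year, train, test), holding out each year exactly once."""
    by_year = defaultdict(list)
    for rec in records:
        by_year[rec[year_key]].append(rec)
    for year in sorted(by_year):
        test = by_year[year]
        # Training data is every record from the remaining years.
        train = [r for y, rows in by_year.items() if y != year for r in rows]
        yield year, train, test


# Placeholder records sized like the yearly totals in the table.
set_sizes = {2017: 43, 2018: 74, 2019: 65}
records = [{"year": y, "id": i} for y, n in set_sizes.items() for i in range(n)]

folds = list(leave_one_year_out(records))
for year, train, test in folds:
    print(year, len(train), len(test))
```

Splitting by year rather than at random keeps each test subset chronologically separate from its training data.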
Results for the ML models.
| Model | Year | Train accuracy | Train sensitivity | Train specificity | Train AUC | Test accuracy | Test sensitivity | Test specificity | Test AUC |
|---|---|---|---|---|---|---|---|---|---|
| DT | 2017 | 0.75 | 0.82 | 0.50 | 0.65 | 0.70 | 0.82 | 0.22 | 0.53 |
| | 2018 | 0.94 | 1.00 | 0.73 | 0.86 | 0.68 | 0.81 | 0.36 | 0.59 |
| | 2019 | 0.97 | 1.00 | 0.91 | 0.96 | 0.72 | 0.75 | 0.60 | 0.68 |
| RF | 2017 | 0.81 | 0.83 | 0.72 | 0.73 | 0.70 | 0.79 | 0.33 | 0.60 |
| | 2018 | 0.94 | 0.94 | 0.89 | 0.87 | 0.70 | 0.87 | 0.32 | 0.63 |
| | 2019 | 0.89 | 0.90 | 0.87 | 0.85 | 0.82 | 0.85 | 0.60 | 0.77 |
| LR | 2017 | 0.63 | 0.59 | 0.78 | 0.63 | 0.63 | 0.59 | 0.78 | 0.61 |
| | 2018 | 0.71 | 0.71 | 0.68 | 0.63 | 0.65 | 0.73 | 0.45 | 0.62 |
| | 2019 | 0.62 | 0.58 | 0.74 | 0.63 | 0.65 | 0.60 | 0.90 | 0.84 |
| SVM | 2017 | 0.99 | 0.98 | 1.00 | 0.97 | 0.65 | 0.74 | 0.33 | 0.45 |
| | 2018 | 0.94 | 0.92 | 1.00 | 0.86 | 0.61 | 0.75 | 0.27 | 0.56 |
| | 2019 | 0.89 | 0.86 | 0.97 | 0.85 | 0.68 | 0.69 | 0.60 | 0.68 |
| MLP | 2017 | 0.82 | 0.95 | 0.38 | 0.77 | 0.74 | 0.88 | 0.22 | 0.65 |
| | 2018 | 0.87 | 1.00 | 0.26 | 0.93 | 0.74 | 1.00 | 0.14 | 0.65 |
| | 2019 | 0.79 | 0.99 | 0.23 | 0.83 | 0.85 | 0.93 | 0.40 | 0.82 |
AUC, area under the receiver operating characteristic curve.
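Accuracy, sensitivity, and specificity all derive from the four cells of a binary confusion matrix. The sketch below uses the standard definitions, which the source does not spell out. The `classification_metrics` helper and the counts passed to it are illustrative and not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary metrics from confusion-matrix counts.

    tp/fn: positive cases predicted positive/negative.
    tn/fp: negative cases predicted negative/positive.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,      # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),      # share of positive cases detected
        "specificity": tn / (tn + fp),      # share of negative cases correctly ruled out
    }


# Made-up counts: a classifier that labels most cases as positive.
m = classification_metrics(tp=40, fn=10, tn=15, fp=35)
print(m)
```

With these counts sensitivity is high while specificity is low, the same pattern seen in several rows of the results, where a model favours the positive class.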
ML model results for the three test subsets.
| Model | Accuracy | Sensitivity | Specificity | AUC |
|---|---|---|---|---|
| DT | 0.70 ± 0.040 | 0.79 ± 0.001 | 0.39 ± 0.037 | 0.60 ± 0.005 |
| RF | 0.74 ± 0.069 | 0.83 ± 0.001 | 0.42 ± 0.025 | 0.67 ± 0.008 |
| LR | 0.64 ± 0.011 | 0.64 ± 0.006 | **0.71 ± 0.054** | 0.69 ± 0.017 |
| SVM | 0.64 ± 0.001 | 0.72 ± 0.001 | 0.40 ± 0.030 | 0.56 ± 0.013 |
| MLP | **0.77 ± 0.004** | **0.93 ± 0.003** | 0.25 ± 0.017 | **0.71 ± 0.009** |
AUC, area under the receiver operating characteristic curve. The bold values are the highest values for each column.
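The central values in this table are the averages of the three yearly test results. The sketch below reproduces that averaging for the DT model, using the last four values of each DT row in the per-year results table. The source does not state what the ± term represents, so the sample standard deviation computed here is only one common choice and need not match the reported spread. The `summarize` helper is hypothetical.

```python
from statistics import mean, stdev

# DT test-subset values copied from the per-year results table (2017, 2018, 2019).
dt_test = {
    "accuracy": [0.70, 0.68, 0.72],
    "sensitivity": [0.82, 0.81, 0.75],
    "specificity": [0.22, 0.36, 0.60],
    "auc": [0.53, 0.59, 0.68],
}


def summarize(per_year):
    """Return {metric: (mean, sample standard deviation)} across the yearly folds."""
    return {name: (mean(vals), stdev(vals)) for name, vals in per_year.items()}


summary = summarize(dt_test)
for name, (avg, spread) in summary.items():
    print(f"{name}: mean {avg:.2f}, sample std {spread:.3f}")
```

The printed means round to the DT entries in the table (0.70, 0.79, 0.39, 0.60).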
Best ML model results for the applied metrics and the full data set.
| Metric | | | | | |
|---|---|---|---|---|---|
| Accuracy | 0.63 | 0.66 | 0.86 | 0.81 | 0.80 |
| Sensitivity | 0.90 | 0.87 | 0.94 | 0.95 | 0.82 |
| Specificity | 0.35 | 0.36 | 0.66 | 0.55 | 0.68 |
The bold values are the highest values for each column.
Figure 2. Sensitivity, accuracy, and specificity for all five ML models: (A) Logistic regression; (B) Classification tree; (C) Random forest; (D) Support vector machine; (E) Multilayer perceptron neural network. For each model, the panels show how sensitivity (blue), specificity (green), and accuracy (orange) change when each of the seven considered variables is included or excluded.
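An include-or-exclude analysis of this kind can be organised as a loop that rescores the model with one variable left out at a time. The following is a minimal sketch of that idea and not the authors' procedure. The `ablation_effects` helper is hypothetical, the variable names are adapted from the variables table, and `toy_evaluate` with its `TOY_GAIN` numbers is a made-up stand-in for training and scoring a real model.

```python
def ablation_effects(variables, evaluate):
    """Score the full variable set, then rescore with each variable left out.

    `evaluate` takes a list of variable names and returns a dict of metric values.
    The result maps each variable to the change in every metric when it is removed.
    """
    baseline = evaluate(variables)
    effects = {}
    for var in variables:
        reduced = [v for v in variables if v != var]
        scores = evaluate(reduced)
        effects[var] = {metric: scores[metric] - baseline[metric] for metric in baseline}
    return baseline, effects


VARIABLES = [
    "sex",
    "age",
    "type_of_population",
    "city_location",
    "hiv_aids_status",
    "antiretroviral_treatment_status",
]

# Made-up contributions so the sketch runs; these are not results from the study.
TOY_GAIN = {
    "sex": 0.01,
    "age": 0.03,
    "type_of_population": 0.02,
    "city_location": 0.01,
    "hiv_aids_status": 0.05,
    "antiretroviral_treatment_status": 0.04,
}


def toy_evaluate(features):
    """Placeholder scorer; a real one would train a model and measure it on test data."""
    return {"accuracy": 0.5 + sum(TOY_GAIN[f] for f in features)}


baseline, effects = ablation_effects(VARIABLES, toy_evaluate)
for var, delta in effects.items():
    print(f"without {var}: accuracy change {delta['accuracy']:+.2f}")
```

In practice `evaluate` would return sensitivity and specificity as well, giving one bar per metric and variable as in the figure.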
Results for the auto ML models by year.
| Model | Year | Train accuracy | Train sensitivity | Train specificity | Train AUC | Test accuracy | Test sensitivity | Test specificity | Test AUC |
|---|---|---|---|---|---|---|---|---|---|
| AutoML | 2017 | 0.86 | 0.85 | 1.00 | 0.92 | 0.79 | 1.00 | 0.00 | 0.50 |
| | 2018 | 0.92 | 0.90 | 1.00 | 0.95 | 0.70 | 0.70 | 0.50 | 0.60 |
| | 2019 | 0.91 | 0.92 | 0.88 | 0.90 | 0.83 | 0.94 | 0.46 | 0.70 |
| TPOT | 2017 | 0.77 | 1.00 | 0 | 0.50 | 0.79 | 1.00 | 0 | 0.50 |
| | 2018 | 0.85 | 0.84 | 1.00 | 0.92 | 0.73 | 0.72 | 1.00 | 0.86 |
| | 2019 | 0.74 | 0.74 | 1.00 | 0.87 | 0.84 | 1.00 | 0.00 | 0.50 |
AUC, area under the receiver operating characteristic curve.
Results for the auto ML models for 3 years.
| Model | Accuracy | Sensitivity | Specificity | AUC |
|---|---|---|---|---|
| AutoML | 0.77 ± 0.004 | 0.88 ± 0.025 | 0.32 ± 0.077 | 0.60 ± 0.010 |
| TPOT | 0.78 ± 0.003 | 0.90 ± 0.026 | 0.33 ± 0.333 | 0.62 ± 0.043 |
AUC, area under the receiver operating characteristic curve.