Márcio Luís Moreira De Souza1, Gabriel Ayres Lopes2, Alexandre Castelo Branco3, Jessica K Fairley4, Lucia Alves De Oliveira Fraga1.
Abstract
BACKGROUND: According to the World Health Organization, achieving targets for control of leprosy by 2030 will require disease elimination and interruption of transmission at the national or regional level. India and Brazil have reported the highest leprosy burden in the last few decades, revealing the need for strategies and tools to help health professionals correctly manage and control the disease.
Keywords: Python; R; apps; artificial intelligence; leprosy; mHealth; random forest; shinyApp
Year: 2021 PMID: 33825685 PMCID: PMC8060869 DOI: 10.2196/23718
Source DB: PubMed Journal: JMIR Mhealth Uhealth ISSN: 2291-5222 Impact factor: 4.773
Figure 1. Flow diagram summarizing the data-processing and app-building steps. SINAN: Sistema de Informação Nacional de Agravos de Notificação (National Notifiable Diseases Information System); DATASUS: Sistema Único de Saúde (Unified Health System) data portal; RF: random forest; csv: comma-separated values.
Exclusion criteria and justifications.
| Exclusion criteria | Justifications |
| Columns with more than 25,000 “NA” (not available) | The objective was to remove variables whose value many professionals left undeclared, as a large amount of missing data may impair processing. The threshold of 25,000 was chosen arbitrarily, with the aim of not drastically reducing the total amount of data |
| Categorical variable with more than 53 input possibilities | R issues an alert when a categorical variable with more than 53 input possibilities is used; the greater the number of input possibilities, the less information each one contributes to the model |
| Variables that may induce a result | Some variables imply an operational classification (eg, the “g-MB” therapeutic scheme implies that the patient has multibacillary leprosy), which would bias the model |
| Variables with no apparent correlation with the prediction. | Boruta [ |
| Redundant variables | Redundant variables do not provide additional information to the model, and therefore there is no reason to keep both. An analysis using Python showed that some variables had almost 100% correspondence with another (eg, the state where the case was notified and the state where the patient lives). The Boruta algorithm is also useful to remove redundant variables. |
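The first two exclusion criteria and the redundancy check lend themselves to a short illustration. The following is a minimal sketch in plain Python, not the authors' code: the row format (a list of dictionaries with `None` for missing values), the function names, and the toy column names are all assumptions made for this example. Only the thresholds of 25,000 and 53 come from the table above.

```python
MAX_NA = 25_000   # NA threshold taken from the exclusion table
MAX_LEVELS = 53   # categorical-level limit taken from the exclusion table


def columns_to_drop(rows, categorical, max_na=MAX_NA, max_levels=MAX_LEVELS):
    """Return {column: reason} for columns meeting the first two criteria.

    rows: list of dicts, one per notified case; None marks a missing value.
    categorical: set of column names to treat as categorical (assumed known).
    """
    reasons = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [row.get(col) for row in rows]
        if sum(v is None for v in values) > max_na:
            reasons[col] = "too many NA"
            continue
        if col in categorical:
            n_levels = len({v for v in values if v is not None})
            if n_levels > max_levels:
                reasons[col] = "too many levels"
    return reasons


def correspondence(rows, col_a, col_b):
    """Fraction of complete rows in which two columns hold the same value."""
    pairs = [(r[col_a], r[col_b]) for r in rows
             if r.get(col_a) is not None and r.get(col_b) is not None]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


# Toy data with hypothetical column names and deliberately tiny thresholds.
toy = [
    {"a": None, "state_notified": "MG", "state_residence": "MG"},
    {"a": None, "state_notified": "SP", "state_residence": "SP"},
    {"a": 1, "state_notified": "PR", "state_residence": "RS"},
]
print(columns_to_drop(toy, categorical={"state_notified"}, max_na=1, max_levels=2))
print(round(correspondence(toy, "state_notified", "state_residence"), 2))
```

A correspondence close to 1.0 between two columns would suggest that one of them is redundant; where to set that cutoff is a judgment call that the table does not specify.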
Figure 2. Method to calculate the error rate.
Figure 3. Screenshot representing the R ShinyApp input and output flows. The layout of the app may eventually change to improve user experience. ROC: receiver operating characteristic; AUC: area under the curve; FPR: false positive rate.
Number of occurrences per inconsistency.
| Inconsistency | Number of occurrences |
| Operational classification does not match the clinical form | 8545 |
| Indeterminate with disability | 4867 |
| Indeterminate with affected nerves | 3785 |
| Paucibacillary with positive bacilloscopy | 2825 |
| Paucibacillary with more than 5 skin lesions | 938 |
| Patients with more than 18 affected nerves | 93 |
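Most of the inconsistencies in the table above can be expressed as simple per-record rules. The sketch below is illustrative only: the dictionary keys and the decoded values (`"indeterminate"`, `"PB"`, `"positive"`) are hypothetical stand-ins, because the record does not state how SINAN encodes these fields. The first rule of the table (operational classification not matching the clinical form) is left out because the mapping between the two is not given here.

```python
def find_inconsistencies(record):
    """Return the table's inconsistency labels that apply to one case record.

    record: dict with hypothetical, already-decoded keys:
      clinical_form, operational_class, bacilloscopy,
      skin_lesions, affected_nerves, disability_grade
    Missing numeric fields are treated as 0 (an assumption of this sketch).
    """
    form = record.get("clinical_form")
    op_class = record.get("operational_class")
    lesions = record.get("skin_lesions") or 0
    nerves = record.get("affected_nerves") or 0
    disability = record.get("disability_grade") or 0

    flags = []
    if form == "indeterminate" and disability > 0:
        flags.append("Indeterminate with disability")
    if form == "indeterminate" and nerves > 0:
        flags.append("Indeterminate with affected nerves")
    if op_class == "PB" and record.get("bacilloscopy") == "positive":
        flags.append("Paucibacillary with positive bacilloscopy")
    if op_class == "PB" and lesions > 5:
        flags.append("Paucibacillary with more than 5 skin lesions")
    if nerves > 18:
        flags.append("Patients with more than 18 affected nerves")
    return flags


# Hypothetical case: paucibacillary, 7 lesions, positive bacilloscopy.
case = {
    "clinical_form": "tuberculoid",
    "operational_class": "PB",
    "bacilloscopy": "positive",
    "skin_lesions": 7,
    "affected_nerves": 0,
    "disability_grade": 0,
}
print(find_inconsistencies(case))
```

Counting how often each label is returned across all records would give a tally with the same shape as the table, although the exact figures depend on coding details not reproduced here.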
Figure 4. Geographic distribution of new annual cases of leprosy in Brazilian municipalities. inhab: inhabitants; NA: not available.
Figure 5. Distribution of leprosy cases in the Brazilian states from 2014 to 2018.
Figure 6. Geographic distribution of leprosy misclassification in Brazilian municipalities. NA: not available.
Figure 7. Comparison of algorithms according to the mean misclassification error (MMCE).
Figure 8. Quality in the classification of leprosy cases by artificial intelligence models in Brazilian states.
Importance (in percent) of each variable used in the models with the highest accuracy.
| Variable | Meaning | Mato Grosso model | Rio Grande do Sul model | Paraná model |
| INCIDÊNCIA | Incidence | 15.0 | 9.5 | 6.1 |
| NU_IDADE_N | Age | 9.3 | 8.1 | 5.5 |
| CS_SEXO | Gender | 1.6 | 1.7 | 1.9 |
| CS_RACA | Race | 2.7 | 6.9 | 1.2 |
| CS_ESCOL_N | Educational level | 4.7 | 6.4 | 3.4 |
| NU_LESOES | Number of skin lesions | 23.6 | 37.0 | 41.9 |
| AVALIA_N | Grade of disability | 4.4 | 4.2 | 5.0 |
| BACILOSCOP | Bacilloscopy | 6.1 | 4.5 | 23.7 |
| CONTREG | Number of household contacts | 5.1 | 5.9 | 2.8 |
| NERVOSAFET | Number of affected nerves | 27.5 | 15.8 | 8.5 |
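To make the table easier to compare across states, the values can be ranked per model. The snippet below simply transcribes the percentages from the table and sorts them; the dictionary layout and the helper name `top_variables` are choices made for this illustration.

```python
# Importance values (percent) transcribed from the table above.
# Tuple order: (Mato Grosso, Rio Grande do Sul, Paraná).
IMPORTANCE = {
    "INCIDÊNCIA": (15.0, 9.5, 6.1),
    "NU_IDADE_N": (9.3, 8.1, 5.5),
    "CS_SEXO": (1.6, 1.7, 1.9),
    "CS_RACA": (2.7, 6.9, 1.2),
    "CS_ESCOL_N": (4.7, 6.4, 3.4),
    "NU_LESOES": (23.6, 37.0, 41.9),
    "AVALIA_N": (4.4, 4.2, 5.0),
    "BACILOSCOP": (6.1, 4.5, 23.7),
    "CONTREG": (5.1, 5.9, 2.8),
    "NERVOSAFET": (27.5, 15.8, 8.5),
}
MODELS = ("Mato Grosso", "Rio Grande do Sul", "Paraná")


def top_variables(model, n=3):
    """Return the n most important variable names for the given model."""
    idx = MODELS.index(model)
    ranked = sorted(IMPORTANCE, key=lambda var: IMPORTANCE[var][idx], reverse=True)
    return ranked[:n]


for model in MODELS:
    print(model, top_variables(model))
# Mato Grosso ['NERVOSAFET', 'NU_LESOES', 'INCIDÊNCIA']
# Rio Grande do Sul ['NU_LESOES', 'NERVOSAFET', 'INCIDÊNCIA']
# Paraná ['NU_LESOES', 'BACILOSCOP', 'NERVOSAFET']
```

The number of skin lesions and the number of affected nerves rank in the top three for all three models, while bacilloscopy reaches the top three only in the Paraná model.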
Quality of the artificial intelligence model applied to the differential diagnosis of paucibacillary and multibacillary leprosy.
| Quality parameter | Mato Grosso model | Rio Grande do Sul model | Paraná model |
| Accuracy | 0.970 | 0.812 | 0.929 |
| Sensitivity | 0.926 | 0.977 | 0.877 |
| Specificity | 0.812 | 0.218 | 0.919 |
| PPV (positive predictive value) | 0.936 | 0.803 | 0.972 |
| NPV (negative predictive value) | 0.786 | 0.740 | 0.698 |
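The five quality parameters in the table follow from the four cells of a binary confusion matrix. As a reminder of the standard definitions, here is a minimal sketch; the function name and the example counts are invented for illustration, and which class (paucibacillary or multibacillary) is treated as "positive" is not stated in this record.

```python
def quality_parameters(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: positive cases classified correctly/incorrectly
    tn/fp: negative cases classified correctly/incorrectly
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Invented counts, not taken from the study.
metrics = quality_parameters(tp=90, fp=10, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.867
# sensitivity: 0.900
# specificity: 0.800
# ppv: 0.900
# npv: 0.800
```

Because sensitivity and specificity are each computed within one true class, a model can combine very high sensitivity with low specificity when the classes are imbalanced, a pattern visible in the Rio Grande do Sul column.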