Sebastião Rogério da Silva Neto1, Thomás Tabosa de Oliveira1, Igor Vitor Teixiera1, Leonides Medeiros Neto1, Vanderson Souza Sampaio2,3, Theo Lynn4, Patricia Takako Endo5.
Abstract
Arboviruses are one of the main categories of Neglected Tropical Diseases (NTDs), and Dengue and Chikungunya are the most common of them. Arboviruses mainly affect tropical countries, and Brazil has the largest absolute number of cases in Latin America. This work presents a unified data set with clinical, sociodemographic, and laboratory data on patients with confirmed Dengue or Chikungunya, as well as on patients in whom infection with these diseases was ruled out. The data are based on case notifications submitted to the Brazilian Information System for Notifiable Diseases (Portuguese: Sistema de Informação de Agravo de Notificação, SINAN) from 2013 to 2020. The original data set comprised 13,421,230 records and 118 attributes. After pre-processing, a final data set of 7,632,542 records and 56 attributes was generated. The data presented in this work will help researchers investigate the antecedents of arbovirus emergence and transmission in general, and of Dengue and Chikungunya in particular. The data can also be used to train and test machine learning models for differential diagnosis and multi-class classification.
Year: 2022 PMID: 35538103 PMCID: PMC9090806 DOI: 10.1038/s41597-022-01312-7
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
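The abstract notes that the data set can be used to train and test multi-class classifiers. The classes are imbalanced, with 4,307,513 Dengue, 325,000 Chikungunya, and 2,100,029 other records in the baseline characteristics table. For that reason, a stratified split that keeps each class's share roughly equal in the training and test partitions is a common precaution. The following is a minimal stdlib sketch under stated assumptions. It assumes one label per record has already been derived. The function name `stratified_split`, the 20% test fraction and the label strings are illustrative choices, not the authors' protocol.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split record indices into (train, test) so each class keeps roughly the same share in both."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for label in sorted(by_class):
        idxs = list(by_class[label])
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])   # the first n_test shuffled indices of this class go to test
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)

# Toy labels with an imbalance loosely similar to the data set's classes (hypothetical).
labels = ["Dengue"] * 80 + ["Chikungunya"] * 10 + ["Others"] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 80 20
```

If scikit-learn is available, `sklearn.model_selection.train_test_split` with its `stratify` argument does the same job.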
General and disease baseline characteristics.
| Variables | Total (N = 6,732,542) | Dengue (N = 4,307,513) | Chikungunya (N = 325,000) | Others (N = 2,100,029) |
|---|---|---|---|---|
| Gender: women, n/N (%) | 3,731,577/6,732,542 (55.4) | 2,403,184/4,307,513 (55.8) | 194,780/325,000 (59.9) | 1,133,495/2,100,029 (54) |
| Age, Mean (SD) | 32 (18) | 33 (18) | 37 (20) | 31 (18) |
| Race, (%) | ||||
| White | 1,840,878 (27.3) | 1,200,564 (27.9) | 39,443 (12.1) | 600,871 (28.6) |
| Black | 243,673 (3.6) | 155,374 (3.6) | 14,505 (4.5) | 73,794 (3.5) |
| Yellow | 48,140 (0.7) | 30,124 (0.7) | 3,998 (1.2) | 14,018 (0.7) |
| Admixed | 2,277,168 (33.8) | 1,341,361 (31.1) | 170,074 (52.3) | 765,733 (36.5) |
| Indigenous | 15,484 (0.2) | 10,246 (0.2) | 691 (0.2) | 4,547 (0.2) |
| Missing/ignored | 2,307,199 (34.2) | 1,569,844 (36.4) | 96,289 (29.6) | 641,066 (30.5) |
| Pregnant, (%) | ||||
| 1st trimester | 13,641 (0.2) | 7,915 (0.2) | 910 (0.3) | 4,816 (0.2) |
| 2nd trimester | 17,463 (0.3) | 10,007 (0.2) | 1,505 (0.5) | 5,951 (0.3) |
| 3rd trimester | 14,223 (0.2) | 7,951 (0.2) | 1,204 (0.4) | 5,068 (0.2) |
| Missing/ignored | 6,687,215 (99.3) | 4,281,640 (99.3) | 321,381 (99) | 2,084,194 (99.3) |
| Educational Degree, (%) | ||||
| Elementary School | 587,216 (5.3) | 229,742 (5.3) | 15,434 (4.8) | 116,632 (5.5) |
| Middle School | 631,664 (9.3) | 406,366 (9.4) | 24,394 (7.5) | 200,904 (9.6) |
| High School | 1,093,285 (16.2) | 698,230 (16.2) | 37,686 (11.6) | 357,369 (17.1) |
| College | 265,913 (3.9) | 168,808 (3.9) | 8,495 (2.7) | 88,610 (4.2) |
| Missing/ignored | 4,342,024 (64.5) | 2,781,507 (64.6) | 236,702 (72.8) | 1,323,815 (63) |
| Fever, (%) | 2,508,024 (37.3) | 1,714,334 (39.8) | 139 (<0.1) | 793,551 (37.8) |
| Myalgia, (%) | 2,289,404 (34) | 1,595,876 (37) | 117 (<0.1) | 693,411 (33) |
| Headache, (%) | 2,325,434 (34.5) | 1,611,029 (37.4) | 115 (<0.1) | 714,290 (34) |
| Rash, (%) | 621,048 (9.2) | 466,788 (10.8) | 49 (<0.1) | 154,211 (7.3) |
| Vomit, (%) | 632,864 (9.4) | 438,160 (10.2) | 42 (<0.1) | 194,662 (9.3) |
| Nausea, (%) | 958,826 (14.2) | 691,305 (16) | 58 (<0.1) | 267,463 (12.7) |
| Back pain, (%) | 754,865 (11.2) | 545,952 (12.7) | 54 (<0.1) | 208,859 (9.9) |
| Conjunctivitis, (%) | 90,528 (1.3) | 64,807 (1.5) | 13 (<0.1) | 25,708 (1.2) |
| Arthritis, (%) | 288,109 (4.3) | 214,337 (5) | 30 (<0.1) | 73,742 (3.5) |
| Arthralgia, (%) | 635,375 (9.4) | 451,362 (10.5) | 58 (<0.1) | 183,955 (8.8) |
| Petechiae, (%) | 246,220 (3.7) | 187,214 (4.3) | 26 (<0.1) | 58,980 (2.8) |
| Tourniquet test, (%) | 119,836 (1.8) | 97,642 (2.3) | 5 (<0.1) | 22,189 (1.1) |
| Retro-orbital pain, (%) | 962,044 (14.3) | 730,885 (17) | 46 (<0.1) | 231,113 (11) |
| Diabetes, (%) | 63,657 (0.9) | 45,088 (1) | 8 (<0.1) | 18,561 (0.9) |
| Hematological disease, (%) | 12,701 (0.2) | 8,751 (0.2) | 1 (<0.1) | 3,949 (0.2) |
| Liver disease, (%) | 13,595 (0.2) | 9,351 (0.2) | 1 (<0.1) | 4,243 (0.2) |
| Kidney disease, (%) | 11,311 (0.2) | 7,920 (0.2) | 1 (<0.1) | 3,390 (0.2) |
| Hypertension, (%) | 156,779 (2.3) | 112,685 (2.6) | 12 (<0.1) | 44,082 (2.1) |
| Acid peptic disease, (%) | 14,842 (0.2) | 10,258 (0.2) | 2 (<0.1) | 4,582 (0.2) |
| Autoimmune disease, (%) | 11,318 (0.2) | 8,031 (0.2) | 0 (0) | 3,287 (0.2) |
| Test Results (IgM) Dengue, (%) | ||||
| Positive | 28,842 (0.4) | 26,551 (0.6) | 4 (<0.1) | 2,287 (0.1) |
| Negative | 49,175 (0.7) | 19,659 (0.5) | 13 (<0.1) | 29,503 (1.4) |
| Inconclusive | 13,381 (0.2) | 7,387 (0.2) | 2 (<0.1) | 5,992 (0.3) |
| Not performed | 6,641,144 (98.6) | 4,253,916 (98.8) | 324,981 (>99.9) | 2,062,247 (98.2) |
| Test Result ELISA, (%) | ||||
| Positive | 21,625 (0.3) | 19,684 (0.5) | 1 (<0.1) | 1,940 (0.1) |
| Negative | 137,247 (2) | 51,030 (1.2) | 1 (<0.1) | 86,216 (4.1) |
| Inconclusive | 2,659 (<0.1) | 1,637 (<0.1) | 0 (0) | 1,022 (<0.1) |
| Not performed | 6,571,011 (97.6) | 4,235,162 (98.3) | 324,998 (>99.9) | 2,010,851 (95.8) |
| Test Result Viral Isolation, (%) | ||||
| Positive | 207 (<0.1) | 191 (<0.1) | 0 (0) | 16 (<0.1) |
| Negative | 2,963 (<0.1) | 2,036 (<0.1) | 4 (<0.1) | 923 (<0.1) |
| Inconclusive | 909 (<0.1) | 580 (<0.1) | 0 (0) | 329 (<0.1) |
| Not performed | 6,728,463 (99.9) | 4,304,706 (99.9) | 324,996 (>99.9) | 2,098,761 (99.9) |
| RT-PCR Exam Result, (%) | ||||
| Positive | 670 (<0.1) | 634 (<0.1) | 0 (0) | 36 (<0.1) |
| Negative | 4,700 (0.1) | 2,802 (0.1) | 6 (<0.1) | 1,892 (0.1) |
| Inconclusive | 1,176 (<0.1) | 731 (<0.1) | 0 (0) | 445 (<0.1) |
| Not performed | 6,725,996 (99.9) | 4,303,346 (99.9) | 324,994 (>99.9) | 2,097,656 (99.9) |
| Histopathology Test Result, (%) | ||||
| Positive | 445 (<0.1) | 404 (<0.1) | 0 (0) | 41 (<0.1) |
| Negative | 1,677 (<0.1) | 1,002 (<0.1) | 1 (<0.1) | 674 (<0.1) |
| Inconclusive | 914 (<0.1) | 566 (<0.1) | 0 (0) | 348 (<0.1) |
| Not performed | 6,729,506 (>99.9) | 4,305,541 (>99.9) | 324,999 (>99.9) | 2,098,966 (99.9) |
| Immunohistochemistry Test Result, (%) | ||||
| Positive | 341 (<0.1) | 309 (<0.1) | 0 (0) | 32 (<0.1) |
| Negative | 2,165 (<0.1) | 1,360 (<0.1) | 1 (<0.1) | 804 (<0.1) |
| Inconclusive | 2,336 (<0.1) | 1,519 (<0.1) | 0 (0) | 817 (<0.1) |
| Not performed | 6,727,700 (99.9) | 4,304,325 (99.9) | 324,999 (>99.9) | 2,098,376 (99.9) |
| Patient hospitalized, (%) | 132,904 (2) | 96,790 (2.2) | 10 (<0.1) | 36,104 (1.7) |
| Leukopenia, (%) | 135,959 (2) | 109,099 (2.5) | 1 (<0.1) | 26,859 (1.3) |
Notes: (a) All data presented refer to suspected cases; (b) the classifications presented here are in line with the Brazilian Ministry of Health guidelines; and (c) RT-PCR Exam Result refers to each specific virus defined in the respective column.
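Most percentage cells in the baseline characteristics table fit a single rounding convention:

- Round to one decimal place and drop a trailing ".0".
- Report "<0.1" when a non-zero count rounds to 0.0.
- Report ">99.9" when a count short of the total rounds to 100.0.

The sketch below encodes that convention. It is inferred from the reported values rather than documented by the authors, so a few cells may not match it. It can help when regenerating the table from the released data. The helper name `format_cell` is hypothetical.

```python
def format_cell(count: int, total: int) -> str:
    """Format a count as 'n (pct)' following the rounding convention inferred from the table."""
    if total <= 0:
        raise ValueError("total must be positive")
    if count == 0:
        return "0 (0)"
    pct = round(100 * count / total, 1)
    if pct == 0.0:
        label = "<0.1"          # non-zero count that rounds down to 0.0%
    elif pct == 100.0 and count < total:
        label = ">99.9"         # not every record, but rounds up to 100%
    elif pct.is_integer():
        label = str(int(pct))   # drop a trailing ".0", e.g. 37.0 -> "37"
    else:
        label = f"{pct:.1f}"
    return f"{count:,} ({label})"

print(format_cell(39443, 325000))      # 39,443 (12.1)
print(format_cell(1595876, 4307513))   # 1,595,876 (37)
print(format_cell(139, 325000))        # 139 (<0.1)
```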
Fig. 1. Pre-processing steps performed to build the final data set.
Attributes removed after preprocessing.
| Attributes removed | |||||
|---|---|---|---|---|---|
| ID_OCUPA_N | DT_ALRM | DT_VIRAL | GRAV_METRO | DT_OBITO | PETEQUIAS |
| DT_CHIK_S1 | GRAV_PULSO | DT_PCR | GRAV_SANG | ALRM_HIPOT | HEMATURA |
| DT_CHIK_S2 | GRAV_CONV | SOROTIPO | GRAV_AST | ALRM_PLAQ | SANGRAM |
| DT_PRNT | GRAV_ENCH | DT_INTERNA | GRAV_MIOC | ALRM_VOM | LACO_N |
| RES_CHIKS1 | GRAV_INSUF | GENGIVO | GRAV_CONSC | ALRM_SANG | PLASMATICO |
| RES_CHIKS2 | GRAV_TAQUI | MUNICIPIO | GRAV_ORGAO | ALRM_HEMAT | EVIDENCIA |
| RESUL_PRNT | GRAV_EXTRE | COUFINF | DT_GRAV | ALRM_ABDOM | PLAQ_MENOR |
| DT_SORO | GRAV_HIPOT | COPAISINF | MANI_HEMOR | ALRM_LETAR | COMPLICA |
| DT_NS1 | GRAV_HEMAT | COMUNINF | EPISTAXE | ALRM_HEPAT | |
| DT_VIRAL | GRAV_MELEN | DOENCA_TRA | CLINC_CHIK | METRO | |
| TP_SISTEMA | CS_FLXRET | TP_NOT | CRITERIO | ALRM_LIQ | |
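The attributes listed above can be dropped while records are read, so the full extract never has to be held in memory at once. Below is a minimal stdlib sketch under stated assumptions:

- It uses only a short excerpt of the table (`REMOVED_EXCERPT`).
- It reads a hypothetical two-record CSV extract (`SAMPLE_CSV`), because the column layout of the released files is assumed here.

Using a set makes membership checks constant-time. It also makes repeated names harmless, such as the DT_VIRAL entry that appears twice in the table.

```python
import csv
import io

# Excerpt of the "Attributes removed" table; DT_VIRAL is repeated on purpose.
REMOVED_EXCERPT = ["DT_OBITO", "SOROTIPO", "DT_VIRAL", "DT_VIRAL"]

# Hypothetical extract; values are illustrative only.
SAMPLE_CSV = """ID_AGRAVO,DT_NOTIFIC,NU_ANO,DT_OBITO,SOROTIPO,DT_VIRAL
A90,2019-03-04,2019,,,
A90,2019-03-05,2019,,,
"""

def drop_attributes(rows, removed):
    """Yield each record (a dict) without the removed attribute columns."""
    removed = set(removed)  # O(1) lookups; duplicate names collapse to one entry
    for row in rows:
        yield {key: value for key, value in row.items() if key not in removed}

cleaned = list(drop_attributes(csv.DictReader(io.StringIO(SAMPLE_CSV)), REMOVED_EXCERPT))
print(list(cleaned[0]))  # ['ID_AGRAVO', 'DT_NOTIFIC', 'NU_ANO']
```

Because `drop_attributes` is a generator, it can wrap a `csv.DictReader` over a large file and stream the cleaned records onward.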
Fig. 2. Number of records in the data set by category (Dengue, Chikungunya, Discarded/Inconclusive) in Brazil per year.
Fig. 3. Age structure of individuals in cases of Dengue, Chikungunya and Inconclusive.
Fig. 4. Occurrence of confirmed cases of Dengue by Brazilian state.
Fig. 5. Occurrence of confirmed cases of Chikungunya by Brazilian state.
Fig. 6. Occurrence of discarded/inconclusive cases of Dengue and Chikungunya by Brazilian state.
Fig. 7. Attributes in the final data set.
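The quantities behind Figs. 2 and 3 are simple group counts: records per notification year and category, and records per category and age band. The sketch below shows one way to tabulate them. It assumes each record already carries a derived `category` and an age in whole years (`age_years`); both field names are hypothetical. The 10-year bands with an open "80+" band are an illustrative choice, not necessarily the bins used in the figures.

```python
from collections import Counter

def age_band(age_years: int, width: int = 10, top: int = 80) -> str:
    """Map an age in whole years to a band such as '20-29'; ages >= top become e.g. '80+'."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years >= top:
        return f"{top}+"
    low = (age_years // width) * width
    return f"{low}-{low + width - 1}"

def records_per_year(records):
    """Count records per (NU_ANO, category) pair, as plotted per year in Fig. 2."""
    return Counter((r["NU_ANO"], r["category"]) for r in records)

def age_structure(records):
    """Count records per (category, age band), a tabular view of age structure."""
    return Counter((r["category"], age_band(r["age_years"])) for r in records)

# Hypothetical records with pre-derived "category" and "age_years" fields.
records = [
    {"NU_ANO": 2016, "category": "Chikungunya", "age_years": 41},
    {"NU_ANO": 2016, "category": "Dengue", "age_years": 25},
    {"NU_ANO": 2016, "category": "Dengue", "age_years": 27},
    {"NU_ANO": 2019, "category": "Discarded/Inconclusive", "age_years": 83},
]
print(records_per_year(records)[(2016, "Dengue")])  # 2
print(age_structure(records)[("Dengue", "20-29")])  # 2
```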
Socio-demographic data.
| Attribute | Description |
|---|---|
| ID_AGRAVO | ICD disease code |
| DT_NOTIFIC | Notification date |
| SEM_NOT | Epidemiological notification week |
| NU_ANO | Notification year |
| SG_UF_NOT | Abbreviation of the state where the health unit is located |
| ID_MUNICIP | City of Health Unit (IBGE Code) |
| ID_REGIONA | Health care regional code (where the health unit or other reporting source is located) |
| ID_UNIDADE | Health facility code |
| DT_SIN_PRI | Date of onset of first symptoms |
| SEM_PRI | Epidemiological week of onset of symptoms |
| DT_NASC | Patient date of birth |
| NU_IDADE_N | Patient age |
| CS_SEXO | Patient sex |
| CS_GESTANT | Gestational age of the patient (trimester), if CS_SEXO = F |
| CS_RACA | Patient Race |
| CS_ESCOL_N | Patient education |
| SG_UF | Patient's state of residence (IBGE code) |
| ID_MN_RESI | City of the patient (IBGE code) |
| ID_RG_RESI | Health care regional code of the patient's residence |
| CS_ZONA | Area of Residence |
| ID_PAIS | Patient Country Code (IBGE Code) |
| DT_INVEST | Start date of case investigation |
| TPAUTOCTO | Indicates whether the case is autochthonous (locally acquired) in the area of residence |
| COUFINF | State where the patient was infected (IBGE Code) |
| COPAISINF | Country where the patient was infected (IBGE Code) |
| COMUNINF | City where the patient was infected (IBGE Code) |
| EVOLUCAO | Case outcome (evolution) |
| DT_ENCERRA | Case Closing Date |
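The table lists both the patient's date of birth (DT_NASC) and age (NU_IDADE_N). One way to derive or cross-check age in completed years is from the birth and notification dates. The sketch below assumes both dates are available as ISO 8601 strings ("YYYY-MM-DD"). The actual export format, and the encoding of NU_IDADE_N, should be confirmed in the SINAN data dictionary. The function name `age_at_notification` is hypothetical.

```python
from datetime import date

def age_at_notification(dt_nasc: str, dt_notific: str) -> int:
    """Completed years between DT_NASC and DT_NOTIFIC (both assumed ISO 'YYYY-MM-DD')."""
    born = date.fromisoformat(dt_nasc)
    notified = date.fromisoformat(dt_notific)
    if notified < born:
        raise ValueError("notification date precedes date of birth")
    # Subtract one year if the birthday has not yet occurred in the notification year.
    had_birthday = (notified.month, notified.day) >= (born.month, born.day)
    return notified.year - born.year - (0 if had_birthday else 1)

print(age_at_notification("1990-06-15", "2019-06-14"))  # 28
print(age_at_notification("1990-06-15", "2019-06-15"))  # 29
```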
Clinical data – Symptoms.
| Attribute | Description |
|---|---|
| FEBRE | Symptom - Fever |
| MIALGIA | Symptom - Myalgia |
| CEFALEIA | Symptom - Headache |
| EXANTEMA | Symptom - Rash |
| VOMITO | Symptom - Vomiting |
| NAUSEA | Symptom - Nausea |
| DOR_COSTAS | Symptom - Back Pain |
| CONJUNTVIT | Symptom - Conjunctivitis |
| ARTRITE | Symptom - Arthritis |
| ARTRALGIA | Symptom - Arthralgia |
| PETEQUIA_N | Symptom - Petechiae |
| LACO | Symptom - Tourniquet test |
| DOR_RETRO | Symptom - Retro-orbital pain |
Clinical data – Comorbidities.
| Attribute | Description |
|---|---|
| DIABETES | Pre-existing disease - Diabetes |
| HEMATOLOG | Pre-existing disease - Hematological disease |
| HEPATOPAT | Pre-existing disease - Liver disease |
| RENAL | Pre-existing disease - Kidney disease |
| HIPERTENSA | Pre-existing disease - Hypertension |
| ACIDO_PEPT | Pre-existing disease - Acid peptic disease |
| AUTO_IMUNE | Pre-existing disease - Autoimmune disease |
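For the differential-diagnosis use mentioned in the abstract, the symptom and comorbidity attributes above are natural binary features. The sketch below turns one record into a 0/1 vector. It rests on an assumption that should be checked against the SINAN data dictionary before use: that these fields store "1" for yes (with other codes for no or ignored). The constant `YES_CODE` and the helper `binary_features` are hypothetical names.

```python
SYMPTOMS = ["FEBRE", "MIALGIA", "CEFALEIA", "EXANTEMA", "VOMITO", "NAUSEA",
            "DOR_COSTAS", "CONJUNTVIT", "ARTRITE", "ARTRALGIA", "PETEQUIA_N",
            "LACO", "DOR_RETRO"]
COMORBIDITIES = ["DIABETES", "HEMATOLOG", "HEPATOPAT", "RENAL",
                 "HIPERTENSA", "ACIDO_PEPT", "AUTO_IMUNE"]

YES_CODE = "1"  # ASSUMPTION: "1" encodes "yes"; verify against the SINAN data dictionary

def binary_features(record, columns, yes_code=YES_CODE):
    """Return a 0/1 list: 1 when the attribute equals yes_code, else 0 (no, ignored or blank)."""
    return [1 if str(record.get(col, "")).strip() == yes_code else 0 for col in columns]

# Hypothetical record; absent attributes are treated as 0.
record = {"FEBRE": "1", "MIALGIA": "2", "CEFALEIA": "1", "DIABETES": "2"}
print(binary_features(record, SYMPTOMS)[:3])        # [1, 0, 1]
print(sum(binary_features(record, COMORBIDITIES)))  # 0
```

Collapsing "no", "ignored" and blank into 0 keeps the vector simple but hides missingness. Given how many fields are missing or ignored in the baseline table, a separate missing-value indicator per attribute may be worth keeping.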
Laboratory data.
| Attribute | Description |
|---|---|
| RESUL_SORO | Serological Test Results (IgM) Dengue |
| RESUL_NS1 | NS1 antigen test result (ELISA) |
| RESUL_VI_N | Test Result Viral Isolation |
| RESUL_PCR_ | RT-PCR Exam Result |
| HISTOPA_N | Histopathology Test Result |
| IMUNOH_N | Immunohistochemistry Test Result |
| HOSPITALIZ | Whether the patient was hospitalized |
| LEUCOPENIA | Leukopenia - Low level of white blood cells in the blood |
| CLASSI_FIN | Final patient classification |

| Measurement(s) | clinical data |
|---|---|
| Technology Type(s) | interview |
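The baseline table shows that "Not performed" dominates every laboratory result. A practical first step is therefore to flag records with at least one performed test. The sketch below decodes the six laboratory result attributes listed above. It assumes SINAN-style codes 1 = positive, 2 = negative, 3 = inconclusive and 4 = not performed, which follow the category order of the baseline table but should be confirmed in the official data dictionary. `RESULT_LABELS`, `decode_results` and `any_test_performed` are hypothetical names.

```python
# Laboratory result attributes from the "Laboratory data" table.
LAB_RESULTS = ["RESUL_SORO", "RESUL_NS1", "RESUL_VI_N", "RESUL_PCR_", "HISTOPA_N", "IMUNOH_N"]

# ASSUMPTION: SINAN-style result codes; confirm against the official data dictionary.
RESULT_LABELS = {"1": "Positive", "2": "Negative", "3": "Inconclusive", "4": "Not performed"}
PERFORMED = {"Positive", "Negative", "Inconclusive"}

def decode_results(record):
    """Map each lab attribute to a readable label; absent or unknown codes become 'Missing'."""
    return {col: RESULT_LABELS.get(str(record.get(col, "")).strip(), "Missing")
            for col in LAB_RESULTS}

def any_test_performed(record):
    """True if at least one lab attribute holds a positive, negative or inconclusive result."""
    return any(label in PERFORMED for label in decode_results(record).values())

print(any_test_performed({"RESUL_SORO": "4", "RESUL_NS1": "2"}))  # True
print(any_test_performed({"RESUL_SORO": "4"}))                    # False
```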