Milka Bochere Gesicho, Martin Chieng Were, Ankica Babic.
Abstract
BACKGROUND: The District Health Information Software-2 (DHIS2) is widely used by countries for national-level aggregate reporting of health data. To best leverage DHIS2 data for decision-making, countries need to ensure that data within their systems are of the highest quality. Comprehensive, systematic, and transparent data cleaning approaches form a core component of preparing DHIS2 data for analyses. Unfortunately, there is a paucity of exhaustive and systematic descriptions of data cleaning processes employed on DHIS2-based data. The aim of this study was to report on the methods and results of a systematic and replicable data cleaning approach applied to HIV data gathered within DHIS2 from 2011 to 2018 in Kenya, for secondary analyses.
Keywords: Data management; Data-cleaning; HIV-indicators; dhis2
Year: 2020; PMID: 33187520; PMCID: PMC7664027; DOI: 10.1186/s12911-020-01315-7
Source DB: PubMed; Journal: BMC Med Inform Decis Mak; ISSN: 1472-6947; Impact factor: 2.796
Fig. 1 Creation of the evaluation data set
Fig. 2 Repeated cycles of data cleaning
Table 1 Categorization of the various situations within DHIS2 and actions taken
| Situation | CPC | RR | RRT | Diagnosis | Action |
|---|---|---|---|---|---|
| A | 0 | 0 | 0 | Nothing was reported by the facility during this period, signifying that the facility does not report to DHIS2. This may be a true (expected) absence of reporting | Facility records excluded |
| B | 0 | X | X | Submitted reports might be on time, but are empty. This can result from programs wanting a full MOH731 submission even though they do not offer services in all 6 programmatic areas, and hence submitting empty reports for non-required programmatic areas (the report is useless to the decision-maker as it is empty) | Facility records excluded |
| C | 0 | X | 0 | Submitted reports are empty and not on time (the report is useless to the decision-maker as it is empty and not on time) | Facility records excluded |
| D | X | 0 | 0 | No values present for RR and RRT. However, the reports are not empty | Facility records excluded |
| E | X | > 100% | X | Erroneous records, as percentage RR cannot exceed 100; this is not logically possible | Facility records excluded |
| F | X | > 100% | > 100% | Erroneous records, as percentage RR and RRT cannot exceed 100; this is not logically possible | Facility records excluded |
| G | X | X | X | Reports submitted on time with relevant indicators included. Ideal situation | Facility records included |
| H | X | X | 0 | Submitted reports with data elements in them, but not submitted in a timely manner | Facility records included |
CPC cumulative percent completion, RR reporting rate, RRT reporting rate on time
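The decision rules in the table above can be sketched as a small function. This is an illustrative sketch only, not the authors' code: the function name `classify_record` is hypothetical, `X` is assumed to mean any non-zero value, and combinations that the table does not list return `(None, None)` rather than guessing an action.

```python
from typing import Optional, Tuple


def classify_record(cpc: float, rr: float, rrt: float) -> Tuple[Optional[str], Optional[bool]]:
    """Map one facility record to (situation, include) following the table above.

    cpc, rr and rrt are percentages for a single programmatic area.
    Assumption: "X" in the table means any non-zero value.
    Combinations not listed in the table return (None, None).
    """
    has_cpc, has_rr, has_rrt = cpc > 0, rr > 0, rrt > 0

    if not has_cpc:
        if not has_rr and not has_rrt:
            return "A", False   # nothing reported
        if has_rr and has_rrt:
            return "B", False   # empty reports, possibly on time
        if has_rr:
            return "C", False   # empty reports, not on time
        return None, None       # (0, 0, X) is not listed

    # From here on, CPC is non-zero (reports are not empty).
    if rr > 100:
        if rrt > 100:
            return "F", False   # RR and RRT both above 100%
        if has_rrt:
            return "E", False   # RR above 100%
        return None, None       # (X, >100%, 0) is not listed
    if not has_rr:
        if not has_rrt:
            return "D", False   # non-empty reports but no RR or RRT
        return None, None       # (X, 0, X) is not listed
    if rrt > 100:
        return None, None       # RRT above 100% with valid RR is not listed
    if has_rrt:
        return "G", True        # ideal situation
    return "H", True            # reported, but not on time


# Facility C (HCT columns) from the example data set: CPC 10, RR 90, RRT 80.
result = classify_record(10, 90, 80)
```

For the HCT columns of Facility C in the example data set (CPC 10, RR 90, RRT 80), the sketch returns situation G with the record included.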
Example section of the first data set containing facility records
| Year | Organisation unit | CPC-HCT | RR-HCT | RRT-HCT | CPC-BS | RR-BS | RRT-BS | ** | Avg-CPC | Avg-RR | Avg-RRT |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2016 | Facility A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2016 | Facility B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2017 | Facility C | 10 | 90 | 80 | 100 | 90 | 80 | 0 | 50 | 60 | 50 |
CPC cumulative percentage completion, RR-HCT reporting rate HIV counselling and testing, RRT reporting rate on time, BS blood safety, Avg average, ** remaining four reports with the same variable sequence
Fig. 3 Data cleaning process
Proportion of facility records (2011–2018) by programmatic area in the various situations based on facility records in dataset 4 (n = 42,007)
| Situation | HCT (%) | PMTCT (%) | CrT (%) | VMMC (%) | PEP (%) | BS (%) |
|---|---|---|---|---|---|---|
| B(0XX) | 2.68 | 6.15 | 1.32 | 2.81 | 18.04 | 1.70 |
| C(0X0) | 0.75 | 0.75 | 0.32 | 1.13 | 0.76 | 0.19 |
| D(X00) | 0.66 | 1.97 | 1.66 | 0.78 | 0.71 | 0.09 |
| G(XXX) | 92.44 | 81.52 | 42.60 | 0.63 | 21.82 | 0.45 |
| H(XX0) | 1.57 | 2.13 | 1.20 | 0.03 | 0.28 | 0.01 |
| Duplicates | 0.02 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 |
| Total facility records (based on data set 4) | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Total facility records removed | 6.02 | 16.35 | 56.21 | 99.34 | 77.90 | 99.54 |
| Total facility records retained | 93.98 | 83.65 | 43.79 | 0.66 | 22.10 | 0.46 |
Situation: a detailed explanation of the various reporting situations within DHIS2 can be found in Table 1
Fig. 4 Distribution of facility records based on situation B (empty reports) and situation D against programmatic area
Results for Wilcoxon signed rank test for distribution of records in situation B (empty reports, 0XX)
| Pairwise comparison by programmatic area | Wilcoxon signed ranks test (p value) | Wilcoxon signed ranks test (Z value) | Distribution of records in situation B based on pairwise comparison by programmatic area |
|---|---|---|---|
| PMTCT—HCT | 0.012 | − 2.521 | Higher in PMTCT for 8 years |
| CrT—HCT | 0.036 | − 2.100 | Lower in CrT for 6 years |
| PEP—HCT | 0.012 | − 2.521 | Higher in PEP for 8 years |
| BS—HCT | 0.012 | − 2.524 | Lower in BS for 8 years |
| CrT—PMTCT | 0.017 | − 2.521 | Lower in CrT for 7 years |
| VMMC—PMTCT | 0.012 | − 2.521 | Lower in VMMC for 8 years |
| PEP—PMTCT | 0.012 | − 2.521 | Higher in PEP for 8 years |
| BS—PMTCT | 0.012 | − 2.524 | Lower in BS for 8 years |
| VMMC—CrT | 0.050 | − 1.960 | Higher in VMMC for 6 years |
| PEP—CrT | 0.012 | − 2.521 | Higher in PEP for 8 years |
| PEP—VMMC | 0.012 | − 2.521 | Higher in PEP for 8 years |
| BS—VMMC | 0.012 | − 2.524 | Lower in BS for 8 years |
| BS—PEP | 0.012 | − 2.521 | Lower in BS for 8 years |
PMTCT prevention of mother to child transmission, HCT HIV counselling and testing, PEP post-exposure prophylaxis, BS blood safety, CrT care and treatment, VMMC voluntary medical male circumcision
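To show how a Z value and p value of this kind can arise from eight paired yearly observations, the following is a minimal sketch of the Wilcoxon signed-rank test using the large-sample normal approximation, without tie or continuity correction. The source does not state which software or test variant was used, so this choice is an assumption. The function name `wilcoxon_signed_rank_z` is hypothetical and the yearly percentages are made-up illustration data, not values from the study.

```python
import math


def wilcoxon_signed_rank_z(x, y):
    """Return (z, two-sided p) for paired samples x and y.

    Uses the normal approximation with no tie or continuity correction
    (an assumption; the source does not specify the variant).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the normal tail
    return z, p


# Hypothetical yearly percentages of empty reports (2011-2018), for illustration only.
hct = [2.1, 2.5, 3.0, 2.8, 2.6, 2.9, 2.4, 2.7]
pmtct = [5.9, 6.3, 6.0, 6.4, 6.1, 6.5, 5.8, 6.2]

z, p = wilcoxon_signed_rank_z(hct, pmtct)
print(round(z, 3), round(p, 3))  # -2.521 0.012
```

Under this approximation, any eight paired values that all differ in the same direction give Z ≈ −2.521 and p ≈ 0.012, which are the values that appear in the rows reported as higher or lower "for 8 years" with Z = −2.521.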
Results for Wilcoxon signed rank test for distribution of facility records in situation D (X00)
| Pairwise comparison by programmatic area | Wilcoxon signed ranks test (p value) | Wilcoxon signed ranks test (Z value) | Distribution of records in situation D based on pairwise comparison by programmatic area |
|---|---|---|---|
| PMTCT—HCT | 0.012 | − 2.521 | Higher in PMTCT for 8 years |
| CrT—HCT | 0.012 | − 2.521 | Higher in CrT for 8 years |
| BS—HCT | 0.012 | − 2.524 | Lower in BS for 8 years |
| VMMC—PMTCT | 0.012 | − 2.521 | Lower in VMMC for 8 years |
| PEP—PMTCT | 0.012 | − 2.521 | Lower in PEP for 8 years |
| BS—PMTCT | 0.012 | − 2.521 | Lower in BS for 8 years |
| VMMC—CrT | 0.012 | − 2.524 | Lower in VMMC for 8 years |
| PEP—CrT | 0.012 | − 2.527 | Lower in PEP for 8 years |
| BS—CrT | 0.012 | − 2.524 | Lower in BS for 8 years |
| BS—VMMC | 0.018 | − 2.375 | Lower in BS for 8 years |
| BS—PEP | 0.012 | − 2.524 | Lower in BS for 8 years |
PMTCT prevention of mother to child transmission, HCT HIV counselling and testing, CrT care and treatment, PEP post-exposure prophylaxis, BS blood safety, VMMC voluntary medical male circumcision