Deus Thindwa1,2, Yama G Farooq3, Mila Shakya4, Nirod Saha5, Susan Tonks3, Yaw Anokwa6, Melita A Gordon2,7, Carl Hartung6, James E Meiring3, Andrew J Pollard3, Robert S Heyderman2,8.
Abstract
Electronic data capture systems (EDCs) have the potential to improve the efficiency and quality of multisite data collection. We quantify the volume, time, accuracy and costs of an EDC using large-scale census data from the STRATAA consortium, a comprehensive programme assessing the population dynamics and epidemiology of typhoid fever in Malawi, Nepal and Bangladesh to inform vaccine and public health interventions. A census form was developed through a structured iterative process and implemented using Open Data Kit Collect running on Android-based tablets. Data were uploaded to Open Data Kit Aggregate, then auto-synced nightly to a MySQL-defined database. Data from the three sites were backed up centrally every day and auto-reported weekly. The costs of pre-census materials were estimated. Demographics of 308,348 individuals from 80,851 households were recorded by 65 fieldworkers within an average of 14.7 weeks (range: 13–16). Overall, 21.7 errors (95% confidence interval: 21.4, 22.0) per 10,000 data points were found: 13.0 (95% confidence interval: 12.6, 13.5) and 24.5 (95% confidence interval: 24.1, 24.9) errors per 10,000 data points on numeric and text fields, respectively. These values meet the standard quality threshold of 50 errors per 10,000 data points. The EDC's total variable cost was estimated at US$13,791.82 per site. In conclusion, the EDC is robust, allows timely, high-volume and accurate data collection, and could be adopted in similar epidemiological settings.
Keywords: Africa; Asia; Electronic data capture; Open Data Kit; Typhoid fever.
Year: 2020 PMID: 32934993 PMCID: PMC7471626 DOI: 10.12688/wellcomeopenres.15811.1
Source DB: PubMed Journal: Wellcome Open Res ISSN: 2398-502X
Figure 1. Electronic census report form flowchart.
Figure 2. Electronic data capture system for a multisite study.
The MySQL-defined databases b_strataa, k_strataa, and d_strataa have homogeneous structures (*), e.g. table columns, data types, triggers and views. Data from a MySQL-defined database table are exported back to the Android-based tablets, enabling data preloading for subsequent sub-studies (P). The homogeneous databases across sites are merged, enabling multisite data analyses (H).
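The merge step (H) in the legend above can be illustrated with a minimal sketch. It assumes Python's built-in `sqlite3` as a stand-in for MySQL, and a hypothetical `census` table with made-up columns; only the three database names come from the legend. The idea is to confirm that the site schemas match before stacking their rows with a site label.

```python
import sqlite3

# Database names as given in the Figure 2 legend.
SITE_DBS = ["b_strataa", "k_strataa", "d_strataa"]


def open_sites() -> sqlite3.Connection:
    """Create one in-memory database per site with an identical (hypothetical) table."""
    con = sqlite3.connect(":memory:")
    for db in SITE_DBS:
        con.execute(f"ATTACH DATABASE ':memory:' AS {db}")
        # Hypothetical columns; the real census schema is not described in the source.
        con.execute(
            f"CREATE TABLE {db}.census (barcode TEXT, member_name TEXT, age INTEGER)"
        )
    return con


def schema(con: sqlite3.Connection, db: str, table: str = "census"):
    """Return the (column name, declared type) pairs of one site's table."""
    rows = con.execute(f"PRAGMA {db}.table_info({table})").fetchall()
    return [(name, ctype) for _cid, name, ctype, *_ in rows]


def merge_sites(con: sqlite3.Connection):
    """Stack rows from all sites, refusing to merge if any schema differs."""
    reference = schema(con, SITE_DBS[0])
    for db in SITE_DBS[1:]:
        if schema(con, db) != reference:
            raise ValueError(f"{db}.census does not match {SITE_DBS[0]}.census")
    union = " UNION ALL ".join(
        f"SELECT '{db}' AS site, * FROM {db}.census" for db in SITE_DBS
    )
    return con.execute(union).fetchall()
```

Checking the schema first is what makes a plain `UNION ALL` safe here: with homogeneous structures, no per-site column mapping is needed.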
Census Data Collection Time, Volume and Accuracy in Three Typhoid Endemic Sites, 2016.
| Study site | Time period of data collection | Total households | Total individuals | Number of errors | Number of data points | Errors per 10,000 data points | 95% CI** |
|---|---|---|---|---|---|---|---|
| All sites | | | | | | | |
| Overall | 14.7 weeks (13–16) | 80,851 | 308,348 | 17,707 | 8,173,179 | 21.7 | 21.4, 22.0 |
| NumericŦ | 14.7 weeks (13–16) | 80,851 | 308,348 | 3,868 | 2,966,946 | 13.0 | 12.6, 13.5 |
| TextƗ | 14.7 weeks (13–16) | 80,851 | 308,348 | 12,740 | 5,206,233 | 24.5 | 24.1, 24.9 |
| Malawi§ | | | | | | | |
| Overall | Jul 2016 – Oct 2016 | 22,364 | 97,410 | 3,991 | 2,515,254 | 15.9 | 15.4, 16.4 |
| NumericŦ | Jul 2016 – Oct 2016 | 22,364 | 97,410 | 900 | 905,510 | 9.9 | 9.3, 10.6 |
| TextƗ | Jul 2016 – Oct 2016 | 22,364 | 97,410 | 2,291 | 1,609,744 | 14.2 | 13.7, 14.8 |
| Nepal§ | | | | | | | |
| Overall | May 2016 – Sep 2016 | 32,368 | 100,207 | 9,522 | 2,784,075 | 34.2 | 33.5, 34.9 |
| NumericŦ | May 2016 – Sep 2016 | 32,368 | 100,207 | 2,171 | 1,025,129 | 21.2 | 20.3, 22.1 |
| TextƗ | May 2016 – Sep 2016 | 32,368 | 100,207 | 7,131 | 1,758,946 | 40.5 | 39.6, 41.5 |
| Bangladesh§ | | | | | | | |
| Overall | Jun 2016 – Aug 2016 | 26,119 | 110,731 | 4,194 | 2,873,850 | 14.6 | 14.2, 15.0 |
| NumericŦ | Jun 2016 – Aug 2016 | 26,119 | 110,731 | 797 | 1,036,307 | 7.7 | 7.2, 8.23 |
| TextƗ | Jun 2016 – Aug 2016 | 26,119 | 110,731 | 3,318 | 1,837,543 | 18.1 | 17.5, 18.7 |
* Persistent error sources included duplication of household identifiers (barcodes); duplication of entire individual demographics; incorrect barcode decoding during scanning; illogical ages or dates of birth of children relative to their parents; incorrect household visit dates relative to the tablet system date; misspellings of traditional authority names/ward numbers, physical addresses, respondent names and household members' names; missing GPS points; inaccurate GPS points relative to the household; and mismatches between community names and GPS points. Duplicates resulted in 800 records being deleted in Malawi, 220 in Nepal, and 79 in Bangladesh.
Ŧ Includes numeric integer, numeric decimal and alphanumeric (barcode) data types.
Ɨ Includes text, character, and date data types.
§ Number of census field workers for Malawi (20), Nepal (25), and Bangladesh (20).
** CI: Confidence Interval estimated by binomial (Clopper-Pearson) 'exact' method based on the error distribution.
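To make the last footnote concrete, here is a minimal sketch of how an error rate per 10,000 data points and its Clopper-Pearson 'exact' interval could be computed using only the Python standard library. The function names are illustrative and not taken from the study. The bounds are found by bisection on the binomial CDF rather than through a beta quantile, which is simple but slow for very large error counts.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with each term evaluated in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    terms = (
        math.exp(
            log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        for i in range(k + 1)
    )
    return min(1.0, math.fsum(terms))


def _bisect(f, target: float, increasing: bool, iters: int = 100) -> float:
    """Solve f(p) = target for p in (0, 1), where f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Two-sided exact binomial interval for k errors in n data points."""
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2, increasing=True
    )
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p), alpha / 2, increasing=False
    )
    return lower, upper


def errors_per_10k(k: int, n: int, alpha: float = 0.05):
    """Return (rate, lower, upper), all scaled to errors per 10,000 data points."""
    lower, upper = clopper_pearson(k, n, alpha)
    return 1e4 * k / n, 1e4 * lower, 1e4 * upper
```

For example, `errors_per_10k(797, 1_036_307)` uses the Bangladesh numeric row and should give a rate of about 7.7 with bounds close to those tabulated.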
Figure 3. Data entry errors before and after retraining of fieldworkers, 2016.
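Several of the persistent error sources listed under the census table (duplicate household barcodes, children's dates of birth that are illogical relative to a parent's, and visit dates later than the tablet system date) lend themselves to automated checks. The sketch below is illustrative only: the record layout and field names (`barcode`, `visit_date`, `id`, `parent_id`, `dob`) are assumptions, not the study's actual form fields.

```python
from collections import Counter
from datetime import date


def duplicate_barcodes(households):
    """Return household barcodes that appear more than once."""
    counts = Counter(h["barcode"] for h in households)
    return sorted(code for code, n in counts.items() if n > 1)


def illogical_child_dobs(members):
    """Return ids of members born on or before their recorded parent's birth date."""
    dob = {m["id"]: m["dob"] for m in members}
    return [
        m["id"]
        for m in members
        if m.get("parent_id") in dob and m["dob"] <= dob[m["parent_id"]]
    ]


def visits_after_system_date(households, system_date: date):
    """Return barcodes of households whose visit date is later than the system date."""
    return [h["barcode"] for h in households if h["visit_date"] > system_date]
```

Checks like these return the offending identifiers rather than raising, so that flagged records can be reviewed and corrected instead of being rejected outright.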
Time and costs of attaining materials and activities prior to implementation of an electronic data capture system in three typhoid endemic sites, 2016.
| Material or activity (category) | Time to attain (days) | Unit | Number of units (X 1 – X 2) | Unit cost* in US$ (Y) | Variable cost in US$ (X 1 . Y) |
|---|---|---|---|---|---|
| Tablets (including screen …) | 7–60 | Tablet | 27 – 42 | 200.26 | 5,407.02 |
| Desktop server computers§ | 0–30 | Computer | 1 – 4 | 1,523.21 | 1,523.21 |
| Network devices§ | 0–30 | Router | 1 – 4 | 183.82 | 183.82 |
| Barcodes | 7–21 | Sheet | 1,500 – 2,530 | 0.48 | 720.00 |
| Electronic census report formŦ | 60 | eCRF | 1 – 3 | 3,000.00 | 3,000.00 |
| Training field workers | 2–5 | Field worker | 27 – 37 | 56.82 | 1,479.60 |
| Replacement of malfunctioned tablets | 7–30 | Tablet | 1 – 3 | 200.26 | 200.26 |
| Backpacks | 7–60 | Backpack | 27 – 42 | 47.33 | 1,277.91 |
* Average unit cost estimated in 2016 across all study sites.
Ŧ Only one uniform eCRF was developed for the three sites; for the purposes of calculation, the total cost is divided by 3.
§ Some tablets already existed in other sites. Similarly, network devices and computer servers pre-existed in Malawi, Bangladesh, and a central coordinating site (Oxford Vaccine Group) but not in Nepal.
** Excludes costs of electric power to servers, charging tablets and data synchronization because of uncertainty.
US$: United States dollar.
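As a quick arithmetic check, the per-site total quoted in the abstract can be recovered by summing the variable cost column of the table above. The sketch below only adds up the reported values and uses `Decimal` to avoid floating-point rounding; the dictionary name and row labels are illustrative.

```python
from decimal import Decimal

# Variable costs in US$, copied from the last column of the cost table.
REPORTED_VARIABLE_COSTS = {
    "Tablets": "5407.02",
    "Desktop server computers": "1523.21",
    "Network devices": "183.82",
    "Barcodes": "720.00",
    "Electronic census report form": "3000.00",
    "Training field workers": "1479.60",
    "Replacement of malfunctioned tablets": "200.26",
    "Backpacks": "1277.91",
}


def total_variable_cost(costs) -> Decimal:
    """Sum the reported variable costs exactly, to the cent."""
    return sum((Decimal(v) for v in costs.values()), Decimal("0"))


print(total_variable_cost(REPORTED_VARIABLE_COSTS))  # 13791.82
```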
Figure 4. Speed and accuracy trade-off before and after retraining of fieldworkers, 2016.