Avi Kenny, Nicholas Gordon, Thomas Griffiths, John D Kraemer, Mark J Siedner.
Abstract
BACKGROUND: The use of mobile devices for data collection in developing world settings is becoming increasingly common and may offer advantages in data collection quality and efficiency relative to paper-based methods. However, mobile data collection systems can hamper many standard quality assurance techniques due to the lack of a hardcopy backup of data. Consequently, mobile health data collection platforms have the potential to generate datasets that appear valid, but are susceptible to unidentified database design flaws, areas of miscomprehension by enumerators, and data recording errors.
Keywords: data accuracy; data collection; eHealth; mHealth; questionnaire design; research methodology; survey methodology; surveys
Year: 2017 PMID: 28821474 PMCID: PMC5581386 DOI: 10.2196/jmir.7813
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Classification of detectable errors.
| # | Class | Description | Example of error detected |
| 1 | Removal of “required” constraint | Removal of a “required question” constraint | User accidentally skips a question on postnatal care that he or she was supposed to complete |
| 2 | Illogical response combinations: multiple questions | Inclusion of 2 or more questions for which a certain combination of answers is logically impossible | The first question is “What is your gender?”; user answers “male.” The second question is “Have you ever given birth”; user answers “yes.” |
| 3 | Illogical response combinations: single question | Inclusion of an individual, multiple-response, multiple-choice question for which certain combinations of responses is logically impossible | The question is “Who checked on you during your last pregnancy?” User selects 2 options: “family members” and “I don’t know.” |
| 4 | Intentional redundancy | Repetition of the same question (possibly with slightly different wording or within a different question sequence) more than once in different sections of the questionnaire | At the start of the survey, user answers the question “How many times have you given birth?” with “6.” Later in the survey, the user answers a repeated instance of the same question (“How many times have you given birth?”) with “5.” |
| 5 | Manual skip logic | Forcing the user to select the next branch of questions to ask, based on responses to previous questions (instead of automating skip logic) | User answers the question “Have you ever been to a health clinic?” with a “No”. User is then prompted with 2 possible options and has to choose one: “Complete clinical questionnaire” or “Skip clinical questionnaire and proceed to child health questionnaire.” User selects “Complete clinical questionnaire.” |
| 6 | Removing minimum or maximum constraints | Removing constraints on the minimum or maximum value that can be entered for a question | User answers “657” to the question “How old are you, in years?” |
| 7 | Manual calculation | Prompt the user to enter a value that could be mathematically calculated from previous responses | Survey date is “June 3, 2016.” User answers the question “What is your birthday?” with “June 4, 1996.” The next question is “What is your age, in years?”; respondent answers “24.” |
| 8 | Allowing invalid data type | User is allowed to enter a value of an incorrect data type | The question is “How many times have you seen a doctor in the past month?” User answers “sometimes.” |
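Several of the classes in the table above can be checked automatically once a record has been collected. The following is a minimal sketch, not the authors' implementation: the field names (`postnatal_care`, `births_first`, `doctor_visits`, and so on), the 0 to 120 age bounds, and the `detect_errors` helper are all hypothetical and chosen only to mirror the examples in the table. Class 5 (manual skip logic) is omitted because it depends on the branching structure of a specific questionnaire.

```python
from datetime import date

# Hypothetical response options that cannot be combined with any other option.
EXCLUSIVE_OPTIONS = {"I don't know", "refused to respond"}


def age_in_years(birthday, on):
    """Completed years between a birthday and the survey date."""
    had_birthday = (on.month, on.day) >= (birthday.month, birthday.day)
    return on.year - birthday.year - (0 if had_birthday else 1)


def detect_errors(rec):
    """Return the error classes (numbered as in the table) found in one record."""
    found = []
    # Class 1: a required question was left blank.
    if rec.get("postnatal_care") is None:
        found.append(1)
    # Class 2: logically impossible combination across two questions.
    if rec.get("gender") == "male" and rec.get("ever_given_birth") == "yes":
        found.append(2)
    # Class 3: an exclusive option selected together with other options.
    checked = set(rec.get("checked_by", []))
    if checked & EXCLUSIVE_OPTIONS and checked - EXCLUSIVE_OPTIONS:
        found.append(3)
    # Class 4: a repeated question received two different answers.
    if rec.get("births_first") != rec.get("births_repeat"):
        found.append(4)
    age = rec.get("age_years")
    if isinstance(age, int):
        # Class 6: value outside plausible bounds (bounds are an assumption).
        if not 0 <= age <= 120:
            found.append(6)
        # Class 7: manually entered age disagrees with the calculated age.
        if age != age_in_years(rec["birthday"], rec["survey_date"]):
            found.append(7)
    # Class 8: a count question answered with a non-numeric value.
    if not isinstance(rec.get("doctor_visits"), int):
        found.append(8)
    return found


# Record assembled from the examples given for classes 3, 4, 7, and 8.
record = {
    "postnatal_care": "yes",
    "gender": "female",
    "ever_given_birth": "yes",
    "checked_by": ["family members", "I don't know"],
    "births_first": 6,
    "births_repeat": 5,
    "age_years": 24,
    "birthday": date(1996, 6, 4),
    "survey_date": date(2016, 6, 3),
    "doctor_visits": "sometimes",
}
flags = detect_errors(record)
```

For the class 7 example, the respondent's birthday falls one day after the survey date, so the calculated age is 19 rather than the entered 24.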
Specific detectable errors implemented in cluster sample survey.
| # | Class | Error definition | Number of errors | Error rate, % |
| 1 | Intentional redundancy | Gave different answers for the question (“Was your most recent birth in a health facility?”) in different sections of the questionnaire | 19/618 | 3.1 |
| 2 | Intentional redundancy | Gave different answers for the question (“Have you ever given birth?”) in different sections of the questionnaire | 10/961 | 1.0 |
| 3 | Intentional redundancy | Gave different answers for the question (“What was the date of birth of your most recently birthed child?”) in different sections of the questionnaire | 84/618 | 13.6 |
| 4 | Intentional redundancy | Gave different answers for the question (“Is your most recently birthed child still alive?”) in different sections of the questionnaire | 10/618 | 1.6 |
| 5 | Illogical response combinations: single question | Question is “Where you go to get medical advice or treatment?”; answer options included (“refused to respond” OR “unknown”) AND (“clinic” OR “drugstore” OR “community health worker” OR “traditional healer” OR “other”) | 2/895 | 0.2 |
| 6 | Illogical response combinations: single question | Question is “What are the signs of someone who can have ebola?”; answer options included (“refused to respond” OR “unknown”) AND (“fever” OR “muscle pains” OR “vomiting” OR “sore throat” OR “diarrhea” OR “bleeding” OR “other”) | 0/895 | 0.0 |
| 7 | Removal of “required” constraint | A required question (“Can people get Ebola from touching an Ebola patient?”) was skipped | 0/895 | 0.0 |
| 8 | Removal of “required” constraint | A required question (“Can people get Ebola from the air?”) was skipped | 0/895 | 0.0 |
| 9 | Removal of “required” constraint | A required question (“Can people get Ebola by touching or washing a dead body?”) was skipped | 0/895 | 0.0 |
| 10 | Illogical response combinations: multiple questions | Answers for a multiple-response question (“From whom did the child get treatment [for fever or cough]?”) were given; an answer was given to the following question (“From whom did the child get treatment FIRST?”) that was not selected in the previous list of responses | 0/325 | 0.0 |
| 11 | Illogical response combinations: multiple questions | Answers for a multiple-response question (“From whom did the child get treatment [for diarrhea]?”) were given; an answer was given to the following question (“From whom did the child get treatment FIRST?”) that was not selected in the previous list of responses | 0/202 | 0.0 |
| Total | | | 125/7817 | 1.60 |
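The per-error rates and the total in the table above can be recomputed directly from the reported counts. The sketch below is illustrative only; the variable names are hypothetical, and the counts are copied from the "Number of errors" column in row order.

```python
# (errors, opportunities) for detectable errors 1 through 11, in table order.
error_counts = [
    (19, 618), (10, 961), (84, 618), (10, 618), (2, 895), (0, 895),
    (0, 895), (0, 895), (0, 895), (0, 325), (0, 202),
]

# Per-error rate in percent, rounded to one decimal as in the table.
rates_pct = [round(100 * e / n, 1) for e, n in error_counts]

# Pooled totals across all 11 detectable errors.
total_errors = sum(e for e, _ in error_counts)
total_opportunities = sum(n for _, n in error_counts)
overall_rate_pct = round(100 * total_errors / total_opportunities, 2)
```

The pooled rate weights each detectable error by its number of opportunities, which is why it sits far below the 13.6% rate of the single most common error.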
Enumerator-specific error rates.
| Enumerator ID# | Error rate, % (day 0-14) | Error rate, % (day 15-29) | Error rate, % (day 30-45) | Overall error rate, % (day 0-45) |
| 2 | 3.1 (14/458) | 0.4 (2/465) | 1.2 (5/415) | 1.57 (21/1338) |
| 3 | 2.9 (13/452) | 0.8 (3/382) | 1.8 (6/334) | 1.88 (22/1168) |
| 4 | 2.8 (17/605) | 1.8 (9/506) | 1.3 (5/393) | 2.06 (31/1504) |
| 5 | 1.8 (7/386) | 1.6 (6/364) | 1.0 (3/286) | 1.54 (16/1036) |
| 6 | 2.5 (14/552) | 0.9 (5/528) | 0.2 (1/436) | 1.32 (20/1516) |
| 7 | 1.4 (7/512) | 1.6 (6/380) | 0.6 (2/363) | 1.20 (15/1255) |
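As a rough illustration of how the enumerator table aggregates, the sketch below recomputes each enumerator's overall rate from the three period counts and also pools the counts across enumerators within each period. The pooled per-period rates are derived here for illustration and are not reported in the source; the names `counts`, `PERIODS`, and `rate_pct` are hypothetical.

```python
PERIODS = ["day 0-14", "day 15-29", "day 30-45"]

# Enumerator ID -> (errors, opportunities) for each period, copied from the table.
counts = {
    2: [(14, 458), (2, 465), (5, 415)],
    3: [(13, 452), (3, 382), (6, 334)],
    4: [(17, 605), (9, 506), (5, 393)],
    5: [(7, 386), (6, 364), (3, 286)],
    6: [(14, 552), (5, 528), (1, 436)],
    7: [(7, 512), (6, 380), (2, 363)],
}


def rate_pct(errors, opportunities):
    """Error rate as a percentage."""
    return 100 * errors / opportunities


# Overall rate per enumerator: sum the counts first, then divide.
overall_by_enumerator = {
    eid: round(rate_pct(sum(e for e, _ in p), sum(n for _, n in p)), 2)
    for eid, p in counts.items()
}

# Pooled counts and rate per period across all six enumerators.
pooled_by_period = {}
for i, label in enumerate(PERIODS):
    errors = sum(p[i][0] for p in counts.values())
    opportunities = sum(p[i][1] for p in counts.values())
    pooled_by_period[label] = (errors, opportunities, round(rate_pct(errors, opportunities), 2))
```

Summing counts before dividing, rather than averaging the three period percentages, keeps each overall rate consistent with the fractions shown in parentheses in the table.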
Change in error rates over time (primary and sensitivity analyses).
| Analysis | Type | Number of observations | Odds ratio (OR) or beta | P value | Predicted error rate (day 0) | Predicted error rate (day 45) |
| Primary (#1); all errors included | Logistic regression | 9527 | OR = 0.969 | <.001 | 0.0230 | 0.0056 |
| Sensitivity (#2); excludes most common error | Logistic regression | 8566 | OR = 0.985 | .18 | 0.0064 | 0.0032 |
| Sensitivity (#3); includes only 3 most common errors | Logistic regression | 2883 | OR = 0.965 | <.001 | 0.0710 | 0.0153 |
| Sensitivity (#4); includes only 5 most common errors | Logistic regression | 4739 | OR = 0.968 | <.001 | 0.0461 | 0.0112 |
| Sensitivity (#5); all errors included; aggregated data | Linear regression | 218 | beta = −.000444 | <.001 | 0.0252 | 0.0052 |
| Sensitivity (#6); excludes most common error; aggregated data | Linear regression | 218 | beta = −.000051 | .33 | 0.0069 | 0.0047 |
| Sensitivity (#7); includes only 3 most common errors; aggregated data | Linear regression | 218 | beta = −.002235 | .004 | 0.1094 | 0.0088 |
| Sensitivity (#8); includes only 5 most common errors; aggregated data | Linear regression | 218 | beta = −.000903 | .001 | 0.0530 | 0.0124 |
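The relationship between the coefficients and the two predicted error rates in the table above can be sketched as follows. This assumes, without confirmation from the source, that each OR or beta is expressed per day of data collection and that the two predicted rates correspond to day 0 and day 45; the reported values for analyses #1 and #5 are approximately consistent with that reading. The function names are hypothetical.

```python
def predicted_rate_logistic(rate_day0, odds_ratio_per_day, days):
    """Project an error rate forward under a constant per-day odds ratio."""
    odds_day0 = rate_day0 / (1 - rate_day0)          # probability -> odds
    odds = odds_day0 * odds_ratio_per_day ** days    # odds shrink multiplicatively
    return odds / (1 + odds)                         # odds -> probability


def predicted_rate_linear(rate_day0, beta_per_day, days):
    """Project an error rate forward under a constant per-day linear slope."""
    return rate_day0 + beta_per_day * days


# Primary analysis (#1): OR = 0.969, predicted rate 0.0230 at the start.
primary_day45 = predicted_rate_logistic(0.0230, 0.969, 45)

# Sensitivity analysis (#5): beta = -0.000444, predicted rate 0.0252 at the start.
linear_day45 = predicted_rate_linear(0.0252, -0.000444, 45)
```

Under these assumptions both projections land close to the reported end-of-period values (0.0056 and 0.0052); small differences are expected because the table's coefficients are rounded.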
Figure 1. Daily enumerator-specific error rates over time, with fitted regression line (jittered for clarity).