Veronica Escamilla1, Michael Emch2, Leonard Dandalo3, William C Miller4, Francis Martinson1, Irving Hoffman1. 1. Institute for Global Health and Infectious Diseases, University of North Carolina-Chapel Hill, 130 Mason Farm Road, Chapel Hill, NC 27599, United States of America (USA). 2. Department of Geography, University of North Carolina-Chapel Hill, Chapel Hill, USA . 3. University of North Carolina Project, Lilongwe, Malawi . 4. Department of Medicine, University of North Carolina-Chapel Hill, Chapel Hill, USA .
Abstract
PROBLEM: Traditional random sampling at community level requires a list of every individual household that can be randomly selected in the study community. The longitudinal demographic surveillance systems often used as sampling frames are difficult to create in many resource-poor settings. APPROACH: We used Google Earth imagery and geographical analysis software to develop a sampling frame. Every household structure within the catchment area was digitized and assigned coordinates. A random sample was then generated from the list of households. LOCAL SETTING: The sampling took place in Lilongwe, Malawi and formed a part of an investigation of the intensity of Plasmodium falciparum transmission in a multi-site Phase III trial of a candidate malaria vaccine. RELEVANT CHANGES: Creation of a complete list of household coordinates within the catchment area allowed us to generate a random sample representative of the population. Once the coordinates of the households in that sample had been entered into the hand-held receivers of a global positioning system device, the households could be accurately identified on the ground and approached. LESSONS LEARNT: In the development of a geographical sampling frame, the use of Google Earth satellite imagery and geographical software appeared to be an efficient alternative to the use of a demographic surveillance system. The use of a complete list of household coordinates reduced the time needed to locate households in the random sample. Our approach to generate a sampling frame is accurate, has utility beyond morbidity studies and appears to be a cost-effective option in resource-poor settings.
PROBLEM: Traditional random sampling at community level requires a list of every individual household that can be randomly selected in the study community. The longitudinal demographic surveillance systems often used as sampling frames are difficult to create in many resource-poor settings. APPROACH: We used Google Earth imagery and geographical analysis software to develop a sampling frame. Every household structure within the catchment area was digitized and assigned coordinates. A random sample was then generated from the list of households. LOCAL SETTING: The sampling took place in Lilongwe, Malawi and formed a part of an investigation of the intensity of Plasmodium falciparum transmission in a multi-site Phase III trial of a candidate malaria vaccine. RELEVANT CHANGES: Creation of a complete list of household coordinates within the catchment area allowed us to generate a random sample representative of the population. Once the coordinates of the households in that sample had been entered into the hand-held receivers of a global positioning system device, the households could be accurately identified on the ground and approached. LESSONS LEARNT: In the development of a geographical sampling frame, the use of Google Earth satellite imagery and geographical software appeared to be an efficient alternative to the use of a demographic surveillance system. The use of a complete list of household coordinates reduced the time needed to locate households in the random sample. Our approach to generate a sampling frame is accurate, has utility beyond morbidity studies and appears to be a cost-effective option in resource-poor settings.
In estimating disease prevalence at community level, a sampling frame is required to ensure that the study sample is representative of the study community. Such a sampling frame typically lists all households that can be randomly selected in the study community and is often based on an existing demographic surveillance system. In the absence of such a system, one option is to create a complete listing of all households in the study area. In many malaria indicator surveys, for example, each household in the study area is visited, information on each head of household is captured and a sketch map showing the location of each household structure is drawn. However, this approach carries substantial field costs and these costs may increase when large census enumeration areas have to be sampled and a single segment has to be randomly selected from each enumeration area. Many resource-poor settings lack the funds and labour needed to create a full household listing in this manner – or to create and maintain a longitudinal demographic surveillance system.Open-source software packages for geographical analysis are being increasingly employed in public health research, are available to those working in resource-poor settings and can be used to create an affordable sampling frame. We therefore used Google Earth and other geographical analysis software to generate an inexpensive sampling frame for a study on the intensity of Plasmodium falciparum transmission in Lilongwe, Malawi. In this paper, we report our experiences in constructing this sampling frame and using it to generate a random sample of households at the community level.
Local setting
Malaria transmission intensity – which is dependent on several factors, including the parasite, vector, human host, environment and treatment – serves as an important guide for interventions of malaria control. The effects of such interventions on malaria morbidity may vary with transmission intensity. Thus, there is a need to assess not only how decreasing parasite exposure – via vaccines and other interventions – will alter malaria transmission but also how varying levels of malaria intensity may alter the efficacy of any intervention.Lilongwe, Malawi, is one of 11 sites of a Phase III efficacy trial for the candidate malaria vaccine, RTS,S/AS01 (GlaxoSmithKline, Brentford, England). As part of this trial, temporal trends in the prevalence of human infection with P. falciparum in the vaccine catchment area needed to be assessed. The Lilongwe site is not covered by a demographic surveillance system and does not have the resources needed to conduct a complete field-based listing of all of the households in the catchment area. We therefore needed an alternative, inexpensive and low-resource approach to create a valid sampling frame for the investigation of malaria transmission intensity.
Approach
We used two open-source software packages – Google Earth 5.1 (Google, Mountain View, United States of America) and Digipoint 2 (Zonum Solutions, Tucson, USA) – to generate a listing that showed the geographical location of each household. We then used two more programs – ArcGIS 9.3 (Environmental Systems Research Institute, Redlands, USA) and Hawth’s Tools for ArcGIS – to select a random sample from the complete geographical listing. Hand-held Garmin eTrex 10 global positioning system receivers (Garmin Ltd, Schaffhausen, Switzerland) were subsequently employed to find the sample households in the field.We first identified the 61 census enumeration areas that defined the catchment area for the Lilongwe vaccine trial. For this, we used the enumeration areas of residence for the malariapatients attending the Malawi Ministry of Health’s area 18 health centre. The boundary files for the enumeration areas used in the 2008 census were obtained from the Malawi National Statistics Office. Community health workers used Garmin eTrex 10 receivers to collect the coordinates of each village within the catchment area. These coordinates and the boundary files were then imported into ArcGIS 9.3, converted into keyhole markup language (KML) files – for compatibility with Google Earth – and then imported into Google Earth so that the boundary of the catchment area could be defined. Village locations were overlaid onto the boundaries of enumeration areas and enumeration areas that included household structures within villages were digitized.After the catchment area was defined in Google Earth, a list of the coordinates of every household structure in the area was generated. Individual household structures on the Google Earth satellite images of the catchment area were digitized using Digipoint 2 rather than Google Earth because this simplified the saving of multiple digitized points. The satellite images that we used had been recently updated – on 11 December 2009 or 4 June 2010 – and had sufficient resolution to identify individual structures (Fig. 1). However, they could not be used to distinguish houses from local commercial structures that were similar to the local houses in terms of size and construction material.
Fig. 1
Zoomed Google Earth image of the sampling frame, Lilongwe, Malawi, 2010
Zoomed Google Earth image of the sampling frame, Lilongwe, Malawi, 2010Note: The random sample of households to be surveyed are indicated by black dots. The complete list of households is indicated by green dots. The full length of the scale bar – which formed part of the image downloaded from Google Earth – represents a distance of 288 m.Map data: Google, digital globe.All of the digitized points – approximately 18 000 – were converted to Universal Transverse Mercator coordinates in Digipoint 2 and then imported into ArcGIS 9.3. We then used the random selection tool from Hawth’s Tools for ArcGIS to generate a simple random sample of 880 household structures. The size of the sample required had already been defined by the manufacturer of the candidate vaccine. The coordinates of the structures in the random sample were entered into Garmin eTrex 10 receivers so that health workers given the receivers could find and visit the structures. Households were visited in the order of their random selection. In instances where two houses were very closely situated, the house closest to the relevant coordinates was selected. Structures that were included in the initial sample but found to be latrines, kitchens or commercial structures when visited were noted but otherwise ignored, and not replaced by new resampled coordinates. Houses found to be unoccupied or occupied only by children when first visited were revisited at a later date. If an adult was present in a visited household, demographic information was collected and the household members were invited to participate in our malaria survey at a later date.
Evaluation and lessons learnt
By using Google Earth satellite imagery and geographical methods we were able to create a geographical sampling frame and select a representative random sample of the target population for our study. We eventually evaluated P. falciparum transmission during peak season – February to June – in 2011, 2012 and 2013. For each annual survey, we resampled the complete listing of household coordinates. Households that were investigated more than once were easily identified using the spatial join tool within ArcGIS 9.3.Our method of generating a geographical household list appeared efficient and cost-effective and required relatively few resources other than the Garmin eTrex 10 receivers and an ArcGIS software licence. The digitizing exercise was completed by one research assistant within 3 months.The geographical listing of households was found to be fairly accurate when used by community health workers on the ground. Fewer than 5.0% of the structures included in our initial list of structures to be visited were found to be latrines, kitchens or commercial structures. The ratio of male to female subjects in our sample – 1.08 – was very similar to that recorded in the Malawi demographic and health survey in 2010 –1.07.The use of Google Earth imagery and geographical methods to assemble a household list appears to be an efficient use of time in the field. Households that were targeted for study could be rapidly located on the ground by a community health worker with a global positioning system receiver. The list could be easily updated and could provide a useful framework for the future development of a demographic surveillance system. All survey data could be linked to household coordinates and displayed in ArcGIS – allowing spatial patterns in any household characteristic to be illustrated and investigated. Using our method of generating a sampling frame, all households in a target area can be digitized relatively easily, regardless of the area’s size. There is no need to segment large study areas before sampling. In the future, it may be possible to distinguish households from industrial buildings, even in densely populated areas, by using high-resolution satellite or aerial imagery,, although identification of the use of multi-storey buildings from such imagery is likely to remain a challenge.Our method of generating a sampling frame could be used for many health investigations. Google Earth has already been employed to identify and sample household clusters,, capture neighbourhood characteristics that influence health outcomes and improve disease surveillance.,, Several different geographical methods – and combinations of such methods – can be used to generate a geographical sampling frame. For example, if the simultaneous export of a large number of points is not a concern, household structures can be digitized directly in Google Earth. Alternative software is available if researchers cannot obtain an ArcGIS license. For example, QGIS is open-source and includes a random selection tool that can be used to create a representative sample of households. Our approach – or a variation of it – can provide an accurate and apparently cost-effective sampling frame that has multiple potential applications in resource-poor settings (Box 1).In the development of a geographical sampling frame, the use of Google Earth imagery and geographical software was an efficient alternative to the use of a demographic surveillance system.The use of a complete list of household coordinates reduced the time needed to locate households in the random sample.The approach that we used to generate a sampling frame appears accurate and cost-effective and has utility beyond morbidity studies, even in resource-poor settings.
Authors: Philippa Clarke; Jennifer Ailshire; Robert Melendez; Michael Bader; Jeffrey Morenoff Journal: Health Place Date: 2010-08-11 Impact factor: 4.078
Authors: Lp Galway; Nathaniel Bell; Al Shatari Sae; Amy Hagopian; Gilbert Burnham; Abraham Flaxman; Wiliam M Weiss; Julie Rajaratnam; Tim K Takaro Journal: Int J Health Geogr Date: 2012-04-27 Impact factor: 3.918
Authors: Aileen Y Chang; Maria E Parrales; Javier Jimenez; Magdalena E Sobieszczyk; Scott M Hammer; David J Copenhaver; Rajan P Kulkarni Journal: Int J Health Geogr Date: 2009-07-23 Impact factor: 3.918
Authors: Timothy Shields; Jessie Pinchoff; Jailos Lubinda; Harry Hamapumbu; Kelly Searle; Tamaki Kobayashi; Philip E Thuma; William J Moss; Frank C Curriero Journal: Geospat Health Date: 2016-05-31 Impact factor: 1.212
Authors: Aniset Kamanga; Silvia Renn; Derek Pollard; Daniel J Bridges; Brian Chirwa; Jessie Pinchoff; David A Larsen; Anna M Winters Journal: Malar J Date: 2015-09-17 Impact factor: 2.979
Authors: Daniel J Bridges; John M Miller; Victor Chalwe; Hawela Moonga; Busiku Hamainza; Rick Steketee; Kafula Silumbe; Jenala Nyangu; David A Larsen Journal: Trials Date: 2017-11-02 Impact factor: 2.279
Authors: Veronica Escamilla; Alisa Alker; Leonard Dandalo; Jonathan J Juliano; William C Miller; Portia Kamthuza; Tapiwa Tembo; Gerald Tegha; Francis Martinson; Michael Emch; Irving F Hoffman Journal: Malar J Date: 2017-04-07 Impact factor: 2.979
Authors: Dana R Thomson; Forrest R Stevens; Nick W Ruktanonchai; Andrew J Tatem; Marcia C Castro Journal: Int J Health Geogr Date: 2017-07-19 Impact factor: 3.918
Authors: Daniel J Bridges; Derek Pollard; Anna M Winters; Benjamin Winters; Chadwick Sikaala; Silvia Renn; David A Larsen Journal: Malar J Date: 2018-02-23 Impact factor: 2.979