David M Baker, Alain-Jacques Valleron.
Abstract
BACKGROUND: Examining whether disease cases are clustered in space is an important part of epidemiological research. Another important part of spatial epidemiology is testing whether patients suffering from a disease are more, or less, exposed to environmental factors of interest than adequately defined controls. Both approaches involve determining the number of cases and controls (or population at risk) in specific zones. For cluster searches, this often must be done for millions of different zones. Doing this by calculating distances can lead to very lengthy computations. In this work we discuss the computational advantages of geographical grid-based methods, and introduce an open-source software package (FGBASE) which we have created for this purpose.
Year: 2014 PMID: 25358866 PMCID: PMC4233060 DOI: 10.1186/1476-072X-13-46
Source DB: PubMed Journal: Int J Health Geogr ISSN: 1476-072X Impact factor: 3.918
Figure 1. Comparison of the use of point locations with distance computations versus the use of the grid-based approach when searching for clusters. (Above): Setting where case and control data consist of points (identified by latitude and longitude). Determining the number of cases and controls in a (circular) region requires computing the distances from the center of the region to each case and each control. For cluster searches, millions of regions (the potential clusters) are considered, which means billions of distance calculations for a medium-sized case/control study. (Below): Setting where cases and controls have been projected to a grid. Determining the number of cases and controls in a (rectangular) region requires only reading values from memory-mapped arrays and summing them. This method is very fast even for cluster searches where millions of regions (the potential clusters) are considered.
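The contrast described in this caption can be illustrated with a minimal Python sketch. It is not FGBASE code: the function names, the unit cell size, and the toy coordinates are all assumptions made for illustration, and plain nested lists stand in for the memory-mapped arrays mentioned in the caption.

```python
import math

def count_in_circle(points, center, radius):
    """Point-based approach: one distance computation per point."""
    cx, cy = center
    return sum(1 for (x, y) in points if math.hypot(x - cx, y - cy) <= radius)

def project_to_grid(points, n_rows, n_cols, cell_size=1.0):
    """Project points onto a grid of per-cell counts (row = y, col = x)."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        row, col = int(y // cell_size), int(x // cell_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            grid[row][col] += 1
    return grid

def count_in_rectangle(grid, row_min, row_max, col_min, col_max):
    """Grid-based approach: sum stored per-cell counts (bounds inclusive)."""
    return sum(sum(grid[r][col_min:col_max + 1])
               for r in range(row_min, row_max + 1))

# Hypothetical case locations in grid units (not data from the study).
cases = [(0.5, 0.5), (1.2, 0.7), (3.9, 3.1), (1.8, 1.1)]

# Projection is done once; every later zone query only reads and sums cells.
case_grid = project_to_grid(cases, n_rows=5, n_cols=5)
in_rect = count_in_rectangle(case_grid, 0, 1, 0, 1)
in_circle = count_in_circle(cases, center=(1.0, 1.0), radius=1.0)
```

The projection cost is paid once per data set, whereas the point-based count repeats a distance computation for every point in every candidate zone.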
Figure 2. Interactive map of the software FGBASE. The interactive map (centered here on the city of Le Havre, France) displays the cells of the grid. These square cells are color coded according to user-defined criteria. Here the following color codes are used: light grey if the square is uninhabited, dark grey if the square is inhabited, purple if the square contains cases of the disease, yellow if the square contains 2 or more cases and the population-to-case ratio of the square is below 1500. The map has a yellow selection rectangle, which the user can move and resize. The panel below the map displays statistics calculated for the region enclosed in the selection rectangle. The user can load a list of cities with their geographical coordinates, for easy access through the left panel.
Figure 3. Loading environmental factors into FGBASE. Loading vineyards from the CLC land use database into the FGBASE software. Here the map is centered at Bordeaux, France. Vineyards account for a substantial portion of pesticide use. Analysis performed with the software does not show a link between this environmental factor and type 1 diabetes.
Figure 4. Loading into FGBASE a data set of spatial entities grouped into classes: the IREP factories grouped by chemical emissions. The spatial entities (IREP factories) are loaded in the FGBASE software and displayed on the interactive map. These entities are grouped into classes based on the chemicals which they have emitted (an entity can be assigned to several classes). The hypothesis-driven algorithm tests each class of entities, one after the other, by reading from the grid the number of cases and controls at several different levels of proximity from entities in that class.
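The per-class counting step described in this caption might look roughly like the following Python sketch. The caption does not say how FGBASE defines a "level of proximity", so the sketch assumes it means all grid cells within k cells of an entity (Chebyshev distance); the function names, class names, and toy grids are likewise hypothetical.

```python
def cells_within(entity_cells, k, n_rows, n_cols):
    """Set of grid cells within k cells (Chebyshev distance) of any entity."""
    cells = set()
    for er, ec in entity_cells:
        for r in range(max(0, er - k), min(n_rows, er + k + 1)):
            for c in range(max(0, ec - k), min(n_cols, ec + k + 1)):
                cells.add((r, c))  # a set, so overlapping zones count once
    return cells

def counts_by_proximity(case_grid, control_grid, classes, levels):
    """For each class and proximity level, read case/control counts from the grids."""
    n_rows, n_cols = len(case_grid), len(case_grid[0])
    results = {}
    for name, entity_cells in classes.items():
        results[name] = []
        for k in levels:
            zone = cells_within(entity_cells, k, n_rows, n_cols)
            n_cases = sum(case_grid[r][c] for r, c in zone)
            n_controls = sum(control_grid[r][c] for r, c in zone)
            results[name].append((k, n_cases, n_controls))
    return results

# Toy grids of per-cell counts (illustrative only, not study data).
case_grid = [
    [1, 0, 0, 0],
    [0, 2, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
control_grid = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 3, 0],
    [0, 0, 0, 1],
]
# The entity at cell (1, 1) belongs to both classes, as the caption allows.
classes = {"class_a": [(1, 1)], "class_b": [(1, 1), (3, 3)]}
results = counts_by_proximity(case_grid, control_grid, classes, levels=[0, 1])
```

Each entry of `results` holds one `(level, cases, controls)` tuple per proximity level, which could then be passed to whatever statistical test is applied to the class.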
Candidate clusters of T1D cases discovered by FGBASE using cases from the Isis-Diab cohort
| Cluster candidate | Cluster candidate 1 | Cluster candidate 2 | Cluster candidate 3 | Cluster candidate 4 |
|---|---|---|---|---|
| Location | Next to Le Havre | Next to Dunkerque | Next to Toulouse | Next to Nantes |
| LAEA-ETRS89 grid coordinates of the rectangular zone: | [(3610,2979), (3610,2981), (3608,2981), (3608,2979)] | [(3774,3126), (3774,3130), (3783,3130), (3783,3126)] | [(3627,2313), (3627,2315), (3631,2315), (3631,2313)] | [(3446,2746), (3446,2750), (3453,2750), (3453,2746)] |
| WGS84 coordinates of the rectangular zone: | [(0.15368,49.4943), (0.150047,49.5121), (0.122639,49.5097), (0.126282,49.4919)] | [(2.19376,50.9823), (2.18771,51.018), (2.31538,51.0265), (2.32132,50.9908)] | [(1.40999,43.5677), (1.40733,43.5856), (1.45649,43.5897), (1.45914,43.5718)] | [(-1.59501,47.2094), (-1.60299,47.2449), (-1.51148,47.2548), (-1.5035,47.2192)] |
| Number of cases | 8 cases | 13 cases | 13 cases | 41 cases |
| Number of controls | 3 controls | 0 controls | 4 controls | 16 controls |
| log of the numerator of Kulldorff’s likelihood ratio: | -6240.36 | -6235.35 | -6238.47 | -6235.09 |
Using the data-driven option of the FGBASE software, zones most likely to be clusters of T1D cases were discovered in the vicinity of the cities of Le Havre, Dunkerque, Toulouse and Nantes.
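For readers unfamiliar with the last row of the table, the sketch below shows one way the log of the numerator of a Kulldorff-type likelihood ratio can be computed from zone counts. It assumes the Bernoulli (case/control) form of the statistic; this section does not state which model FGBASE uses, and the totals in the example are hypothetical, so the sketch is not expected to reproduce the values in the table.

```python
import math

def xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def zone_log_likelihood(c, n, C, N):
    """Log of the likelihood-ratio numerator under a Bernoulli model.

    c, n: cases and total subjects (cases + controls) inside the zone.
    C, N: cases and total subjects in the whole study; requires 0 < n < N.
    """
    p = c / n              # estimated case proportion inside the zone
    q = (C - c) / (N - n)  # estimated case proportion outside the zone
    return (xlogy(c, p) + xlogy(n - c, 1 - p)
            + xlogy(C - c, q) + xlogy((N - n) - (C - c), 1 - q))

def null_log_likelihood(C, N):
    """Log of the denominator: one common case proportion everywhere."""
    return xlogy(C, C / N) + xlogy(N - C, 1 - C / N)

# Hypothetical totals (assumed for illustration): 100 cases among 300 subjects.
C_total, N_total = 100, 300
log_num = zone_log_likelihood(c=8, n=11, C=C_total, N=N_total)
log_lr = log_num - null_log_likelihood(C_total, N_total)
```

Because the denominator is the same for every candidate zone, ranking zones by the log of the numerator gives the same order as ranking them by the full log likelihood ratio.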