Livia Abdalla, Douglas A Augusto, Eduardo Krempser, Marcia Chame, Amanda S Dufek, Leonardo Oliveira.
Abstract
The lack of georeferencing in geospatial datasets hinders scientific studies that rely on accurate location data. This is particularly concerning in the health sciences, where georeferenced data could lead to scientific results of great relevance to society. The Brazilian health systems, especially those for Notifiable Diseases, in practice do not register georeferenced data; instead, the records indicate merely the municipality in which the event occurred. In data-driven modeling, accurate occurrence-based disease prediction models typically require the socioenvironmental characteristics of the exact location of each event, which are often unavailable. To enrich the expressiveness of data-driven models when the municipality of the event is the best available information, we produced datasets with a statistical characterization of all 5,570 Brazilian municipalities in 642 layers of thematic data that represent the natural and artificial characteristics of the municipalities' landscapes over time. This resulted in a collection of datasets comprising a total of 11,556 descriptive statistics attributes for each municipality.
Year: 2022 PMID: 35948576 PMCID: PMC9365826 DOI: 10.1038/s41597-022-01581-2
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Thematic layers comprising the dataset collection.
| Temporality | Thematic layers | # classes | # attributes | # instances |
|---|---|---|---|---|
| Annual (1981 to 2020) | Land use and cover (Mapbiomas v6.0; 1985–2020) | 25 | 450 | 200,484 |
| Annual (1981 to 2020) | Temperature and precipitation (NCEP/CFSR and CHIRPS) | 19 | 198 + 144 | 445,520 |
| Quinquennial (2000 to 2020) | Population count (SEDAC - NASA) | 1 | 18 | 27,850 |
| Quinquennial (2000 to 2020) | Population density (SEDAC - NASA) | 1 | 18 | 27,850 |
| Atemporal | Climate normals for temperature and precipitation (Worldclim) | 67 | 846 + 360 | 11,140 |
| Atemporal | Altitude (SRTM - NASA) | 1 | 18 | 5,570 |
| Atemporal | Geomorphology (IBGE) | 10 | 180 | 5,570 |
| Atemporal | Soils (IBGE) | 65 | 1,170 | 5,570 |
| Atemporal | Phytophysiognomies (IBGE) | 52 | 936 | 5,570 |
| Atemporal | Biome boundaries (IBGE) | 6 | 108 | 5,570 |
| No temporal regularity | Mining areas (ANM) | 336 | 6,048 | 5,570 |
| No temporal regularity | Roads (IBGE) | 1 | 18 | 27,850 |
| No temporal regularity | Railways (IBGE) | 1 | 18 | 27,850 |
| No temporal regularity | Waterways or watercourses (IBGE) | 2 | 36 | 27,850 |
| No temporal regularity | Hydroelectric plants (IBGE) | 1 | 18 | 27,850 |
| No temporal regularity | Dams (IBGE) | 1 | 18 | 27,850 |
| No temporal regularity | Conservation unit areas (MMA) | 1 | 18 | 5,570 |
| No temporal regularity | Indigenous lands and Quilombola territories (IBGE) | 1 | 18 | 5,570 |
| No temporal regularity | Zone climates and regional subunits (IBGE) | 51 | 918 | 5,570 |
| Total | | 642 | 11,556 | 902,224 |
The number of attributes is calculated as the number of classes × 18 (the number of statistics), whereas the number of instances is the number of years × 5,570 (the number of municipalities).
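The two rules in the note above can be checked against a few table rows with a short sketch. The function names are illustrative only and do not come from the dataset; the constants are the values stated in the text.

```python
# Number of statistics computed per class, as stated in the table note.
N_STATISTICS = 18
# Number of Brazilian municipalities, as stated in the abstract.
N_MUNICIPALITIES = 5570


def n_attributes(n_classes: int) -> int:
    """Attributes of a layer: one column per (class, statistic) pair."""
    return n_classes * N_STATISTICS


def n_instances(n_years: int) -> int:
    """Instances of a layer: one row per (year, municipality) pair."""
    return n_years * N_MUNICIPALITIES


print(n_attributes(65))   # Soils (65 classes) -> 1170
print(n_instances(5))     # Population count (5 available years) -> 27850
print(n_attributes(642))  # all 642 classes -> 11556
```

Only rows that follow the stated rule exactly are used here; the sketch makes no claim about rows whose instance counts are not an exact multiple of 5,570.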
Original data format, resulting geometry, unit and scale/resolution of the thematic layers.
| Thematic layers | Original data format | Resulting geometry | Unit of measurement | Scale or spatial resolution |
|---|---|---|---|---|
| Land use and cover (Mapbiomas v6.0) | raster | polygon | m2 | 30 m |
| Temperature and precipitation (NCEP/CFSR and CHIRPS) | raster | point | Kelvin, mm | 5 km |
| Population count (SEDAC - NASA) | raster | point | quantity | 1 km |
| Population density (SEDAC - NASA) | raster | point | quantity/km2 | 1 km |
| Climate normals for temperature and precipitation (Worldclim) | raster | point | °C × 10, mm | 1 km |
| Altitude (SRTM - NASA) | raster | point | m | 30 m |
| Geomorphology (IBGE) | vector | polygon | m2 | 1:5,000,000 |
| Soils (IBGE) | vector | polygon | m2 | 1:5,000,000 |
| Phytophysiognomies (IBGE) | vector | polygon | m2 | 1:5,000,000 |
| Biome boundaries (IBGE) | vector | polygon | m2 | 1:5,000,000 |
| Mining areas (ANM) | vector | polygon | m2 | 1:1,000,000 |
| Roads (IBGE) | vector | line | m | 1:250,000 |
| Railways (IBGE) | vector | line | m | 1:250,000 |
| Waterways or watercourses (IBGE) | vector | line | m | 1:250,000 |
| Hydroelectric plants (IBGE) | vector | point | quantity | 1:250,000 |
| Dams (IBGE) | vector | line | m | 1:250,000 |
| Conservation unit areas (MMA) | vector | polygon | m2 | 1:100,000 |
| Indigenous lands and Quilombola territories (IBGE) | vector | polygon | m2 | 1:250,000 |
| Zone climates and regional subunits (IBGE) | vector | polygon | m2 | 1:5,000,000 |
Fig. 1. Examples of thematic layers with annual temporality in the territorial extension of the municipality of Rio de Janeiro.
Fig. 4. Climate data for total precipitation and for maximum, mean, and minimum temperature from Worldclim in the territorial extension of the municipality of Rio de Janeiro for the month of January.
Statistics calculated for the features/variables in the scope of the municipalities.
| Statistics | Description |
|---|---|
| count | Quantity of features/geometries for each class or variable in the thematic layers contained in each municipality |
| sum | Sum of the areas, lengths, or values of each class or variable in the thematic layers contained in each municipality |
| mean | Mean area, length, or value for each class or variable in the thematic layers contained in each municipality |
| sd | Standard deviation of the areas, lengths, or values for each class or variable in the thematic layers contained in each municipality |
| min | Minimum area, length, or value for each class or variable in the thematic layers contained in each municipality |
| max | Maximum area, length, or value for each class or variable in the thematic layers contained in each municipality |
| 25 | First quartile of the areas, lengths, or values of each class or variable in the thematic layers contained in each municipality |
| 50 | Median of the areas, lengths, or values of each class or variable in the thematic layers contained in each municipality |
| 75 | Third quartile of the areas, lengths, or values of each class or variable in the thematic layers contained in each municipality |
| | This means that the statistic preceding the suffix was divided by the municipality’s area in m2 |
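As a rough illustration of the statistics listed above, the following sketch computes them for the values of one class inside one municipality. It rests on several assumptions that the text does not confirm: the sample standard deviation is used, quartiles use linear interpolation, and the normalized variants carry a hypothetical `-norm` suffix (the actual suffix is not given here). The authors computed these values in PostGIS and QGIS, not with this code.

```python
import statistics


def describe(values, municipality_area_m2):
    """Descriptive statistics for one class or variable in one municipality.

    `values` holds the areas, lengths, or point values of the features.
    Assumptions: sample standard deviation, linearly interpolated quartiles,
    and a hypothetical "-norm" suffix for the area-normalized variants.
    """
    if not values:
        raise ValueError("at least one value is required")
    if len(values) > 1:
        q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
        sd = statistics.stdev(values)
    else:
        # A single feature: quartiles collapse to that value, spread is zero.
        q1 = q2 = q3 = values[0]
        sd = 0.0
    stats = {
        "count": len(values),
        "sum": sum(values),
        "mean": statistics.fmean(values),
        "sd": sd,
        "min": min(values),
        "max": max(values),
        "25": q1,
        "50": q2,
        "75": q3,
    }
    # Each statistic divided by the municipality's area (hypothetical suffix).
    normalized = {
        f"{name}-norm": value / municipality_area_m2
        for name, value in stats.items()
    }
    return {**stats, **normalized}


example = describe([1.0, 2.0, 3.0, 4.0], municipality_area_m2=10.0)
print(len(example))  # 9 statistics plus 9 normalized variants
```

If every statistic has a normalized counterpart, this yields 18 values per class, which would agree with the 18 statistics mentioned in the first table's note; that reading is an inference, not something stated in the table.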
Values of descriptive statistics calculated in PostgreSQL/PostGIS for the Urban Infrastructure class in the municipality of Rio de Janeiro (areas in m2).
| geocode | datetime | Urban_infrastructure-count | Urban_infrastructure-sum | Urban_infrastructure-mean | Urban_infrastructure-sd | Urban_infrastructure-min | Urban_infrastructure-max |
|---|---|---|---|---|---|---|---|
| 330455 | 2020 | 344 | 667214841.91 | 1939578.03 | 32714160.03 | 358.29 | 607070012.49 |
The count statistic refers to the number of urban areas.
Values of descriptive statistics calculated in PostGIS/PostgreSQL for the Altitude variable in the municipality of Rio de Janeiro (in meters).
| geocode | datetime | Altitude-count | Altitude-sum | Altitude-mean | Altitude-sd | Altitude-min | Altitude-max |
|---|---|---|---|---|---|---|---|
| 330455 | 2000 | 1369182 | 119077300 | 86.97 | 147.50 | −28 | 1014 |
The count statistic refers to the number of altitude data points.
List of available years for each thematic layer.
| Thematic layers | Available years |
|---|---|
| Land use and cover (Mapbiomas v6.0) | 1985 to 2020 |
| Temperature and precipitation (NCEP/CFSR and CHIRPS) | 1981 to 2020 |
| Population count (SEDAC - NASA) | 2000, 2005, 2010, 2015, 2020 |
| Population density (SEDAC - NASA) | 2000, 2005, 2010, 2015, 2020 |
| Climate normals for temperature and precipitation (Worldclim) | Climate normals from 1950 to 2000 |
| Altitude (SRTM - NASA) | 2000 |
| Geomorphology (IBGE) | 2006 |
| Soils (IBGE) | 2006 |
| Phytophysiognomies (IBGE) | 2004 |
| Biome boundaries (IBGE) | 2006 |
| Mining areas (ANM) | 2021 |
| Roads (IBGE) | 2013, 2015, 2017, 2019, 2021 |
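| Railways (IBGE) | 2013, 2015, 2017, 2019, 2021 |
| Waterways or watercourses (IBGE) | 2013, 2015, 2017, 2019, 2021 |
| Hydroelectric plants (IBGE) | 2013, 2015, 2017, 2019, 2021 |
| Dams (IBGE) | 2013, 2015, 2017, 2019, 2021 |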
| Conservation unit areas (MMA) | 2020 |
| Indigenous lands and Quilombola territories (IBGE) | 2019 |
| Zone climates and regional subunits (IBGE) | 2002 |
Fig. 5. Values of descriptive statistics calculated in QGIS for the Urban Infrastructure class in the municipality of Rio de Janeiro (areas in m2). The count statistic refers to the number of urban areas.
Fig. 7. Values of descriptive statistics calculated in QGIS for the Altitude variable in the municipality of Rio de Janeiro (in meters). The count statistic refers to the number of altitude data points. Points colored brown have lower altitudes, white points intermediate altitudes, and green points the highest altitudes.
Illustrative example of an assembled training dataset.
| Mean altitude (m) | occurrence |
|---|---|
| 415.3 | yes |
| 560.7 | yes |
| 124.0 | no |
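A training dataset like the one illustrated above could be assembled by joining occurrence records, which carry only a municipality code, with the municipality-level attributes. The sketch below is a minimal illustration under stated assumptions: the geocodes are placeholders, the `assemble` helper is hypothetical, and the column name `Altitude-mean` follows the naming pattern of the statistics tables; only the three altitude values and labels come from the example table.

```python
# Municipality-level attributes keyed by geocode.
# The geocodes are placeholders, not real municipality codes.
municipality_stats = {
    "A": {"Altitude-mean": 415.3},
    "B": {"Altitude-mean": 560.7},
    "C": {"Altitude-mean": 124.0},
}

# Occurrence records as (geocode, label): the municipality is the only
# location information available for each event.
occurrences = [("A", "yes"), ("B", "yes"), ("C", "no")]


def assemble(records, stats, attribute_names):
    """Join each record with the attributes of its municipality."""
    rows = []
    for geocode, label in records:
        attributes = stats.get(geocode)
        if attributes is None:
            # No statistics for this municipality: skip the record.
            continue
        rows.append([attributes[name] for name in attribute_names] + [label])
    return rows


training_rows = assemble(occurrences, municipality_stats, ["Altitude-mean"])
for row in training_rows:
    print(row)
```

In practice the attribute list would include many of the 11,556 columns rather than one, and temporal layers would also be matched on the year of the event; both extensions are left out to keep the sketch short.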
Measurement(s): Socioenvironmental descriptive statistics
Technology Type(s): PostGIS • QGIS
Sample Characteristic - Environment: All biomes
Sample Characteristic - Location: Brazil
Values of descriptive statistics calculated in PostGIS/PostgreSQL for the Roads class in the municipality of Rio de Janeiro (lengths in meters).
| geocode | datetime | Roads-count | Roads-sum | Roads-mean | Roads-sd | Roads-min | Roads-max |
|---|---|---|---|---|---|---|---|
| 330455 | 2013 | 118 | 333061.81 | 2822.56 | 2876.17 | 16.65 | 12426.91 |
The count statistic refers to the number of roads.
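The example rows reported for Rio de Janeiro can be sanity-checked with the identity mean = sum / count. The sketch below uses only the values printed in the three example tables; the helper name and the tolerance (half of the last reported decimal place) are choices made for this illustration.

```python
# (count, sum, mean) as reported for Rio de Janeiro (geocode 330455).
reported = {
    "Urban_infrastructure (2020)": (344, 667214841.91, 1939578.03),
    "Altitude (2000)": (1369182, 119077300, 86.97),
    "Roads (2013)": (118, 333061.81, 2822.56),
}


def mean_is_consistent(count, total, mean, tol=0.005):
    """True if sum / count matches the reported mean within rounding."""
    return abs(total / count - mean) <= tol


for name, (count, total, mean) in reported.items():
    print(name, round(total / count, 2), mean_is_consistent(count, total, mean))
```

The same kind of check can be applied to any row of the datasets before using it for modeling.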