
H2020 project CAPTOR dataset: Raw data collected by low-cost MOX ozone sensors in a real air pollution monitoring network.

Jose M Barcelo-Ordinas, Pau Ferrer-Cid, Jorge Garcia-Vidal, Mar Viana, Anna Ripoll.

Abstract

The H2020 CAPTOR project deployed three testbeds, in Spain, Italy and Austria, with low-cost sensors for measuring tropospheric ozone (O3). The project was built around citizen science and aimed to raise public awareness of ozone pollution. Each testbed was supported by an NGO in charge of deciding how to raise citizen awareness according to the needs of each country. The data presented in this document are the raw data captured by the sensor nodes of the Spanish testbed using SGX Sensortech MICS 2614 metal-oxide sensors. The Spanish testbed consisted of twenty-five nodes. Each sensor node included four SGX Sensortech MICS 2614 ozone sensors, one temperature sensor and one relative humidity sensor. Each node underwent a calibration process in which it was co-located at an EU reference air quality monitoring station, followed by deployment in a sub-urban or rural area in Catalonia, Spain. All nodes spent two to three weeks co-located at a reference station in Barcelona, Spain (urban area), followed by two to three weeks co-located at three sub-urban reference stations near the final deployment site. The nodes were then deployed in volunteers' homes for about two months and, finally, were co-located again at the sub-urban reference stations for two weeks for final calibration and assessment of potential drifts. All data presented in this paper are raw sensor data that can be used for scientific purposes such as calibration studies using machine learning algorithms; once the concentration values of the nodes are obtained, they can also be used to create tropospheric ozone pollution maps from heterogeneous data sources (reference stations and low-cost sensors).
© 2021 The Author(s). Published by Elsevier Inc.

Keywords:  Calibration of sensors; Low-cost sensors; Machine learning algorithms; Metal-oxide sensors; Pollution maps; Tropospheric ozone

Year:  2021        PMID: 34095377      PMCID: PMC8166746          DOI: 10.1016/j.dib.2021.107127

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table

Value of the Data

The sensor data complement the data provided by the reference stations of the Air Quality Monitoring Network (AQMN) deployed by governmental organisations. The spatial density of AQM reference stations is typically low, so data from nodes with low-cost sensors fill gaps in the reference network. The data are raw: they are as measured by a real sensor in the field, without calibration.
As the nodes are deployed over a wide area of about 30 km × 25 km, many different studies can be carried out in various research fields: calibration, data fusion, sensor networks and interpolation of calibrated ozone concentrations, among others. The major beneficiaries of these data are therefore researchers working on air pollution monitoring networks that use low-cost sensors.
The data can be used by researchers in several fields: i) researchers interested in building maps of ozone concentrations in areas where AQM reference stations are not deployed, ii) researchers interested in testing machine learning algorithms for low-cost sensor calibration, iii) researchers interested in studying networks of heterogeneous data sources that include reference stations and low-cost sensor nodes.
Each node has up to four O3 metal-oxide sensors mounted on it. As the deployment has twenty-five nodes, this adds up to as many as one hundred low-cost sensors. In addition, some nodes were left in place at reference stations throughout the summer, so that a large part of the dataset is co-located with reference data, while other nodes were moved to volunteers' homes. This allows calibration-only studies with a large number of sensors, as well as research on sensors forming a network.

Data Description

The data can be accessed in Zenodo (http://doi.org/10.5281/zenodo.4570449). There are twenty-five CSV files named “captor170xx_raw.csv” with xx = 01, 02, …, 27; the numbers 08 and 24 are skipped because nodes 17008 and 17024 were not deployed in the Spanish testbed. A zip file called captor_2017_raw_files.zip, which contains all the CSV files, is also provided for convenient downloading. Each CSV file corresponds to one Captor node. The format of each CSV file is as follows:

date;s1_o3;s2_o3;s3_o3;s4_o3;s_temp;s_rh;location
07/04/2017 01:30;637.43;337.4423;310.9938;128.4446;9.0;62.0;PR
07/04/2017 02:00;651.7123;354.1657;314.7233;135.8793;9.0;60.8;PR
07/04/2017 02:30;621.0247;377.149;314.367;128.5283;9.0;58.9;PR

Each row is a measurement, except for the first row of the file, which is a header naming each column. The meaning of the columns is as follows:

date: timestamp of the sample in Coordinated Universal Time (UTC). The format given in the files is dd/mm/yyyy H:M, where dd = day, mm = month, yyyy = year, H = hour, and M = minute. Samples are taken every half hour.
s1_o3: value in kΩ of the first SGX Sensortech MICS 2614 metal-oxide (O3) sensor.
s2_o3: value in kΩ of the second SGX Sensortech MICS 2614 metal-oxide (O3) sensor.
s3_o3: value in kΩ of the third SGX Sensortech MICS 2614 metal-oxide (O3) sensor.
s4_o3: value in kΩ of the fourth SGX Sensortech MICS 2614 metal-oxide (O3) sensor.
s_temp: value in °C of the temperature sensor.
s_rh: value in % of the relative humidity sensor.
location: label indicating the location of the node. The label can be of two types: a reference station label (PR, MANLLEU, TONA, VIC, MONTSENY), or a volunteer's house (VOL).

Table 1 shows the GPS coordinates of each reference station or the name of the villages where the nodes were deployed. The GPS coordinates of the volunteers' houses where the nodes were deployed are not included for privacy reasons.
The column “Reference station used to calibrate the sensors” in Table 1 gives the reference station used to calibrate the sensors, with a multiple linear regression algorithm, before they were deployed at the volunteers' houses. We do not provide the calibrated data, since they can be obtained from the raw data provided here using any type of machine learning algorithm, not necessarily multiple linear regression, as discussed in the next section. So, although we provide the location where the sensors were calibrated, all data in the CSV files, including those labelled “VOL”, are raw data.
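The file layout described above can be read with the Python standard library alone. The following is a minimal sketch, not the authors' tooling: it parses the three sample rows quoted in the data description, and the function names and the handling of empty fields are assumptions made for illustration (the description does not say how missing values are encoded).

```python
import csv
import io
from datetime import datetime, timezone

# Header and three rows copied from the sample in the data description.
SAMPLE = """date;s1_o3;s2_o3;s3_o3;s4_o3;s_temp;s_rh;location
07/04/2017 01:30;637.43;337.4423;310.9938;128.4446;9.0;62.0;PR
07/04/2017 02:00;651.7123;354.1657;314.7233;135.8793;9.0;60.8;PR
07/04/2017 02:30;621.0247;377.149;314.367;128.5283;9.0;58.9;PR
"""

NUMERIC_COLUMNS = ("s1_o3", "s2_o3", "s3_o3", "s4_o3", "s_temp", "s_rh")


def parse_float(text):
    """Return a float, or None for an empty or non-numeric field.

    Treating such fields as missing is an assumption of this sketch.
    """
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def read_captor_csv(handle):
    """Parse a semicolon-separated Captor file into a list of dicts."""
    rows = []
    for record in csv.DictReader(handle, delimiter=";"):
        # Timestamps are given as dd/mm/yyyy H:M in UTC.
        stamp = datetime.strptime(record["date"], "%d/%m/%Y %H:%M")
        row = {
            "date": stamp.replace(tzinfo=timezone.utc),
            "location": record["location"].strip(),
        }
        for name in NUMERIC_COLUMNS:
            row[name] = parse_float(record[name])
        rows.append(row)
    return rows


rows = read_captor_csv(io.StringIO(SAMPLE))
print(len(rows), rows[0]["date"].isoformat(), rows[0]["s1_o3"], rows[0]["location"])
```

To read a downloaded file instead of the inline sample, the same function can be given a handle from `open("captor17001_raw.csv", newline="")`.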
Table 1

Location names, location labels and GPS coordinates where the nodes were deployed.

Location (reference station or village name) | “Location” label in the data file | Reference station used to calibrate the sensors | Latitude | Longitude
Ref. Station of Barcelona (Palau Reial) | PR | PR | 41.3872889 | 2.115666667
Ref. Station of Manlleu | MANLLEU | MANLLEU | 42.0033667 | 2.287091667
Ref. Station of Vic | VIC | VIC | 41.9349167 | 2.239825
Ref. Station of Tona (Zona Esportiva) | TONA | TONA | 41.846738 | 2.217445
Ref. Station of Montseny (La Castanya) | MONTSENY | MONTSENY | 41.7793889 | 2.357997222
Cantonigros | VOL | MANLLEU | - | -
St. Vicenç de Torello | VOL | MANLLEU | - | -
St. Pere de Torello | VOL | MANLLEU | - | -
Centelles | VOL | TONA | - | -
Gurb | VOL | MANLLEU | - | -
Els Hostalets de Balenya | VOL | TONA | - | -
Sta. Eulalia de Riuprimer | VOL | TONA | - | -
Sta. Cecilia de Voltrega | VOL | MANLLEU | - | -
St. Marti Sescorts | VOL | MANLLEU | - | -
Taradell | VOL | TONA | - | -
Calldetenes | VOL | TONA | - | -
Sentmenat | VOL | PR | - | -
Llinars del Valles | VOL | TONA | - | -
Tona-ciutat | VOL | TONA | - | -
Canoves | VOL | TONA | - | -
Sta. Eugenia de Berga | VOL | TONA | - | -
Barcelona | VOL | PR | - | -
Badalona | VOL | PR | - | -
Vilanova del Valles | VOL | PR | - | -
St. Cebria de Vallalta | VOL | PR | - | -
Table 2 shows the different locations where each node was deployed during the 2017 campaign. The first column, “Captor Label”, gives the node name that appears in the CSV file name. The second to fifth columns give the “location” of each node, followed by the “starting date”, “end date” and “number of samples”. The “location” label in this table is the same label used in the CSV file. When the label is PR, MANLLEU, VIC, TONA or MONTSENY, the node was placed at that reference station during the indicated period. When the label is VOL (village name), the node was placed in a volunteer's house located in that village. The village name does not appear in the CSV data file. Therefore, when the label VOL appears in the CSV file, the village name has to be obtained by consulting Table 2.
Table 2

Locations and dates in which each Captor node was deployed.

Captor label | Location | Starting date | End date | Number of samples
Captor17001 | PR | 06/04/2017 12:00 | 08/05/2017 06:00 | 1276
Captor17001 | MANLLEU | 08/05/2017 12:00 | 26/05/2017 07:00 | 851
Captor17001 | VOL (Cantonigros) | 31/05/2017 12:00 | 18/09/2017 10:30 | 5215
Captor17001 | MANLLEU | 18/09/2017 13:00 | 24/09/2017 06:30 | 275
Captor17002 | PR | 06/04/2017 12:00 | 08/05/2017 06:00 | 1292
Captor17002 | MANLLEU | 08/05/2017 12:00 | 26/05/2017 07:00 | 853
Captor17002 | VOL (St. Vicenç de Torello) | 31/05/2017 12:00 | 18/09/2017 09:30 | 5248
Captor17002 | MANLLEU | 18/09/2017 12:30 | 05/10/2017 09:00 | 809
Captor17003 | PR | 06/04/2017 12:00 | 08/05/2017 06:00 | 1291
Captor17003 | MANLLEU | 08/05/2017 09:00 | 26/05/2017 07:00 | 858
Captor17003 | VOL (St. Pere de Torello) | 30/05/2017 13:00 | 18/09/2017 09:00 | 5194
Captor17003 | MANLLEU | 18/09/2017 12:30 | 05/10/2017 09:00 | 807
Captor17004 | PR | 06/04/2017 12:00 | 08/05/2017 06:00 | 1430
Captor17004 | TONA | 26/05/2017 12:00 | 14/06/2017 07:00 | 899
Captor17004 | VOL (Centelles) | 14/06/2017 12:00 | 22/09/2017 09:30 | 4798
Captor17004 | TONA | 22/09/2017 11:00 | 05/10/2017 10:00 | 622
Captor17005 | PR | 06/04/2017 12:00 | 08/05/2017 09:00 | 1343
Captor17005 | MANLLEU | 08/05/2017 14:00 | 26/05/2017 07:00 | 850
Captor17005 | VOL (Gurb) | 31/05/2017 12:00 | 18/09/2017 09:00 | 5233
Captor17005 | MANLLEU | 18/09/2017 09:30 | 05/10/2017 09:00 | 804
Captor17006 | PR | 06/04/2017 14:00 | 08/05/2017 06:00 | 381
Captor17006 | TONA | 08/05/2017 15:00 | 26/05/2017 09:00 | 850
Captor17006 | VOL (Sta. Eulalia de Riuprimer) | 02/06/2017 17:00 | 19/09/2017 08:00 | 4580
Captor17006 | TONA | 19/09/2017 12:00 | 05/10/2017 10:00 | 762
Captor17007 | PR | 30/04/2017 01:30 | 08/05/2017 06:00 | 373
Captor17007 | TONA | 08/05/2017 15:00 | 26/05/2017 09:00 | 741
Captor17007 | VOL (Els Hostalets de Balenya) | 02/06/2017 12:00 | 19/09/2017 07:00 | 5095
Captor17007 | TONA | 19/09/2017 12:00 | 05/10/2017 10:00 | 764
Captor17009 | PR | 06/04/2017 17:00 | 11/10/2017 08:00 | 7599
Captor17010 | PR | 06/04/2017 14:00 | 08/05/2017 06:00 | 1259
Captor17010 | MANLLEU | 08/05/2017 11:00 | 26/05/2017 07:00 | 838
Captor17010 | VOL (Sta. Cecilia de Voltrega) | 30/05/2017 10:00 | 18/09/2017 08:00 | 4960
Captor17010 | MANLLEU | 18/09/2017 12:30 | 05/10/2017 08:30 | 787
Captor17011 | PR | 07/04/2017 01:30 | 08/05/2017 06:00 | 1228
Captor17011 | MANLLEU | 08/05/2017 11:00 | 26/05/2017 07:00 | 836
Captor17011 | VOL (St. Marti Sescorts) | 30/05/2017 11:00 | 18/09/2017 11:00 | 4621
Captor17011 | MANLLEU | 19/09/2017 02:30 | 05/10/2017 08:30 | 374
Captor17012 | PR | 06/04/2017 15:00 | 08/05/2017 06:00 | 1100
Captor17012 | TONA | 08/05/2017 14:00 | 26/05/2017 09:00 | 830
Captor17012 | VOL (Taradell) | 02/06/2017 09:00 | 17/09/2017 00:30 | 4982
Captor17012 | TONA | 19/09/2017 12:30 | 05/10/2017 10:00 | 741
Captor17013 | PR | 06/04/2017 14:00 | 08/05/2017 06:00 | 1000
Captor17013 | MANLLEU | 08/05/2017 11:00 | 04/10/2017 11:30 | 6614
Captor17014 | PR | 06/04/2017 14:00 | 08/05/2017 06:00 | 535
Captor17014 | TONA | 08/05/2017 13:00 | 26/05/2017 03:30 | 837
Captor17014 | VOL (Calldetenes) | 03/06/2017 12:00 | 19/09/2017 08:30 | 4337
Captor17014 | TONA | 19/09/2017 12:30 | 05/10/2017 10:00 | 717
Captor17015 | PR | 28/04/2017 14:00 | 08/05/2017 06:00 | 1146
Captor17015 | VOL (Sentmenat) | 09/06/2017 11:00 | 29/09/2017 10:30 | 5256
Captor17015 | PR | 29/09/2017 15:30 | 11/10/2017 08:00 | 547
Captor17016 | PR | 28/04/2017 14:00 | 08/05/2017 06:00 | 497
Captor17016 | MONTSENY | 09/05/2017 10:00 | 22/05/2017 10:00 | 604
Captor17016 | VIC | 26/05/2017 10:00 | 05/10/2017 09:00 | 6035
Captor17017 | PR | 06/04/2017 14:00 | 08/05/2017 06:00 | 1311
Captor17017 | TONA | 08/05/2017 13:00 | 05/10/2017 10:00 | 6966
Captor17018 | PR | 28/04/2017 14:30 | 30/06/2017 07:30 | 1134
Captor17018 | VOL (Badalona) | 30/06/2017 16:00 | 25/09/2017 04:30 | 2921
Captor17018 | PR | 05/10/2017 13:00 | 09/10/2017 09:00 | 180
Captor17019 | PR | 07/04/2017 13:30 | 28/06/2017 10:00 | 1995
Captor17019 | VOL (Barcelona) | 29/06/2017 14:30 | 13/07/2017 12:00 | 485
Captor17019 | PR | 19/07/2017 11:00 | 27/07/2017 14:00 | 338
Captor17019 | VOL (Barcelona) | 27/07/2017 16:00 | 05/10/2017 14:30 | 3116
Captor17019 | PR | 06/10/2017 13:00 | 11/10/2017 08:00 | 226
Captor17020 | PR | 29/04/2017 02:00 | 09/05/2017 06:00 | 339
Captor17020 | MONTSENY | 09/05/2017 11:00 | 20/05/2017 21:00 | 538
Captor17020 | PR | 23/06/2017 10:00 | 30/06/2017 07:00 | 1745
Captor17020 | VOL (St. Cebria de Vallalta) | 30/06/2017 12:00 | 09/10/2017 18:00 | 4748
Captor17021 | PR | 06/04/2017 14:00 | 26/05/2017 06:00 | 1829
Captor17021 | TONA | 30/05/2017 08:00 | 21/06/2017 08:00 | 837
Captor17021 | MONTSENY | 21/06/2017 14:00 | 06/10/2017 08:30 | 4927
Captor17022 | PR | 06/04/2017 13:30 | 08/05/2017 06:00 | 1267
Captor17022 | TONA | 08/05/2017 13:00 | 30/05/2017 07:00 | 1006
Captor17022 | VOL (Llinars del Valles) | 09/06/2017 09:00 | 22/09/2017 06:30 | 4921
Captor17022 | TONA | 22/09/2017 11:30 | 05/10/2017 10:00 | 609
Captor17023 | PR | 06/04/2017 14:00 | 26/05/2017 06:00 | 1628
Captor17023 | TONA | 26/05/2017 10:00 | 14/06/2017 00:00 | 869
Captor17023 | VOL (Tona-ciutat) | 14/06/2017 10:00 | 20/09/2017 07:30 | 4124
Captor17023 | TONA | 22/09/2017 11:00 | 05/10/2017 10:00 | 623
Captor17025 | PR | 06/04/2017 14:00 | 26/05/2017 06:00 | 866
Captor17025 | TONA | 26/05/2017 11:00 | 14/06/2017 07:00 | 876
Captor17025 | VOL (Canoves) | 21/06/2017 13:00 | 22/09/2017 07:00 | 4255
Captor17025 | TONA | 22/09/2017 11:00 | 05/10/2017 10:00 | 609
Captor17026 | PR | 07/04/2017 09:30 | 09/05/2017 06:00 | 717
Captor17026 | MONTSENY | 09/05/2017 11:00 | 21/05/2017 12:00 | 426
Captor17026 | PR | 23/05/2017 11:00 | 04/07/2017 06:00 | 1733
Captor17026 | VOL (Vilanova del Valles) | 04/07/2017 10:00 | 24/09/2017 07:00 | 3841
Captor17026 | PR | 05/10/2017 14:00 | 11/10/2017 08:00 | 270
Captor17027 | PR | 06/04/2017 12:30 | 08/05/2017 06:00 | 1269
Captor17027 | TONA | 08/05/2017 13:00 | 26/06/2017 09:00 | 835
Captor17027 | VOL (Santa Eugenia de Berga) | 02/06/2017 11:00 | 19/09/2017 09:30 | 5103
Captor17027 | TONA | 19/09/2017 12:00 | 05/10/2017 10:00 | 747
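Because the village name is absent from the CSV files, rows labelled VOL have to be resolved through Table 2. The lookup can be sketched as below. This is an illustrative helper, not part of the dataset: the dictionary holds only a few entries copied from Table 2, and the function names and the "unknown village" fallback are hypothetical.

```python
import re

# Village of the VOL stay for a few nodes, copied from Table 2.
# Only a subset is listed here; the remaining nodes follow the same pattern.
VOL_VILLAGE = {
    "17001": "Cantonigros",
    "17004": "Centelles",
    "17005": "Gurb",
    "17012": "Taradell",
    "17019": "Barcelona",
}


def node_id_from_filename(filename):
    """Extract the five-digit node number from a name like captor17001_raw.csv."""
    match = re.search(r"captor(\d{5})_raw\.csv$", filename)
    if match is None:
        raise ValueError(f"not a Captor raw file name: {filename!r}")
    return match.group(1)


def resolve_location(filename, label):
    """Return the village for VOL rows, or the station label unchanged."""
    if label != "VOL":
        return label
    node_id = node_id_from_filename(filename)
    return VOL_VILLAGE.get(node_id, "unknown village")


print(resolve_location("captor17001_raw.csv", "VOL"))
print(resolve_location("captor17001_raw.csv", "MANLLEU"))
```

A single dictionary keyed by node is enough here because, in Table 2, no node is listed at more than one village.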

Experimental Design, Materials, and Methods

During the H2020 CAPTOR project (2016–2018), tropospheric ozone sensor nodes were deployed in three countries: Spain, Italy and Austria [1]. The objective of the project was to raise citizens' awareness of the pollution they are exposed to and to involve them, through a citizen science process, in addressing the problem posed by this pollutant. During the project, two types of nodes were developed following the Do-It-Yourself (DiY) philosophy: thirty-five Captor nodes and twenty-five Raptor nodes. Captor nodes were used in the Spanish and Italian testbeds, while Raptor nodes were used in the Austrian and Italian testbeds. The nodes in Austria and Italy were spread across the respective countries, while the nodes in Spain were concentrated in a small area of about 25 km × 30 km. This paper focuses on the twenty-five Captor nodes deployed in the Spanish testbed, which formed a network together with the reference stations present in the area.
Captor nodes were developed at the Polytechnic University of Catalonia (UPC, Barcelona, Spain). These nodes use an Arduino microprocessor connected to a sensor board that can hold up to five metal-oxide sensors, plus a temperature and relative humidity sensor. The project decided to use four SGX Sensortech MICS 2614 metal-oxide (O3) sensors, plus a temperature and relative humidity sensor from Grove. Four MOX sensors were used because the sensors are low-cost and vary in measurement quality, so redundancy makes the node more robust to the failure of an individual sensor. Through a calibration process, the sensor that gave the best quality was chosen as the representative of that node. The node was connected via a cellular connection to a repository where the ozone values (in µg/m3) representative of the node were stored. If a sensor failed, the second-best sensor was chosen to represent the node. The raw data were also stored in the repository for research purposes.
Sensortech metal-oxide sensors measure O3 using a voltage divider circuit that has a load resistor and a variable (sensing) resistor. Whenever the O3 concentration changes, the resistance of the variable resistor changes. The resistance corresponding to an O3 sample is obtained by measuring the voltage VL across the load resistor RL, digitising the signal with an A/D converter, and converting this voltage to a raw resistance measurement RS (in kΩ):

$$R_S = R_L \, \frac{V_{cc} - V_L}{V_L}$$

where Vcc is the reference voltage (5 V). The temperature sensor gives its values in °C and the relative humidity sensor in percent (%).
The ozone sensor manufacturer does not calibrate the sensors, so the calibration has to be done in the field [2]. This means that the value obtained from the voltage divider must be converted to an ozone concentration by a calibration process, since the values produced by the ozone sensor are in kΩ and not in µg/m3. The calibration process consisted of placing the sensors at an AQM reference station for two or three weeks and fitting a machine learning model to the reference values. Examples of machine learning algorithms are multiple linear regression (MLR), random forest (RF) and support vector regression (SVR), among others. For example, using a multiple linear regression and introducing temperature and relative humidity as corrective features,

$$y_{o3} = \beta_0 + \beta_1 x_{o3} + \beta_2 x_{Temp} + \beta_3 x_{RH}$$

where yo3 is the ozone concentration in µg/m3 at the AQM reference station, (xo3, xTemp, xRH) are the raw sensor measurements in (kΩ, °C, %), and β = (β0, β1, β2, β3) are the calibration or regression coefficients that the multiple regression model must estimate. Once the regression coefficients have been obtained, the sensor node is placed elsewhere, and new O3 concentration values can be derived from the raw measurements (xo3, xTemp, xRH) using the calibration coefficients β.
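The two steps described above, converting the load-resistor voltage to a resistance and fitting a multiple linear regression against reference values, can be sketched in plain Python. This is an illustration under stated assumptions rather than the authors' calibration code: the divider relation used is the standard one for a sensing resistor in series with the load resistor, RS = RL (Vcc − VL) / VL; the load resistance value is hypothetical; and the feature and target values are synthetic numbers generated from made-up coefficients, not data from the dataset.

```python
def sensor_resistance_kohm(v_load, r_load_kohm, v_cc=5.0):
    """Sensor resistance from the voltage measured across the load resistor."""
    if not 0.0 < v_load < v_cc:
        raise ValueError("v_load must lie strictly between 0 and v_cc")
    return r_load_kohm * (v_cc - v_load) / v_load


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_mlr(features, targets):
    """Least-squares fit of y = b0 + b1*x_o3 + b2*x_temp + b3*x_rh
    using the normal equations (X^T X) beta = X^T y."""
    design = [[1.0] + list(f) for f in features]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(beta, x_o3, x_temp, x_rh):
    """Apply the fitted coefficients to one raw measurement."""
    return beta[0] + beta[1] * x_o3 + beta[2] * x_temp + beta[3] * x_rh


# Synthetic co-location data (kOhm, degC, %), invented for this example.
FEATURES = [
    (600.0, 9.0, 62.0), (650.0, 12.0, 55.0), (500.0, 20.0, 40.0),
    (300.0, 28.0, 30.0), (420.0, 15.0, 70.0), (700.0, 5.0, 80.0),
]
TRUE_BETA = (5.0, 0.1, 1.5, -0.2)  # made-up coefficients
TARGETS = [predict(TRUE_BETA, *f) for f in FEATURES]

beta = fit_mlr(FEATURES, TARGETS)
print([round(b, 4) for b in beta])
print(sensor_resistance_kohm(2.5, 100.0))  # 100 kOhm load is a hypothetical value
```

With real data, `FEATURES` would hold the raw rows recorded while a node sat at a reference station and `TARGETS` the station's O3 values for the same timestamps; the fitted `beta` is then applied to rows labelled VOL.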
With other machine learning algorithms the idea is the same: use the data gathered during the calibration phase to fit the model (its parameters and hyper-parameters), and then use the fitted model to estimate new values when the sensors are deployed elsewhere. As described above, the sensors were calibrated by placing the nodes at a reference station. Table 1 indicates which reference station was used to calibrate the sensors depending on where they were deployed. To calibrate the sensors, therefore, it is necessary to have the values of the reference stations for the period when the nodes were placed next to them. These values are open and can be obtained at the URL http://mediambient.gencat.cat/es/05_ambits_dactuacio/atmosfera/qualitat_de_laire/vols-saber-que-respires/descarrega-de-dades/index.html, managed by the local government of Catalonia, Spain. The website is in Catalan and Spanish, so we give a brief indication of how to download the reference values. There is a box labelled “Filtros” (Filters) where you can select the reference station name, the type of pollutant (e.g. O3) and the start and end of the period to be downloaded. For the data described in this document, the names of the reference stations are given in Table 1, and the time periods in which the nodes were at these reference stations are given in Table 2. The reference data can then be downloaded by clicking on the CSV button in the bottom right corner of the web page.
Several research papers have made use of the dataset presented in this document. Ref. [1] describes the testbeds developed during the H2020 CAPTOR project. Ref. [2] is a survey of mechanisms for calibrating low-cost sensors in an uncontrolled environment. Ref. [3] compares the calibration performance of the Captor sensors when calibrated using various machine learning algorithms such as multiple linear regression, random forest and support vector regression. Ref. [4] shows the impact of environmental conditions on the calibration of Captor sensors. Refs. [5] and [6] study the fusion of data from the four Captor sensors placed at the same node as a way to reduce the estimation error with respect to the use of a single sensor. Ref. [7] proposes a graph sensing framework to re-calibrate sensors, impute missing data and reconstruct signals in a heterogeneous air pollution monitoring network with reference stations and low-cost sensor nodes. These papers used the Captor testbed to evaluate the performance of the techniques analysed on a real IoT sensor framework. We believe that the dataset is useful for other studies that need real sensor data to test the performance of techniques and models investigated in air pollution monitoring networks, such as other machine learning algorithms, multi-hop or distributed calibration in a sensor network, graph signal processing analysis, spatial studies using interpolation algorithms, creation of virtual sensors, etc.

CRediT Author Statement

Jose M. Barcelo-Ordinas: Conceptualization, Formal analysis, Investigation, Writing - Original Draft, Writing - Review & Editing; Jorge Garcia-Vidal: Conceptualization, Supervision, Project administration; Pau Ferrer-Cid: Data Curation, Software, Formal analysis; Mar Viana: Conceptualization, Supervision, Project administration; Anna Ripoll: Formal analysis, Validation, Investigation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.
Subject: Environmental science (air pollution).
Specific subject area: Measurements of tropospheric ozone (O3) using metal-oxide low-cost sensors.
Type of data: CSV files.
How data were acquired: Captor nodes designed and built by UPC (Universitat Politècnica de Catalunya, Spain) during the H2020 CAPTOR project, using SGX Sensortech MICS 2614 ozone (O3) metal-oxide (MOX) sensors.
Data format: Raw: ozone in kΩ, temperature in °C, relative humidity in %.
Parameters for data collection: Maximum ozone concentrations occur in summer with high temperatures and high solar radiation. Thus, the data are sensitive to environmental conditions (temperature and relative humidity). Each node first passed through a reference station in an urban area of Barcelona in April 2017 (spring) to test the nodes. Then, the nodes were moved to the deployment area, with different environmental conditions, from May to September 2017 (late spring and all summer). Each ozone sample presented in the files is the average of 100 samples taken every 5 s, to avoid outliers and spikes in the data.
Description of data collection: The nodes passed through four locations: reference station in Barcelona, reference station near the deployment site, deployment site, and reference station near the deployment site. Every half hour, each node sent the value measured by each sensor to a cloud repository. The measured values of the metal-oxide sensors, after passing through the analogue-to-digital converter, are resistances (in kΩ). The values of the temperature and relative humidity sensors are in °C and %. The values presented in this document are the raw values sent by the nodes to the repository.
Data source location: Catalonia, Spain.
Data accessibility: Repository name: Zenodo. Data identification number: twenty-five files called captor170xx_raw.csv with xx = 01, 02, …, 27; the numbers 08 and 24 are skipped because nodes 17008 and 17024 were not deployed in the Spanish testbed. A zip file called captor_2017_raw_files.zip, which includes all CSV files, is also provided. Direct URL to data: http://doi.org/10.5281/zenodo.4570449
Related research article: A. Ripoll, M. Viana, M. Padrosa, X. Querol, A. Minutolo, K-M. Hou, J. M. Barcelo-Ordinas and J. Garcia-Vidal (2019). Testing the performance of sensors for ozone pollution monitoring in a citizen science approach. Science of the Total Environment (Elsevier), 651, 1166–1179, https://doi.org/10.1016/j.scitotenv.2018.09.257

1.  Testing the performance of sensors for ozone pollution monitoring in a citizen science approach.

Authors:  A Ripoll; M Viana; M Padrosa; X Querol; A Minutolo; K M Hou; J M Barcelo-Ordinas; J Garcia-Vidal
Journal:  Sci Total Environ       Date:  2018-09-22       Impact factor: 7.963

2.  Distributed Multi-Scale Calibration of Low-Cost Ozone Sensors in Wireless Sensor Networks.

Authors:  Jose M Barcelo-Ordinas; Pau Ferrer-Cid; Jorge Garcia-Vidal; Anna Ripoll; Mar Viana
Journal:  Sensors (Basel)       Date:  2019-05-31       Impact factor: 3.576
