Literature DB >> 31456658

Fully Synthetic Longitudinal Real-World Data From Hearing Aid Wearers for Public Health Policy Modeling.

Jeppe H Christensen¹, Niels H Pontoppidan¹, Rikke Rossing¹, Marco Anisetti², Doris-Eva Bamiou³, George Spanoudakis⁴, Louisa Murdin⁵, Thanos Bibas⁶, Dimitris Kikidiks⁶, Nikos Dimakopoulos⁷, Giorgos Giotis⁷, Apostolos Ecomomou⁸.

Abstract

Entities: CellLine Chemical Disease Gene Species

Keywords: evidence-based; hearing aids; hearing loss; longitudinal data; public health policy; real-world data; sound exposure

Year: 2019 PMID： 31456658 PMCID： PMC6700226 DOI： 10.3389/fnins.2019.00850

Source DB: PubMed Journal: Front Neurosci ISSN： 1662-453X Impact factor: 4.677

× No keyword cloud information.

Introduction

Approximately one-third of people over 65 years of age, and 5% of the world's population, is affected by hearing loss (HL) (World Health Organization, 2017). Disabling HL is associated with early cognitive decline in adults (Olusanya et al., 2014), and when unaddressed, HL restricts social integration and reduces employment and educational opportunities, hampers emotional well-being and, thus, poses an economic challenge at both the individual and national level (Wilson et al., 2017). Moreover, more and more individuals suffer from HL, which is primarily due to increases in everyday noise exposure and an increase of the aging population (World Health Organization, 2017). Despite the fact that age-related HL is the third leading cause of years lived with disability (Vos et al., 2017), the population of individuals with hearing loss is underserved because few public health policies focus on prevention, intervention, and rehabilitation for age-related HL (Reavis et al., 2016). This inadequate focus has been attributed to a lack of evidence supporting policies that actively promote hearing healthcare (Moyer, 2012; Barker et al., 2016). However, this specific issue is targeted in the EU-funded H2020 project EVOTION (www.h2020evotion.eu), which collects a large volume of heterogenous data from almost 1,000 hearing aid (HA) users with varying degrees of hearing loss to support the development of evidence-based policy making within the hearing healthcare field (Spanoudakis et al., 2017; Gutenberg et al., 2018). Data are being collected from five sources: (i) hearing aids, (ii) a smartphone app, (iii) a biosensor, (iv) audiology clinics, and (v) electronic health records. The hearing aids log data about the user's sound environment (Pontoppidan et al., 2018), hearing aid use (i.e., on/off) and hearing aid settings on a minute-by-minute basis; a phone app developed for the study collects information about the user's physical location via GPS (Dritsakis et al., 2018). Thus, EVOTION will provide an evidence base for formulating and evaluating the impacts of public health policy pertaining to prevention, early diagnosis, and treatment/rehabilitation for adults with hearing impairment. Here, we share the first outcome of EVOTION in the form of a data-set to inspire, encourage, and motivate a data-driven analytical approach to evidence-based healthcare policy modeling using real-world longitudinal data. The data-set includes information relating to patterns of real-world hearing aid usage and sound environment exposure. Undoubtedly, many such data-sources will be available for researchers and policy-makers in the future, and the data-set presented here can act as a first step of building and testing potential statistical models (Christensen et al., 2018, 2019). Specifically, the data-set represents a sub-sample of the data being collected in EVOTION. It contains longitudinally sampled observations from 53 individuals and includes the following measures: the sound environment, the hearing aid setting, logging time (timestamps), ID, and the degree of hearing loss on the best hearing ear of the individuals. Note that the ID (an integer between 1 and 53, randomly assigned to each individual) does not link to the real identity of the participants. Data are considered sensitive as they contain personal and health related information, and EVOTION adhere to strict data ethics by applying privacy-aware big data analytics (Anisetti et al., 2018). Here, we overcome the problem of sharing such personal data by working with a fully synthetic data-set that preserves structural and statistical properties of the original data (see section Technical Validation), without allowing the extraction of personal information (see section Data Synthesization). Thus, the synthetic data-set can readily be shared among professionals.

Methods

Protocol

Data collection in EVOTION follow a published protocol (Spanoudakis et al., 2017; Dritsakis et al., 2018), is ongoing, and spans 12 months from the day of recruitment to the end of study participation. The data-set presented here represents a synthesized data-sample from EVOTION. The source data span a mean of 17 days of hearing aid usage (minimum 2 and maximum 54 days), 53 participants, and a total of ~5,000 h of hearing aid usage.

Data Acquisition

Each participant in EVOTION is supplied with a pair of EVOTION hearing aids and a Samsung A3 smartphone. The hearing aids are connected to the smartphone via low-energy Bluetooth, and a custom developed EVOTION app (developed by ATC, Athens) on the smartphone logs a real-time data vector every minute consisting of data parameters from both the hearing aid's processing of sound from the microphone, hearing aid settings, and the smartphone's GPS. When connected to a wireless network, the smartphone app transmits the logged data vector (see Table 1) to the EVOTION data repository, which is located on secure distributed servers.

Table 1

Data-set variables logged every minute.

Variable name	Description	Type	Units/levels
ID	Identifier	Integer	1:53
SoundClass	A value describing the sound environment. The value is derived by the hearing aids internal processing of the acoustic variables	Categorical	QUIET, SPEECH, SPEECH-IN-NOISE, NOISE
hProg	A value describing the active hearing aid program (Dritsakis et al., 2018)	Categorical	MEDIUM, LOW, HIGH, HIGH+
hVol	A value describing the active hearing aid volume state	Integer	Steps (−9:4) represent 2.5 dB up or down from default (0)
LonRel	Relative longitude (centered for each individual)	Continuous	GPS
LatRel	Relative latitude (centered for each individual)	Continuous	GPS
LowSPL	The sound pressure level (SPL) measured in low frequency bands	Continuous	dB
MidSPL	SPL measured in middle frequency bands	Continuous	dB
HighSPL	SPL measured in high frequency bands	Continuous	dB
fbSPL	SPL measured in full bandwidth	Continuous	dB
LowNf	The noise floor (Nf) measured in low frequency bands	Continuous	dB
MidNf	Nf measured in middle frequency bands	Continuous	dB
HighNf	Nf measured in high frequency bands	Continuous	dB
fbNf	Nf measured in full bandwidth	Continuous	dB
LowME	The modulation envelope (ME) measured in low frequency bands	Continuous	dB
MidME	ME measured in middle frequency bands	Continuous	dB
HighME	ME measured in high frequency bands	Continuous	dB
fbME	ME measured in full bandwidth	Continuous	dB
Timestamp	Local time of record	ISO 8601	YYYY:MM:DD HH:MM:SS
LowSNR	The signal-to-noise ratio (SNR) from low frequency bands as SNR = SPL – Nf	Continuous	dB
MidSNR	SNR from middle frequency bands	Continuous	dB
HighSNR	SNR from high frequency bands	Continuous	dB
fbSNR	SNR from full bandwidth	Continuous	dB
LowMI	The modulation index (MI) from low frequency bands as MI = ME – Nf	Continuous	dB
MidMI	MI from middle frequency bands	Continuous	dB
HighMI	MI from high frequency bands	Continuous	dB
fbMI	MI from full bandwidth	Continuous	dB
PTA4	Pure tone average (PTA) across 4 testing frequencies (0.5, 1, 2, and 4 kHz) on the best hearing ear	Continuous	dB hearing threshold

Frequency bands corresponds to 0–1.33; 1.33–2.14; 4.14–10; and 0–10 kHz.

Data-set variables logged every minute. Frequency bands corresponds to 0–1.33; 1.33–2.14; 4.14–10; and 0–10 kHz. Clinical (e.g., audiometric tests) and demographical data are collected from the hearing clinics that have been involved in recruiting participants for EVOTION.

Acquisition of Acoustic Variables

The EVOTION hearing aids implements proprietary algorithms for continuous estimates of the acoustic environment sensed by the calibrated hearing aid microphones. The continuous estimates are derived by level estimators that implements very short time constants in four frequency channels: full bandwidth, low, mid, and high frequencies. The dynamic range of the estimators covers the dynamic range of the microphones. Noise floor is estimated from the lowest values and the modulation envelope from the largest values within a longer time-window. The SNR is obtained by subtracting the noise floor from the SPL, and the modulation index by subtracting the noise floor from the modulation envelope. Also, from the full bandwidth signal, a proprietary algorithm estimates if the current sound environment is quiet, noise, speech, or speech-in-noise dominated (i.e., the “SoundClass” variable in Table 1). In total, the acoustic environment is described by 21 variables, which the hearing aid transmits to the smartphone over Bluetooth every minute.

Data Synthetization

To enable data-sharing, and uphold differential privacy, we synthesized the data-set from a subset of the EVOTION repository data (the source) using DataSynthesizer (for details, see Ping et al., 2017). First, DataSynthesizer generates empirical conditional probability density functions (PDFs) for each variable of the data-source by computing a Bayesian network using the GreedyBayes algorithm with up to 4 parents—that is, the values in one variable can be conditioned on the values of up to four other data variables. Next, the synthesized data-set is generated by randomly drawing from the empirical PDFs while injecting each drawn sample with Laplacian noise with location 0 and scale 4(d – k)/nε to preserve privacy. Here, n is the size of the source input (rows), ε = 0.1, d is the number of variables, and k = 4. Thus, the covariance between source parameters are preserved by allowing the empirical PDFs to be conditioned in the Bayesian network. In addition, to mask absolute position from GPS measures, each latitude and longitude coordinate were centered for each individual (i.e., subtracted by the mean latitude and longitude) prior to synthesization.

Limitations and Updates

While EVOTION collects a large amount of heterogeneous data, the dataset described here represents a sub-collection of the parameters from a sub-population of all the individuals enrolled in EVOTION, which limits the data-set's usability for hypothesis testing. We expect to update the data-set with more observations and data-types once these become available in the EVOTION project for synthesization. In addition, we do not have access to low-level details of the signal-processing taking place in the hearing aids. Thus, we do not include real-time data on how the hearing aids autonomously reacts to the sound environment (e.g., adjusting noise reduction or compression characteristics).

Data Format

The data-set is stored as a comma separated values (csv) file with each row representing one vector of observations associated to a timestamp. The included 399.500 observations represent 28 variables (columns) and they are described in Table 1.

Data Access

The newest version of the dataset is named “EVOreal_time_synth.csv” and is uploaded to zenodo.org and accessible via the following DOI: https://doi.org/10.5281/zenodo.2668210.

Technical Validation

The Bayesian network generating the fully synthetic data ensures that covariance between different variables are preserved. To validate that, indeed, dependencies are still present in the fully synthetic data-set we computed statistics from both the acoustic variables (only the full-bandwidth variables were selected) and the “Timestamp” variable (see Figure 1).

Figure 1

Scatterplot matrix (panels below the diagonal), density plots (panels in the diagonal), and correlation matrix (panels above the diagonal) of the full-bandwidth acoustic variables (A) and a stacked histogram of the “Timestamp” variable (B). Data in both (A,B) are grouped and colored by the classified sound environment by the variable “SoundClass.” SPL, Sound pressure level; Nf, Noise floor; ME, Modulation envelope; SNR, Signal-to-noise ratio; MI, Modulation index.

Acoustic Variables

According to the hearing aids' estimation of the acoustic variables (see section Acquisition of Acoustic Variables), we would expect certain dependencies in the synthesized data. For example, the estimated noise floor (fbNf) should ideally always be lower than the estimated modulation envelope (fbME). The correlation matrix of the full bandwidth acoustic variables and their classification into the four discrete environments by color (“SoundClass”) are shown in Figure 1A. As expected, the noise floor is almost always lower than the modulation envelope (expect for a few outliers, see row 3, column 2 in Figure 1A). The outliers that do not follow the expected pattern are not generated by the synthesization process but instead reflects noise in the hearing aids' estimation method (outliers are also present in the source data). The color-coding indicates that clustering of the sound environment depends on more than one acoustic parameter. For example, in Figure 1A (panel in row 3, column 2), the environment is dominantly classified as “Quiet” for low levels of noise floor and modulation envelope. But as the modulation envelope passes ~60 dB the environment changes to either “Speech” or “Speech in Noise” despite no changes in the noise floor. This 3rd order dependency further validates that the data synthesization process preserves structural dependencies in the source data.

Timestamps

Each observation in the data-set is associated with a timestamp. Thus, we can aggregate the timestamps to test the hypothesis that hearing aid usage is not uniformly distributed throughout the day and, from this, validate that the data synthesization process preserves the distributional statistics of the “Timestamp” variable. Figure 1B shows the histogram of timestamps binned by hour from 0 to 23. Most timestamps fall between 5 a.m. and 9 p.m. (usual awake hours) with peaks around noon and evening (6 p.m.). In addition, most data are logged in “Quiet” or “Speech” environments, which reflects what is reported in the literature (Humes et al., 2018). Thus, the distribution of timestamps and the dependency between timestamps and the sound environment both exhibit characteristics expected from real life use of hearing aids.

Conclusion

We present a synthesized data-set containing longitudinal observations of hearing aid use and associated sound environments. The data represent real life behavior of individuals with hearing loss wearing hearing aids. The underlying reason for sharing these data is to motivate the use of such data for public health policy modeling—that is, identifying which models derive useful “high-level” information and insights from such “low-level” data observations in the field of hearing healthcare.

Data Availability

All datasets generated for this study are included in the manuscript/supplementary files.

Author Contributions

JC wrote the text, analyzed and synthesized the data-set, and prepared the figures. NP co-authored the text. RR, D-EB, LM, TB, DK, and AE recruited the participants. MA, GS, ND, and GG fascilitated the datalogging by technical developments.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

3 in total

1. Recent Developments in Privacy-Preserving Mining of Clinical Data.

Authors: Chance Desmet; Diane J Cook
Journal: ACM IMS Trans Data Sci Date: 2021-11

2. Application of Big Data to Support Evidence-Based Public Health Policy Decision-Making for Hearing.

Authors: Gabrielle H Saunders; Jeppe H Christensen; Johanna Gutenberg; Niels H Pontoppidan; Andrew Smith; George Spanoudakis; Doris-Eva Bamiou
Journal: Ear Hear Date: 2020 Sep/Oct Impact factor: 3.562

3. A Data-informed Public Health Policy-Makers Platform.

Authors: Dario Brdarić; Senka Samardžić; Ivana Mihin Huskić; Giorgos Dritsakis; Jadran Sessa; Mariola Śliwińska-Kowalska; Małgorzata Pawlaczyk-Łuszczyńska; Ioannis Basdekis; George Spanoudakis
Journal: Int J Environ Res Public Health Date: 2020-05-07 Impact factor: 3.390

3 in total