Literature DB >> 33732827

A dataset of lung sounds recorded from the chest wall using an electronic stethoscope.

Mohammad Fraiwan1, Luay Fraiwan2, Basheer Khassawneh3, Ali Ibnian3.   

Abstract

The advancement of stethoscope technology has enabled high quality recording of patient sounds. We used an electronic stethoscope to record lung sounds from healthy and unhealthy subjects. The dataset includes sounds from seven ailments (i.e., asthma, heart failure, pneumonia, bronchitis, pleural effusion, lung fibrosis, and chronic obstructive pulmonary disease (COPD)) as well as normal breathing sounds. The dataset presented in this article contains the audio recordings from the examination of the chest wall at various vantage points. The stethoscope placement on the subject was determined by the specialist physician performing the diagnosis. Each recording was replicated three times corresponding to various frequency filters that emphasize certain bodily sounds. The dataset can be used for the development of automated methods that detect pulmonary diseases from lung sounds or identify the correct type of lung sound. The same methods can also be applied to the study of heart sounds.
© 2021 The Authors.


Keywords:  Artificial intelligence; Deep learning; Electronic stethoscope; Lung sounds; Pulmonary diseases

Year:  2021        PMID: 33732827      PMCID: PMC7937981          DOI: 10.1016/j.dib.2021.106913

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table

Value of the Data

- The dataset is useful for designing automated machine learning algorithms for the detection of pulmonary diseases. It provides real lung sound recordings from 112 Middle Eastern subjects experiencing a multitude of pulmonary health conditions.
- The data enrich, expand, and balance the few comparable public datasets.
- The data is useful for training the auscultatory skills of health professionals; medical educators and students can use the dataset for training.
- The dataset will benefit biomedical engineering and artificial intelligence researchers interested in designing or testing automated methods for the detection of pulmonary diseases or the identification of lung sound types.
- The data can be reused in many ways. First, the audio files can be processed to remove noise in different ways. Second, feature extraction techniques for machine learning algorithms can be proposed. Third, new machine learning algorithms can be developed and tested. Finally, the stethoscope files can be reused for professional education and training purposes.

Data Description

The dataset includes respiratory sounds from one hundred and twelve subjects (35 healthy and 77 unhealthy) [1]. Subjects' ages ranged from 21 to 90 years (mean ± SD: 50.5 ± 19.4), with 43 females and 69 males. Detailed demographic information and the number of subjects with each health condition are given in Table 1.
Table 1

Health conditions included in the dataset and the demographic information of the subjects.

Health Condition   | No. of Subjects | Age Range | Gender
Normal             | 35              | 18–81     | 11 female, 24 male
Asthma             | 32              | 12–72     | 17 female, 15 male
Pneumonia          | 5               | 36–70     | 2 female, 3 male
COPD               | 9               | 42–76     | 1 female, 8 male
BRON               | 3               | 20–68     | 1 female, 2 male
Heart failure      | 21              | 20–83     | 9 female, 12 male
Lung fibrosis      | 5               | 44–90     | 2 female, 3 male
Pleural effusion   | 2               | 70–81     | 0 female, 2 male
Unlike comparable datasets [2], this dataset contains one recording per subject. The duration of each recording ranges from 5 to 30 seconds, which is enough to cover at least one respiratory cycle [3], [4], [5]. The maximum duration is limited by the recording capability of the electronic stethoscope [6]; no minimum duration requirement was imposed on the physician examining the subjects. The number of extractable respiratory cycles for each health condition is reported elsewhere [7].

The name of each data file starts with the type of filter, encoded as the letter B, D, or E. This is followed by the letter P, a unique sequential patient number starting from 1, and an underscore. After that, the file name lists the diagnosis, type of sound, location of measurement on the chest, the subject's age, and the subject's gender. Three types of filters were included in the data: the letter B denotes Bell mode filtration, which amplifies sounds in the range 20-1000 Hz but emphasizes low-frequency sounds in the range 20-200 Hz; the letter D denotes Diaphragm mode filtration, which amplifies sounds in the range 20-2000 Hz but emphasizes sounds in the range 100-500 Hz; and the letter E denotes Extended mode filtration, which amplifies sounds in the range 20-1000 Hz but emphasizes sounds in the range 50-500 Hz. Figure 1 shows a sample recording under the three filters along with its spectrogram.
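The three filter modes can be roughly emulated in software. The sketch below applies a Butterworth band-pass over the amplification bands described above; the stethoscope's actual (proprietary) filter responses, including their emphasis sub-bands, are not reproduced, so this is illustrative only.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# Approximate amplification bands described in the text; the device's
# real filter responses are proprietary, so these are illustrative only.
FILTER_BANDS_HZ = {
    "B": (20.0, 1000.0),   # Bell mode
    "D": (20.0, 2000.0),   # Diaphragm mode
    "E": (20.0, 1000.0),   # Extended mode
}

def bandpass(signal, fs, mode="B", order=4):
    """Apply a zero-phase Butterworth band-pass approximating one mode."""
    low, high = FILTER_BANDS_HZ[mode]
    high = min(high, 0.45 * fs)  # keep the upper edge below Nyquist
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

# Example: filter one second of synthetic noise sampled at 4 kHz.
fs = 4000
x = np.random.default_rng(0).standard_normal(fs)
y = bandpass(x, fs, mode="D")
```

With a real recording, `x` would instead be the sample array read from one of the dataset's ".wav" files.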
Fig. 1

A 19-second recording of respiratory lung sound using the three filters and the spectrogram.

The chest zone is encoded as three ordered letters drawn from the sets {A, P}, {L, R}, and {L, M, U}, respectively. The letters have the following meanings: Anterior: A, Posterior: P; Left: L, Right: R; Lower: L, Middle: M, Upper: U. Table 2 shows the chest zones included in the dataset and the corresponding number of subjects. The sound type is encoded as Inspiratory: I, Expiratory: E, Wheezes: W, Crackles: C, Normal: N, or Crepitations: Crep. Table 3 shows the number of subjects exhibiting each respiratory sound type. The disease diagnosis is included as one of normal (N), asthma, pneumonia, COPD, BRON, heart failure, lung fibrosis, or pleural effusion. The gender is represented as the letter F for female or M for male.
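The letter codes above are easy to capture in small lookup tables. A minimal sketch, using only the codes listed in the text:

```python
# Lookup tables for the letter codes described in the text.
CHEST_POSITION = {"A": "anterior", "P": "posterior"}
CHEST_SIDE = {"L": "left", "R": "right"}
CHEST_LEVEL = {"L": "lower", "M": "middle", "U": "upper"}
SOUND_TYPE = {
    "I": "inspiratory", "E": "expiratory", "W": "wheezes",
    "C": "crackles", "N": "normal", "Crep": "crepitations",
}

def decode_zone(zone):
    """Decode a three-letter chest zone code such as 'P L L'."""
    pos, side, level = zone.split()
    return f"{CHEST_POSITION[pos]} {CHEST_SIDE[side]} {CHEST_LEVEL[level]}"

print(decode_zone("P L L"))  # posterior left lower
```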
Table 2

Chest zones included in the dataset and the corresponding number of subjects.

Location               | No. of Subjects
Anterior left upper    | 2
Anterior right upper   | 6
Anterior right middle  | 4
Anterior right lower   | 4
Posterior left lower   | 19
Posterior left middle  | 12
Posterior left upper   | 11
Posterior right lower  | 24
Posterior right middle | 16
Posterior right upper  | 14
Table 3

Sound types contained in the dataset.

Sound Type           | No. of Subjects
Normal               | 35
Crepitations         | 23
Wheezes              | 41
Crackles             | 8
Bronchial            | 1
Wheezes & Crackles   | 2
Bronchial & Crackles | 2
For example, the file named “BP60_heart failure,Crep,P L L,83,F” is the Bell-filtered crepitation sound taken from the posterior left lower chest zone of an 83-year-old female heart failure patient. The Bell filter is more suitable for listening to heart sounds, which occur at a lower frequency than lung sounds [8]. The patient number is important because it is cross-referenced with the disease diagnosis and the lung sound type in the annotation file. The dataset includes the file “data annotation.xlsx”, which contains anonymous demographic information (i.e., age and gender) as well as the specific location on the chest from which each recording was captured (see chest zones in Fig. 2). The file also lists the meanings of the letter symbols used to annotate the data.
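The naming scheme is regular enough to parse programmatically. The sketch below splits a file-name stem into its metadata fields per the scheme described above; treat it as an assumption-laden starting point and cross-check parsed values against the annotation file.

```python
import re

# Parse a file-name stem such as "BP60_heart failure,Crep,P L L,83,F":
# filter letter (B/D/E), then "P" + patient number, then an underscore,
# then comma-separated diagnosis, sound type, chest zone, age, gender.
NAME_RE = re.compile(r"^(?P<filt>[BDE])P(?P<patient>\d+)_(?P<rest>.+)$")

def parse_name(stem):
    m = NAME_RE.match(stem)
    if m is None:
        raise ValueError(f"unexpected file name: {stem!r}")
    diagnosis, sound, zone, age, gender = m.group("rest").split(",")
    return {
        "filter": m.group("filt"),           # B, D, or E
        "patient": int(m.group("patient")),  # cross-reference with annotations
        "diagnosis": diagnosis,
        "sound_type": sound,                 # e.g., Crep
        "chest_zone": zone,                  # e.g., "P L L"
        "age": int(age),
        "gender": gender,                    # F or M
    }

info = parse_name("BP60_heart failure,Crep,P L L,83,F")
```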
Fig. 2

The location of chest zones used to record lung sounds.

The original “.zsa” files imported from the stethoscope are also included in the set. Each of the 10 files is named according to the range of patient numbers it contains; for example, the file “P1-P8.zsa” contains the recordings for patients 1 through 8. The grouping reflects the number of subjects examined during each period and the fact that each file can contain up to 12 recordings [6].

Experimental Design, Materials and Methods

The data was collected by placing the stethoscope on various regions of the chest called zones. The lungs are typically divided into lobes: three (upper, middle, and lower) on the right and two (upper and lower) on the left. However, the chest exam divides this region into zones that do not match the lung lobes [9]. Figure 2 shows the approximate boundaries of the chest zones, which are defined as follows [9], [10]:

Upper zone: the region under the clavicles and above the cardiac silhouette (i.e., the superior aspect of the hilum). Sometimes this region includes the apical zone, which is located above the inferior margin of the clavicles.
Middle zone: the region between the superior and inferior aspects of the hilum.
Lower zone: the region enclosed by the inferior aspect of the hilum and the hemidiaphragm.

Table 2 shows the chest zones included in the dataset and the corresponding number of subjects.

The data files were extracted from the on-board memory of the stethoscope using the 3M™ Littmann® StethAssist visualization software [6]. This desktop program allows exporting the audio files in “.wav” format only, using the three aforementioned filters. It can also show the spectrogram of the audio recordings, which represents the time-frequency content of the signal as a two-dimensional color plot: the x-axis represents time, the y-axis represents frequency, and the energy of the signal is color-coded, with black corresponding to minimum energy and red to maximum energy.

The “.wav” sound files were generated to be used with signal processing techniques rather than for direct listening. The relevant clinical information is embedded in the sound files and can be extracted using signal analysis. However, the sounds of interest are of low frequency (i.e., 20-500 Hz) and are difficult to hear with typical computer hardware.
Even if the volume is increased, the sound level will still be very low because typical computer speakers attenuate low frequencies (i.e., they act as a high-pass filter). Thus, the electronic stethoscope is required to listen to the recordings clearly. To this end, the dataset includes the original recordings as imported from the stethoscope in “.zsa” format. This file type can be opened with the StethAssist visualization software and listened to using the Littmann Electronic Stethoscope model 3200 or later comparable models.
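A spectrogram like the one exported by the visualization software (time on the x-axis, frequency on the y-axis, energy as intensity) can be computed directly from the ".wav" files with SciPy. In this sketch a synthetic 150 Hz tone stands in for a recording; the sampling rate and STFT parameters are illustrative assumptions, and a real file would first be read with `scipy.io.wavfile.read`.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 4000                         # assumed sampling rate (Hz)
t = np.arange(0, 2.0, 1.0 / fs)
x = np.sin(2 * np.pi * 150 * t)   # 150 Hz tone inside the lung-sound band

# Short-time Fourier analysis; Sxx has shape (n_freqs, n_times).
freqs, times, Sxx = spectrogram(x, fs=fs, nperseg=256, noverlap=128)

# Restrict attention to the band of clinical interest (~20-500 Hz).
band = (freqs >= 20) & (freqs <= 500)
peak_freq = freqs[band][np.argmax(Sxx[band].mean(axis=1))]
```

Plotting `Sxx` (typically on a log scale) over `times` and `freqs` reproduces the time-frequency view described above.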

Ethics Statement

All study participants (or their parents in case of underage subjects) provided written informed consent to being included in the study and allowing their data to be shared. This study was approved by the institutional review board at King Abdullah University Hospital and Jordan University of Science and Technology, Jordan (Ref. 91/136/2020). The data collection was carried out under the relevant guidelines and regulations. The authors have the right to share the data publicly.

CRediT Author Statement

Mohammad Fraiwan: Conceptualization, Software, Validation, Data Curation, Writing - Original Draft, Supervision, Project administration; Luay Fraiwan: Conceptualization, Methodology, Formal analysis, Writing - Review & Editing; Basheer Khassawneh: Methodology, Investigation, Resources, Supervision; Ali Ibnian: Validation, Investigation, Data Curation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.
Subject: Biomedical Engineering
Specific subject area: Machine learning; pulmonary diseases; clinical application
Type of data: Audio (.wav files)
How data were acquired: Lung sounds were acquired using an electronic stethoscope placed on various vantage points of the chest wall. The recording was performed using the 3M™ Littmann® Electronic Stethoscope model 3200 and transmitted to a computer using the provided Bluetooth adaptor.
Data format: Raw; filtered.
Parameters for data collection: None.
Description of data collection: The data was recorded with the aim of diagnosing the suspected pulmonary disease from the lung sound. The collection process did not attempt to record heart sounds. The 3M™ Littmann® heart and lung sound visualization software was used to extract the recordings from the stethoscope. This software allows exporting files using three filters (Bell, Diaphragm, and Extended), which emphasize different sound frequencies corresponding to sounds from specific organs (e.g., heart or lung).
Data source location: Institution: King Abdullah University Hospital; City/Town/Region: Ramtha/Irbid; Country: Jordan
Data accessibility: Repository name: Mendeley Data. Data identification number and direct URL: https://doi.org/10.17632/jwyy9np4gv.3. Instructions for accessing these data: use the direct link; you may need to create a Mendeley account to log in. The DOI will become active and the data will be published upon acceptance of this article (i.e., the embargo will be removed). The stored data does not contain any reference to participants.
Related research article: L. Fraiwan, O. Hassanin, M. Fraiwan, B. Khassawneh, A. Ibnian, M. Alkhodari, "Automatic identification of respiratory diseases from stethoscopic lung sound signals using ensemble classifiers," Biocybernetics and Biomedical Engineering, vol. 41, no. 1, pp. 1-14, 2021. https://doi.org/10.1016/j.bbe.2020.11.003
