Literature DB >> 36090816

Attenuated total reflection FTIR dataset for identification of type 2 diabetes using saliva.

Miguel Sanchez-Brito1,2, Gustavo J Vazquez-Zapien3, Francisco J Luna-Rosas1, Ricardo Mendoza-Gonzalez1, Julio C Martinez-Romo1, Monica M Mata-Miranda3.   

Abstract

Diabetes is one of the top 5 non-communicable diseases that occur worldwide according to the World Health Organization. Despite not being a fatal disease, a late diagnosis as well as poor control can cause a fatal outcome, because of that, several studies have been carried out with the aim of proposing additional techniques to the gold standard to assist in the diagnosis and control of this disease in a non-invasive way. Considering the above, and in order to provide a solid starting point for future researches, we share a primary research dataset with 1040 saliva samples obtained by Fourier Transform Infrared Spectroscopy considering the Attenuated Total Reflectance method. Database include: gender, age, individuals (patients) with/without diabetes, the glucose value, and the result to the A1C test for the diabetic population. We believe that sharing dataset as is could increase experimentation, research, and analysis of spectra through different strategies broaden its range of applicability by chemists, doctors, physicists, computer scientists, among others, to identify the effects that the virus causes in the body and to propose possible clinical treatments as well as to develop devices that allow us to assist in the characterization of possible carriers.
© 2022 The Author(s).

Entities:  

Keywords:  A1C Test; Attenuated total reflection FTIR; Dataset; Diabetes; Glucose; Saliva

Year:  2022        PMID: 36090816      PMCID: PMC9428842          DOI: 10.1016/j.csbj.2022.08.038

Source DB:  PubMed          Journal:  Comput Struct Biotechnol J        ISSN: 2001-0370            Impact factor:   6.155


Introduction.

According to the World Health Organization (WHO), non-communicable diseases (NCDs) claim the lives of approximately 41 million people each year. NCDs, also known as chronic diseases, tend to be long-lasting and result from a combination of genetic, physiological, environmental, and behavioral factors. The main types of NCDs are cardiovascular diseases (such as heart attacks and strokes), cancer, chronic respiratory diseases (such as chronic obstructive pulmonary disease and asthma), and diabetes. Regarding diabetes, the WHO notes that around 95 % of people with diabetes have ineffective use of insulin by the body, this type of diabetes is known as type 2 diabetes. Despite not being a fatal disease, risks like the increased probability of suffering a heart attack compared to people who do not suffer from it, neuropathies, diabetic retinopathy or kidney damage by patients who suffer from it, can lead to death if the necessary measures are not taken to monitor the variations in glucose levels in the body that are characteristic of this disease. Despite the existence of reliable strategies and devices for both the diagnosis and control of this disease, it has not been possible to reduce the mortality rate associated with this pathology, thus, assistance in the clinical diagnosis of pathologies in a non-invasive way is currently a topic that is addressed through different techniques. Fourier Transform Infrared Spectroscopy (FTIR) is commonly used due to the nature of the samples analyzed. Through the interaction of a biological sample with electromagnetic frequencies associated with the middle region of the infrared spectrum expressed as wavenumbers (cm−1), it is possible to produce vibrations in the chemical bonds that make up the sample [1], [2]. The wavenumbers considered in the mid-infrared region range between 400 and 4000 cm−1 [1], [2], [3], [4], [5]. Once the interaction of the sample with these wavenumbers has ended, the vibrations are captured in a two-dimensional matrix called FTIR spectrum. The search for a spectral behavior attributed to a population with a specific pathology in common could be the key to assisting in clinical diagnosis through FTIR spectroscopy. However, the more elements that make up the sample that is analyzed, the more complicated it will be to detect a spectral behavior typical of pathology due to the overlapping of the vibrations of the elements that make it up [1], [2], [6], [10], [11]. According to [3], [4], [6], [7], [8], [9], [10], the highest vibrations recorded in a biological sample belong mainly to carbohydrates and nucleic acids (1225 cm−1), lipids (1750 cm−1) and proteins (amide I: 1550–1600 cm−1 and amide II: 1600–1700 cm−1 mainly). Punctually, research related to diabetes mainly highlights the differences against the control group specifically in the region of lipids and carbohydrates and nucleic acids [7], [8], [9], [10]. Although the above helps to identify the most contrasting spectral regions of the spectra that make up the studies of the different authors, it is not enough to propose a strategy that allows a spectrum to be reliably identified as suggested by the validation metrics reported by the authors [7], [8] due to the overlapping of the spectra. In this sense, multivariate analysis techniques commonly used in the area of machine learning have allowed better results, compared to techniques such as principal components (PCA) as reported by the authors of [10], [12], [13], [14], however, these strategies are not infallible, their effectiveness is closely related to the number of samples that the model can analyze to identify patterns in the populations studied, and this is a field of study currently open to research. In order to provide a starting point to the scientific community that wishes to direct its efforts to find patterns in the FTIR spectra of saliva samples, we share our database made up of 1040 patients, of which 500 formed the control group and the rest belongs to patients previously confirmed with diabetes through the gold standard strategies. The database includes the gender and age of the patient from whom the saliva sample was obtained and, in the case of diabetic patients, the glucose values and the A1C test result are provided.

Materials and methods.

Between February 2019 and February 2020, saliva samples belonging to 1040 patients were collected under informed consent and following the guidelines outlined in the Declaration of Helsinki to protect the information of patients participating in clinical research projects. The patients were confirmed with and without type 2 diabetes in the Unidad de Especialidades Médicas (UEM) of the Secretaria de la Defensa Nacional (SEDENA, Mexico) after having approved the internal research protocol with folio: 001/2019. All experiments were examined and approved by the appropriate ethics committee and followed the ethical standards laid down in the 1964 Declaration of Helsinki. The inclusion criteria set for obtaining the saliva sample from patients were: Patients between the ages of 20 and 80 gave their consent to provide a sample of approximately 1 ml of saliva in a micro-centrifuge tube previously sterilized. The patients had to be in a fasting period of at least 8 h. They should not have brushed their teeth or used mouthwash; likewise, they should not have had any orthodontic treatment or any treatment in the oral cavity. Selected patients should not wear any type of lipstick. The patients considered as a control group involve different pathologies but no type of diabetes. Every day between 10 and 15 samples were collected in a period no longer than 2 h to avoid sample degradation as much as possible. The patients considered in the present study were sampled between 6 and 8 in the morning. All the collected samples were refrigerated at a temperature between 0 °C and 4 °C and processed after the 2 h defined for the sampling process [3], [6], [11]. The saliva sample was obtained through the process of spitting by the patient into the micro-centrifuge tube, previously they were instructed to try not to stimulate the production of the same fluid. The micro-centrifuge tube was handled only by the doctor on duty using nitrile gloves each time each sample was obtained and processed. The Attenuated Total Reflection (ATR) technique was used to capture the absorbance spectrum through a Jasco FTIR-6600 spectrometer located at the Escuela Militar de Medicina (EMM) of the SEDENA. From each saliva sample, 3 μl were deposited by pipetting directly on the crystal of the equipment. Plastic tips previously sterilized were used in the pipette to obtain the mentioned amount of sample, which were discarded after placing each sample. To speed up the sample drying process, the environment temperature was increased by turning on incandescent lamps for 15 min, since after that time it was possible to appreciate the main macromolecular groups reported by [3], [4], [5], [6], [7]. Due to the large amount of water molecules with respect to others in biological samples, it has been one of the main challenges in the study of this type of specimens using FTIR spectroscopy as pointed out by [15], [16], so the drying process is essential to appreciate the main macromolecules mentioned. At the time of taking the sample, the lamps were deactivated so that the radiation they could emit would interfere with the measurement. The configuration used for the capture was 120 scans with a resolution of 4 cm−1 as suggested by [1], [2], [6]. After the drying process and once the spectrum has been captured, using the Spectra Manager version 2 software, included with the spectrometer, the noise reduction processes by H2O, CO2, and baseline adjustment were performed. In addition to the saliva sample, gender and age were recorded.

Results

The 1040 spectra were grouped in a.csv extension file according to the following distribution: Column 1: Gender, Column 2: Age, Column 3: Population (DIABETES, CONTROL), Column 4 to 3739: Absorbance values obtained with wavenumbers between 399 and 4000 cm−1. Information on the study population is presented in Table 1.
Table 1

Analyzed population.

GroupGenderTotal patientsAge (Mean) yearsTotal patients by age ranges
Age range (years)Total patients
Type 2diabetesFEMALE33660.7±11.4320–293
30–3910
40–4936
50–5999
60–69107
70–7968
80–8913
MALE20461.11±11.5420–292
30–393
40–4931
50–5945
60–6976
70–7944
80–893



ControlFEMALE32852.57±14.5220–2919
30–3938
40–4977
50–59102
60–6952
70–7929
80–8911
MALE17253.66±14.1020–298
30–3924
40–4931
50–5948
60–6932
70–7926
80–893
Analyzed population. Gender and age distributions of the diabetic/control analyzed population are graphically depicted in Fig. 1.
Fig. 1

Gender and age distribution of the groups studied.

Gender and age distribution of the groups studied. Additionally, Table 2 and Table 3 summarizes the glucose and hemoglobin values, respectively. These percentages were obtained through the A1C test from patients previously diagnosed with type 2 diabetes.
Table 2

Glucose values information.

PopulationGroupTotal samplesGlucose range (mg/dl)Range mean (mg/dl)
Type 2 diabetes1114[45–100)85.6
2217[100–150)121.9
399[150–200)172.3
460[200–250)223.5
526[250–300)272.5
624>300344.7
Table 3

Hemoglobin test A1C Test information.

PopulationGroupTotal samplesA1C Test (%)Range mean (%)
Type 2 diabetes13[4–5)4.7
261[5–6)5.6
3116[6–7)6.4
489[7–8)7.4
578[8–9)8.3
647[9–10)9.3
7106>1011.3
Glucose values information. Hemoglobin test A1C Test information. In Fig. 2, the distribution of glucose (mg/dl) and hemoglobin (%) values recorded in the patient clinical history log when collecting the saliva sample is presented.
Fig. 2

Distribution of glucose and hemoglobin values recorded for diabetic patients.

Distribution of glucose and hemoglobin values recorded for diabetic patients. In works such as those of [6], [12], [13], the difficulty of selecting a characterization technique has been pointed out, having to experiment with several of them to adapt the one that allows obtaining the best results. Comparing the measures of central tendency of two or more populations could help delimit the selection of characterization strategies, if in addition to the above, more information on the behavior of the signals is provided, the selection process of characterization techniques can be optimized. In this sense we present Fig. 3, where the average spectrum of the population previously diagnosed with type 2 diabetes as a solid red line and the average of the control group population in blue. The dotted lines involve the addition and subtraction of one standard deviation (S.D.) from the mean spectrum of each population. For example, the figure titled Control vs Type 2 diabetes population presents 20 spectra of the control group and 20 of the group previously diagnosed with type 2 diabetes; it is possible to appreciate the complexity of using FTIR spectroscopy to assist in the diagnosis of any pathology using a sample with a large number of components such as saliva.
Fig. 3

Mean spectra of populations with and without type 2 diabetes (red and blue, respectively). The dotted lines involve adding and subtracting one standard deviation from the mean. In the third figure, 20 spectra of the control group (blue) are contrasted against 20 spectra of the diabetes group (red). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Mean spectra of populations with and without type 2 diabetes (red and blue, respectively). The dotted lines involve adding and subtracting one standard deviation from the mean. In the third figure, 20 spectra of the control group (blue) are contrasted against 20 spectra of the diabetes group (red). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) The addition and subtraction of a standard deviation from the mean allows knowing the behavior of most of the signals (>90 %) recorded for a given population. In this way it would be possible to identify regions of the spectrum of a certain population that could be associated with a certain pathology if it presents a small standard deviation [13]. Likewise, it would be possible to select or avoid experimentation with clustering strategies based on the calculation of distances between spectra such as K-means, since they would be affected by the overlap of the signals. Fig. 4 shows the behavior of the means of the glucose and hemoglobin groups according to [6], [11], the analysis of only the region between (900–1900 cm−1) called biological fingerprint (BFP) is common because vibrations of a large number of molecules have been reported in this region: carbohydrates, proteins, lipids, and deoxyribonucleic acid (DNA), mainly, however, it is possible to appreciate that for the means of the glucose groups, the similarity in absorbance values could make it difficult to adopt a strategy that allows characterizing FTIR spectra of saliva samples based on the glucose values of the patients, not so for the means of the groups of hemoglobin values, where a greater separation in the groups is observed.
Fig. 4

Behavior of the means of the glucose and hemoglobin groups.

Behavior of the means of the glucose and hemoglobin groups. A region commonly omitted in the analysis of biological samples is the one between the wavenumbers 2800 and 3700 cm−1 approximately. This is due to the vibrations associated with hydrogen bonds, since it is the main constituent of saliva, the high content of bonds with this element would hide the other vibrations of the different molecular groups. However, in this region, important vibrations with lipids and proteins (amide A) have been reported [3], [5], [11]; these regions could be helpful to record the N or O glycosylation process [17], [18], [19], [20]. In the 2800–3700 cm−1 region, mainly for the means of the glucose groups, a greater dispersion in the absorbance values is observed compared to the BFP region, which could help in the characterization of FTIR spectra of saliva samples, both for diabetic patients and to propose a strategy that allows associating reliably the spectrum with a given glucose value the vibrations of these molecules have also been highlighted by [21]. However, the authors consider the vibrations recorded in the wavenumbers 1462–1747 cm−1 since their study focuses on the analysis of the BFP.

Discussion

According to the World Health Organization (WHO), diabetes is one of the top 5 non-communicable diseases that cause the most deaths worldwide, with type 2 diabetes being the most common. Despite not being a fatal disease, the deterioration in the health of patients who suffer from it can lead to death if the necessary measures are not taken to monitor the variations in glucose levels in the body that are characteristic of this disease [27]. In order to assist in the diagnosis and control of this disease, numerous organizations have developed commercial devices that allow people with this disease to monitor the sugar levels in their bodies. Despite this, the rate of deaths reported each year due to health complications associated with this disease has not decreased according to WHO figures. The foregoing could be attributed to different factors such as: the patient's own response, lack of time to perform physical activity, lack of knowledge of the correct diet that a diagnosed patient should follow, discrepancy between the measurements of the different devices to monitor the levels of sugar in the body, etc. In addition to the above, the bio fluid that the devices analyze to estimate the percentage of sugar in the body could be associated: the blood. The need to analyze blood to estimate the level of sugar in patients, although it is the most reliable way to make such an estimate, involves people suffering from this disease having to extract the body fluid (usually through a finger prick), this implies that in addition to the acquisition of the device that will carry out the analysis and display the result to the patient, the need to consider both, the equipment that allows causing the wound to extract the blood (lancets) and the means to conduct the blood to the device (reactive strips), because these consumables can only be used once, the economic investment in the short and medium term could be considerable for most patients, regardless of the physical discomfort it causes. The blood study is mainly due to the analysis of hemoglobin. Hemoglobin, a protein that links up with glucose, is found inside red blood cells. Its job is to carry oxygen from the lungs to all the cells of the body. Glucose enters your red blood cells and links up (or glycates) with molecules of hemoglobin. The more glucose in your blood, the more hemoglobin gets glycated. By measuring the percentage of A1C in the blood, you get an overview of your average blood glucose control for the past few months [28]. Although hemoglobin is exclusive to the bloodstream, various studies have already confirmed the presence of both, glucose and glycoproteins similar to hemoglobin in different biofluids such as saliva [19], [20]. Considering biofluids that are non-invasive extraction would not only eliminate physical discomfort for patients, but would also allow them to avoid the need to purchase consumables to monitor their glucose levels, however, a separate case is the strategy with which the sample will be analyzed. FTIR spectroscopy makes it possible to obtain a map of the sample's chemical structure that is analyzed thanks to the interaction of the bonds of the molecules that make it up with electromagnetic frequencies belonging to the middle region of the infrared spectrum. By analyzing the structure of samples of two or more populations, it would be possible to find a mathematical model that would allow them to be reliably characterized. Some of the studies that have shown the feasibility of using FTIR spectroscopy to assist in the diagnosis and control of diabetes are those presented by [3], [6], [7], [8], [9], [10], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], where the problem of overlap between the spectra of the studied populations is reduced thanks to machine learning (ML) techniques such as those suggested by [4], [5], [6], [22], [23], [24], [25], [26]. The studies mentioned, despite the good results reported, do not allow us to think about the development of a strategy that, through FTIR spectroscopy and using ML, allows us to reliably carry out the diagnosis and control of diabetic patients. The foregoing due to the need for a larger population as well as the processing of the samples considering different equipment but the same calibration parameters. In this way ML techniques could reliably identify patterns unique to diabetes in FTIR spectra. With the present work, we make available to the public, the signals obtained by Fourier transform infrared spectroscopy of saliva samples from 1040 patients, of which 500 belong to the control group, and 540 belong to the group of patients diagnosed with type 2 diabetes in advance. In addition to the classes to which each signal belongs, for the population diagnosed with type 2 diabetes, the values obtained for glucose and the percentage obtained by the patient for the hemoglobin A1C test are provided. The shared spectra do not contemplate any normalization, smoothing, or transformation processing except indicated in the Materials and Methods section to provide a blank framework for experimentation. The shared spectra could be used to develop additional strategies to those proposed in the current state of the art and assist in the diagnosis and control of diabetes or be used as a control group to identify differences against other pathologies.

Conclusion

According to the World Health Organization (WHO), diabetes is one of the top 5 non-communicable diseases that cause the most deaths globally [27], with type 2 diabetes being the one that most affects the population. The present work provides a compilation of FTIR spectra of saliva samples from people with and without type 2 diabetes. The database consists of 1040 samples 540 spectra of patients previously diagnosed with type 2 diabetes and 500 confirmed patients without this condition to provide a solid base to evaluate methodologies that allow to assist in the diagnosis of this pathology in a non-invasive way and complement with spectra of different populations, since according to an initial inspection of the distribution of the absorbance values of the spectra, its dispersion makes it difficult to define a region that reliably allows the characterization of the populations. We fully agree with the fact that in order to analyzing FTIR spectra of saliva samples to assist in the diagnosis and control of diabetes in which useful regions are indicated for the characterization of the populations’ spectra it is necessary to carry out certain pre-processing such as normalization and noise reduction (including standard normal variate, Savitzky-Golay filter, among others) in the signals before comparing them. Nevertheless, we also believe that sharing and reusing primary research datasets in this case, spectra without any pre-processing except for that carried out by the FTIR spectrometer itself (the noise reduction processes by H2O, CO2, and baseline adjustment)– may foster experimentation, research, and analysis of spectra through different strategies reaching a broader community of researchers as the authors of [28] suggest when sharing databases. Although it does not always happen, most patients with type 2 diabetes also have additional health disorders such as obesity, hypertension, hyperlipidemia, kidney diseases and cardiovascular disease, among others [29], [30], [31], so having the metrics associated with these ailments would be desirable. The absence of the metrics of the disorders mentioned above in the database are a limitation to reinforce the relationship between type 2 diabetes with these diseases, however, they are not a limitation to carry out associated studies for both diagnosis and monitoring of glucose levels in the human body analyzing saliva. Finally, all spectra reported in this paper are also available in editable and implementable format at https://doi.org/10.6084/m9.figshare.19450916.v1.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
  18 in total

Review 1.  A review of saliva: normal composition, flow, and function.

Authors:  S P Humphrey; R T Williamson
Journal:  J Prosthet Dent       Date:  2001-02       Impact factor: 3.426

Review 2.  Biofluid spectroscopic disease diagnostics: A review on the processes and spectral impact of drying.

Authors:  James M Cameron; Holly J Butler; David S Palmer; Matthew J Baker
Journal:  J Biophotonics       Date:  2018-03-05       Impact factor: 3.207

Review 3.  Tutorial: multivariate classification for vibrational spectroscopy in biological samples.

Authors:  Camilo L M Morais; Kássio M G Lima; Maneesh Singh; Francis L Martin
Journal:  Nat Protoc       Date:  2020-06-17       Impact factor: 13.491

4.  Advanced glycosylation end products in skin, serum, saliva and urine and its association with complications of patients with type 2 diabetes mellitus.

Authors:  M E Garay-Sevilla; J C Regalado; J M Malacara; L E Nava; K Wróbel-Zasada; A Castro-Rivas; K Wróbel
Journal:  J Endocrinol Invest       Date:  2005-03       Impact factor: 4.256

5.  Public sharing of research datasets: a pilot study of associations.

Authors:  Heather A Piwowar; Wendy W Chapman
Journal:  J Informetr       Date:  2010-04       Impact factor: 5.107

6.  Infrared Saliva Analysis of Psoriatic and Diabetic Patients: Similarities in Protein Components.

Authors:  Ugo Bottoni; Raffaele Tiriolo; Salvatore A Pullano; Stefano Dastoli; Giuseppe F Amoruso; Steven P Nisticò; Antonino S Fiorillo
Journal:  IEEE Trans Biomed Eng       Date:  2015-07-21       Impact factor: 4.538

7.  Spectrochemical differentiation in gestational diabetes mellitus based on attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy and multivariate analysis.

Authors:  Emanuelly Bernardes-Oliveira; Daniel Lucas Dantas de Freitas; Camilo de Lelis Medeiros de Morais; Maria da Conceição de Mesquita Cornetta; Juliana Dantas de Araújo Santos Camargo; Kassio Michell Gomes de Lima; Janaina Cristiana de Oliveira Crispim
Journal:  Sci Rep       Date:  2020-11-06       Impact factor: 4.379

8.  Diabetes-related molecular signatures in infrared spectra of human saliva.

Authors:  David A Scott; Diane E Renaud; Sathya Krishnasamy; Pinar Meriç; Nurcan Buduneli; Svetki Cetinkalp; Kan-Zhi Liu
Journal:  Diabetol Metab Syndr       Date:  2010-07-14       Impact factor: 3.320

9.  Assessment of Absorption of Glycated Nail Proteins in Patients with Diabetes Mellitus and Diabetic Retinopathy.

Authors:  Ieva Jurgeleviciene; Daiva Stanislovaitiene; Vacis Tatarunas; Marius Jurgelevicius; Dalia Zaliuniene
Journal:  Medicina (Kaunas)       Date:  2020-11-29       Impact factor: 2.430

10.  Machine learning-assisted non-destructive plasticizer identification and quantification in historical PVC objects based on IR spectroscopy.

Authors:  Tjaša Rijavec; David Ribar; Jernej Markelj; Matija Strlič; Irena Kralj Cigić
Journal:  Sci Rep       Date:  2022-03-23       Impact factor: 4.379

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.