
Artificial Intelligence Techniques for the Non-invasive Detection of COVID-19 Through the Analysis of Voice Signals.

Laura Verde1, Giuseppe De Pietro2, Giovanna Sannino2.   

Abstract

Healthcare sensors represent a valid and non-invasive instrument to capture and analyse physiological data. Several vital signals, such as voice signals, can be acquired anytime and anywhere with the least possible discomfort to the patient, thanks to the development of increasingly advanced devices. The integration of sensors with artificial intelligence techniques contributes to the realization of faster and easier solutions aimed at improving early diagnosis, personalized treatment, remote patient monitoring and better decision making, all tasks that are vital in a critical situation such as the COVID-19 pandemic. This paper presents a study on the possibility of supporting the early and non-invasive detection of COVID-19 through the analysis of voice signals by means of the main machine learning algorithms. If demonstrated, this detection capacity could be embedded in a powerful mobile screening application. To perform this study, the Coswara dataset is considered. The aim of this investigation is not only to evaluate which machine learning technique best distinguishes a healthy voice from a pathological one, but also to identify which vowel sound is most seriously affected by COVID-19 and is, therefore, most reliable in detecting the pathology. The results show that Random Forest is the technique that most accurately classifies healthy and pathological voices. Moreover, the evaluation of the vowel /e/ allows the effects of COVID-19 on voice quality to be detected with better accuracy than the other vowels. © King Fahd University of Petroleum & Minerals 2021.

Keywords:  COVID-19 detection; Healthcare sensors; Machine learning techniques; Voice analysis; Vowel sounds

Year:  2021        PMID: 34642613      PMCID: PMC8500467          DOI: 10.1007/s13369-021-06041-4

Source DB:  PubMed          Journal:  Arab J Sci Eng        ISSN: 2191-4281            Impact factor:   2.807


Introduction

The ageing of the population, the diffusion of chronic conditions and the outbreaks of infectious diseases and new pandemics, such as COVID-19, which has been affecting the world since last year, all represent major challenges for present-day society [22]. The acquisition, processing and analysis of health information constitute significant tasks in relation to the early detection and treatment of major diseases. The use of unobtrusive sensing and wearable devices constitutes a valuable support for the acquisition of health data anywhere and anytime. Sensors can be integrated into clothing, accessories and the living environment. Advanced electronic devices can provide detailed health information, continuously monitoring biochemical and physiological parameters in real time during the daily life of the patient [20, 21, 34, 54]. These sensors are particularly useful in monitoring the health condition of the millions of people who have been afflicted by COVID-19, a pandemic that has so far affected more than 80 million people and whose diffusion at present shows no sign of abating [53]. While the application of increasingly advanced, easy-to-use and wearable technologies has helped to improve the processes of patient care, through the continuous real-time monitoring of vital parameters and the definition of personalized treatments, it has also contributed to the accumulation of a large amount of device data and the consequent need for its accurate analysis and processing. The successful application of these sensors, in fact, hinges on the ability to extract, interpret and elaborate the considerable volume of heterogeneous data they generate in a consistent and constructive way. The complexity and variety of these data require new models, solutions and technologies able to process and analyse them reliably and easily. This objective can be achieved by using Machine Learning (ML) algorithms.
These techniques, in fact, represent an important tool capable of transforming such data into valid insights for the construction of the reliable decision-making models needed to provide accurate healthcare to patients. In addition, their use in health care contributes to the optimization of resources, improving processes and services with a consequent reduction in costs [10, 32]. In this study, we explore the opportunity to support the early assessment and detection of COVID-19 symptoms through the evaluation of voice sounds. COVID-19 patients, including asymptomatic subjects, have, in fact, reported difficulties in voice production as well as abnormalities in vocal fold oscillation [24, 40]. In detail, we have analysed and studied the sounds of three vowels, /a/, /e/ and /o/, selected from the Coswara database, an available crowd-sourced database [46]. The aim has been to evaluate the possibility of supporting the early and non-invasive detection of COVID-19 by analysing the vocalization of a vowel through the use of the most appropriate ML algorithm. In particular, on the one hand, we have investigated which of the three vowel sounds present in the dataset most accurately reflects the effects of COVID-19 on voice production, and is therefore most appropriate for detecting the coronavirus disease; on the other hand, we have evaluated which ML technique, among an ample number, performs best in terms of correct classification. The possibility of detecting COVID-19 by easily analysing short vowel sounds could be significant for the realization of a mobile health (m-health) solution able to acquire the vocalization sound, analyse it and distinguish between healthy and pathological subjects through the most reliable ML technique.
Such an m-health system could constitute a valid instrument for fast and easy screening, reducing the time required to map the spread of the infection, as well as the costs involved in its detection. The remaining sections of the manuscript are organized as follows. The main works in the literature relating to the diagnosis of COVID-19 are discussed in Sect. 2. The voice samples, features and ML models evaluated in this study are presented in Sect. 3. Finally, the results obtained are discussed in Sect. 4, while our conclusions are presented in Sect. 5.

Related Works

Due to the recent diffusion of the COVID-19 pandemic and the continuous mutation of the virus into new variants, very few studies exist in the literature on the detection of COVID-19 symptoms through the analysis of voice signals. Respiratory signals [5, 18, 48] and coughing sounds [4, 7, 23, 27, 36, 37, 47] constitute the main vocal signals used to detect the effects of COVID-19. Other studies, instead, have identified pathomorphological modifications caused by the disease in the patient's chest by analysing Computed Tomography (CT) images [35, 55] or chest radiographs [1, 52]. However, limited and often non-accessible datasets have been used to perform these preliminary studies, reducing the possibility for the research community to further develop reliable classification approaches on standardized datasets. A small dataset composed of only 9 healthy and 10 pathological subjects, for example, was used in [42] to evaluate the accuracy of the proposed Support Vector Machine (SVM) model. Mel filter bank features represent the inputs to this model, which achieved an F1-score and an accuracy of 77.0% and 70.5%, respectively. An SVM algorithm was also used in [17] to detect pandemic symptoms by evaluating voice samples. The authors proposed a system able to analyse the severity of the disease by evaluating the vocalization of five sentences. Voice samples were collected from 52 pathological subjects in two hospitals in Wuhan, China. The Geneva Minimalistic Acoustic Parameter Set and the Computational Paralinguistics Challenge feature set were estimated and used as inputs to the SVM, achieving an accuracy of 69%. A Convolutional Neural Network (CNN) model capable of detecting anomalies in the dynamics of the glottal flow waveform (GFW) during voice production was, instead, proposed in [11].
A private database containing recordings of the vowels /a/, /i/ and /u/ voiced by a limited sample of only 9 pathological and 10 healthy subjects was analysed. The performance is presented in terms of the area under the Receiver Operating Characteristic curve (ROC-AUC) and its standard deviation, equal to 0.900 and 0.062, respectively. In [18], a deep learning approach was proposed: the features extracted from vocal, breathing and coughing sounds were processed by a Long Short-Term Memory (LSTM) architecture. The dataset evaluated consists of 80 subjects (60 healthy and 20 pathological). As features, the Spectral Roll-off (SR), Spectral Centroid (SC), Mel-Frequency Cepstral Coefficients (MFCC), the first and second derivatives of the MFCC and the Zero-Crossing Rate (ZCR) were considered. The LSTM model, applied to the voice samples, achieved an accuracy and F1-score of 88.2% and 92.5%, respectively. A new feature, called the COVID-19 Coefficient (C-19CC), was, instead, proposed by Dash et al. [8] to detect the presence of COVID-19 symptoms through an analysis of opportune sounds selected from the Coswara database. The reliability of these cepstral features in distinguishing correctly between pathological and healthy subjects was evaluated by classifying the samples with an SVM algorithm. The best performance was obtained through an analysis of the coughing sounds, with an accuracy of about 85%. To the best of our knowledge, our study is the first to present a reproducible analysis using a freely available dataset together with an exhaustive overview of how different vowel sounds and ML techniques affect COVID-19 detection.

Materials and Methods

Voice samples were selected from the Coswara database, a readily available database [9] created by the Indian Institute of Science (IISc) Bangalore, which contains coughing, breathing and voice sounds of healthy and pathological subjects. In this preliminary study, we analysed voice samples from 166 subjects, 83 healthy and 83 COVID-19 positive. 46 female and 120 male voices were selected, with a mean age of 33 years. More details about the number, gender and age of the subjects involved in this study are shown in Table 1.
Table 1

Details about the subjects involved in this study. For the age, we report the mean and standard deviation (SD)

Category | Gender | No. | Age (mean ± SD)
Healthy | Female | 21 | 29.6 ± 10.1
Healthy | Male | 62 | 36.4 ± 13.1
Healthy | Total | 83 | 34.7 ± 12.7
Covid-positive | Female | 25 | 32.08 ± 11.4
Covid-positive | Male | 58 | 31.2 ± 11.5
Covid-positive | Total | 83 | 31.5 ± 11.4
Total | Female | 46 | 30.9 ± 10.7
Total | Male | 120 | 33.9 ± 12.6
Total | Total | 166 | 33.1 ± 12.1

Bold italics indicate the total obtained for each category (healthy, covid-positive and all subjects involved in this study)

In this preliminary study, voice samples of adequate quality, not particularly corrupted by noise, were selected, although all the sounds were filtered using an appropriate filter to reduce the effect of this noise [33]. It is important to note that Coswara is a crowd-sourced database, with all samples being recorded by volunteers, and therefore it is necessary to control the quality of the voice signals. Although the database adopted, Coswara, provides voice, coughing and breathing sounds for each subject, we decided to evaluate the effects of COVID-19 by using only the vowel sounds. This choice was made because, in medical practice and in accordance with medical guidelines, experts analyse vowel characteristics in order to assess any pneumo-phono-articulatory apparatus disorders [30, 38]. Scientific studies have confirmed that sustained vowels can be rated significantly more reliably than continuous speech [29]. Based on this scientific evidence, we also consulted a medical team of the Department of Otorhinolaryngology of the University Hospital (Policlinico) Federico II of Naples (Italy), which confirmed to us that the analysis of vowel sounds allows the most relevant features to be extracted exhaustively, in order to identify specific changes in vowel articulation and to quantify any pneumo-phono-articulatory apparatus alterations. The sounds of three vowels, /a/, /e/ and /o/, were processed for each subject to extract the features that constitute the inputs of the considered ML algorithms.
Due to the recent diffusion of the COVID-19 pandemic and the consequent scarcity of studies about the effects of this infection on voice quality, the choice of the features to be extracted from the voice sounds and used as inputs of the ML algorithms cannot be made in accordance with a specific medical protocol. Therefore, we decided to use as features the acoustic parameters indicated in the medical protocol [28] for the evaluation of voice quality, such as the Fundamental Frequency (F), shimmer, jitter and Harmonics-to-Noise Ratio (HNR), as well as other parameters used in the literature for voice classification with ML algorithms [16, 39, 44, 51], such as the Mel-Frequency Cepstral Coefficients (MFCC), the Spectral Centroid and the Spectral Roll-off. The F is useful for assessing the correct functioning of the larynx, as it measures the rate of oscillation of the vocal folds. The instabilities of these oscillations in amplitude and frequency are represented, respectively, by the shimmer and the jitter. The incorrect closure of the vocal folds due to a pathology is, instead, reflected in the noise present in the voice signal, which is evaluated by the HNR parameter. These acoustic parameters were calculated in the Java programming language, using the Eclipse IDE (version 4.6.3), according to the procedures indicated in [13, 45, 50]. Other parameters, such as the MFCCs, represent the voice signal as the linear cosine transform of a log power spectrum on a non-linear mel scale of frequency; the dynamic behaviour of the voice signal is represented by the first and second derivatives of the cepstral coefficients. Finally, the Spectral Centroid (SC) and Spectral Roll-off (SR) were considered: the former is useful for evaluating the modifications of the signal frequency over time, while the latter is used to distinguish between unvoiced and voiced sounds.
These were evaluated using Matlab, version R2020a, adopting the audioFeatureExtractor function [31]. These features have been used as inputs of the main ML techniques used in the literature for voice classification, as described in the following subsection.
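For illustration, the last two features can be computed from a magnitude spectrum in a few lines of code. The following sketch is a simplified stand-in for the Matlab audioFeatureExtractor, applied to a synthetic harmonic frame rather than a real voice sample:

```python
import numpy as np

def spectral_centroid_rolloff(frame, sample_rate, threshold=0.85):
    """Spectral centroid (magnitude-weighted mean frequency) and roll-off
    (frequency below which `threshold` of the spectral magnitude lies)."""
    magnitudes = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    centroid = np.sum(freqs * magnitudes) / np.sum(magnitudes)
    cumulative = np.cumsum(magnitudes)
    rolloff = freqs[np.searchsorted(cumulative, threshold * cumulative[-1])]
    return centroid, rolloff

# A synthetic vowel-like frame: a 200 Hz fundamental with two harmonics,
# amplitudes 1, 1/2 and 1/3 (bin-aligned, so there is no spectral leakage).
sr, n = 16000, 1600
t = np.arange(n) / sr
frame = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in (1, 2, 3))
centroid, rolloff = spectral_centroid_rolloff(frame, sr)
print(round(centroid, 1), rolloff)  # centroid ≈ 327.3 Hz, roll-off = 600.0 Hz
```

Because the harmonics fall exactly on FFT bins, the centroid reduces to the amplitude-weighted mean of 200, 400 and 600 Hz, and the 85% roll-off lands on the highest harmonic.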

Machine Learning Classifiers

Currently, many biomedical applications use appropriate methodologies based on machine learning (ML) techniques to support the early and reliable diagnosis of specific pathologies [43]. These techniques are, in fact, able to distinguish between pathological and healthy subjects through the processing and analysis of specific data. The evaluation of the set of data allows the construction of a model that approximates the so-called features, namely the values assumed by independent variables corresponding to the measurable characteristics of each sample. ML techniques are capable of learning from the observed data and adapting their structure to optimize the classification. In this study, several ML techniques were used to distinguish between a pathological and a healthy voice. The Waikato Environment for Knowledge Analysis (WEKA) [15] tool, version 3.8.4, was used to perform the analyses. All the experiments were performed on a machine with 8 GB of memory and an Intel(R) Core(TM) i5-6200U CPU at 2.40 GHz. The ML algorithms are subdivided into several categories; in this work, for better readability, only the performances of the best techniques in each category are reported. In detail, the performances of the following ML classifiers were evaluated. Bayes: the classification, for these algorithms, is based on a probabilistic model in which the nodes and edges represent, respectively, a set of random variables and their conditional dependencies. This category is based on Bayes' theorem, which allows the probability of event A, given event B, to be found:

P(A|B) = P(B|A) · P(A) / P(B)   (1)

As shown in Eq. 1, the posterior probability P(A|B), i.e. the probability of event A after the evidence B is seen, is estimated from the a priori probability of A (P(A)), the probability of B (P(B)) and the probability of event B given event A (P(B|A)). The BayesNet (BN) and Naive Bayes (NB) algorithms were used in this study.
The latter assumes that the predictors are independent, namely that the presence of a given feature in a class is unconnected to the presence of any other feature. In BayesNet, instead, the conditional probability is estimated at each node by building a Bayesian network. More details are provided in [25]. Functions: the operation of the classifiers of this category can be interpreted as a mathematical equation. The classification performances of Stochastic Gradient Descent (SGD) and the Support Vector Machine (SVM) were evaluated. SVM is a supervised machine learning algorithm whose aim is to find the optimal classification function that distinguishes between the samples of the classes: the hyperplane searched for, shown in Fig. 1, is the one equally distant from the support vectors of the two classes [6]. SGD, instead, implements stochastic gradient descent learning of linear models: the true gradient is approximated by considering one training sample at a time, and the model parameters are updated iteratively for each sample analysed [3].
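As a toy illustration of Eq. 1 (the numbers are hypothetical and not from the study), the posterior probability of a positive subject given a positive classifier output can be computed directly:

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Hypothetical illustration: A = "subject is COVID-positive",
# B = "classifier flags the voice as pathological".
p_a = 0.10               # prior P(A)
p_b_given_a = 0.90       # P(B|A): probability of a flag given a positive subject
p_b_given_not_a = 0.20   # P(B|not A): probability of a flag given a healthy subject

# Total probability: P(B) = P(B|A)·P(A) + P(B|not A)·P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b  # posterior P(A|B)
print(round(p_a_given_b, 3))  # → 0.333
```

Even with a sensitive classifier, a low prior keeps the posterior modest, which is why the specificity values reported later matter for screening.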
Fig. 1

An overview of Support Vector Machine algorithm

Lazy: the k nearest neighbours were evaluated by means of an instance-based learning approach in order to decide the class to which a sample belongs. A group of the k objects of the training set closest to the test sample is located, and a label is assigned according to the prevailing class among these neighbours. The Locally Weighted Learning (LWL) and k-nearest neighbour (Ibk) algorithms were used in this work. The LWL algorithm is a non-parametric method in which a local model, built from the data neighbouring each point of interest, is used to make the prediction [14]. The Ibk model, instead, represents the simplest lazy learner: the nearest neighbours can be found with a variety of search algorithms, and the predictions of multiple neighbours can be weighted according to their distance from the test instance [2]. Meta: the classification in this category is achieved by combining multiple ML models to improve performance, at the cost of increased computational time and model complexity [12]. The performances of the Adaboost and Bagging techniques were estimated in this study. Bagging trains each base classifier on a bootstrap sample of the training set and generates its predictions by averaging the probability estimates. Adaboost, instead, is designed so that each subsequent model tries to correct the prediction errors made by the previous models: the weight of each instance of the training set is updated based on the accuracy of the model, so that the records incorrectly classified by one base learner are given priority as inputs to the next. This process continues until the indicated number of base learners is reached, as shown in Fig. 2.
Fig. 2

An overview of Adaboost algorithm
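The instance-based (Ibk-style) classification described above can be sketched with a minimal k-nearest-neighbour implementation; the 2-D feature vectors and labels below are purely illustrative, not taken from the dataset:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Lazy Ibk-style classification: find the k training samples nearest to
    the query (Euclidean distance) and return the majority label among them."""
    neighbours = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors (e.g. two normalized acoustic parameters):
train = [((0.10, 0.20), "healthy"), ((0.20, 0.10), "healthy"),
         ((0.90, 0.80), "covid"), ((0.80, 0.90), "covid"),
         ((0.85, 0.70), "covid")]
print(knn_predict(train, (0.15, 0.15)))  # → healthy
print(knn_predict(train, (0.90, 0.85)))  # → covid
```

No model is built in advance: all the work happens at prediction time, which is what makes this family of classifiers "lazy".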

Rules: the voice classification for these approaches is governed by rules. One-R and Decision Table (DT) have been evaluated. DT is a decision table classifier, in which the feature subsets are evaluated using best-first search [26]. The One-R algorithm, instead, uses the minimum-error attribute for prediction: the rules are based on the most informative attribute, and the attributes are ranked according to their error rate [19]. Trees: the classification is based on data attributes; these are hierarchical models composed of decision nodes and terminal leaves, the branches being labelled with the discrete outcomes of the function that each decision node implements. The Random Forest and C4.5 decision tree (J48) algorithms are the members of this category considered in this study. The Random Forest algorithm is an ensemble of a number of decision trees, as shown in Fig. 3: each tree is built via bagging, drawing bootstrap samples of the initial dataset with replacement and randomly selecting a subset of the features at each tree node [49]; each tree then makes a class prediction, and the class with the most votes constitutes the model's prediction. The C4.5 algorithm is based on the theories of Shannon; in particular, the Shannon entropy measures the disorder of the data and defines the amount of information provided by an event [41].
Fig. 3

An overview of Random Forest algorithm
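The Shannon entropy underlying C4.5's split criterion can be illustrated with a small sketch (a toy example, not WEKA's J48 implementation):

```python
import math

def entropy(labels):
    """Shannon entropy H = -sum(p * log2(p)): the disorder of a label set."""
    total = len(labels)
    return -sum((labels.count(c) / total) * math.log2(labels.count(c) / total)
                for c in set(labels))

def information_gain(parent, splits):
    """Reduction in entropy achieved by a candidate decision-node split."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in splits)

# A 50/50 parent node has maximal entropy (1 bit for two classes):
parent = ["covid"] * 4 + ["healthy"] * 4
print(entropy(parent))  # → 1.0

# A split that separates the classes perfectly recovers the full bit:
print(information_gain(parent, [["covid"] * 4, ["healthy"] * 4]))  # → 1.0
```

At each decision node, C4.5 prefers the attribute whose split yields the largest gain (normalized, in the full algorithm, by the split's own entropy).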


Results and Discussion

In order to evaluate the classification reliability of the ML approaches considered, the accuracy, sensitivity, specificity, precision, F1-score and Receiver Operating Characteristic (ROC) area were estimated. The accuracy is defined as the number of correct predictions out of all the samples, estimated according to Eq. 2:

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (2)

where True Positives (TP) and True Negatives (TN) are defined as the number of samples correctly classified as pathological or healthy, respectively, while False Positives (FP) and False Negatives (FN) represent the number of samples incorrectly classified as pathological and healthy, respectively. The sensitivity represents the number of pathological cases the classifier correctly identifies out of all the pathological cases in the dataset. The specificity, instead, measures the number of correct healthy predictions made, while the precision represents the measurement of how many of the pathological predictions made are correct. These measures are calculated by using the following equations:

Sensitivity = TP / (TP + FN)   (3)
Specificity = TN / (TN + FP)   (4)
Precision = TP / (TP + FP)   (5)

The harmonic mean of the precision and sensitivity represents the F1-score, estimated by Eq. 6:

F1-score = 2 · (Precision · Sensitivity) / (Precision + Sensitivity)   (6)

Finally, the performance of the classifiers is evaluated considering the area under the ROC curve (AUC), which is useful for evaluating the goodness of the classifier. When the AUC is at its minimum (AUC = 0), the ML technique incorrectly classifies all the data, while when the AUC is equal to 1, its maximum, the algorithm distinguishes perfectly between the pathological and healthy samples. The voice samples were divided randomly into training (80% of the samples) and testing (20% of the samples) sets. In detail, the sounds of the three vowels (/a/, /e/ and /o/) of 132 subjects (66 healthy and 66 COVID-19 positive) constitute the training set, while the remaining recordings of 34 subjects (17 healthy and 17 COVID-19 positive) compose the testing set. The Coswara database is unbalanced, in that it contains more healthy voices than pathological voices.
In this preliminary study, we have adopted a balanced dataset, selecting an equal number of healthy and pathological voice samples. However, it is important to note that the data collection is still in progress and that, in future studies, the increase in data may improve the analyses. Tables 2 and 3 report the results, for the training set and testing set respectively, of the ML algorithms applied to each sample, analysing for each subject the sounds of all three vowels, /a/, /e/ and /o/. These show that the best-performing ML algorithms are Random Forest, Adaboost and SVM. Among these three algorithms, the best performance on the testing set was obtained by Random Forest, achieving a classification accuracy and F1-score of about 82% and 84%, respectively. An accuracy of about 74% is, instead, obtained by the Adaboost and SVM algorithms. Meanwhile, observing the sensitivity and specificity values, we can affirm that both Random Forest and SVM are able to accurately recognize the voices of people suffering from COVID-19: this is confirmed by the sensitivity value obtained (about 94%), while the sensitivity achieved by the Adaboost algorithm stops at about 71%. Nevertheless, Adaboost obtains the best specificity (76.47% vs. 52.94% for SVM and 70.59% for Random Forest) among the three classifiers, meaning that Adaboost is better able to distinguish healthy voices from pathological ones.
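The measures defined by Eqs. 2-6 can be reproduced directly from the confusion-matrix counts. Assuming the 17+17 testing split, the Random Forest testing-set row in Table 3 is consistent with 16 true positives, 12 true negatives, 5 false positives and 1 false negative; a minimal sketch:

```python
def classification_metrics(tp, tn, fp, fn):
    """Performance measures from the confusion-matrix counts (Eqs. 2-6)."""
    sensitivity = tp / (tp + fn)                  # Eq. 3
    specificity = tn / (tn + fp)                  # Eq. 4
    accuracy = (tp + tn) / (tp + tn + fp + fn)    # Eq. 2
    precision = tp / (tp + fp)                    # Eq. 5
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. 6
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "precision": precision, "f1": f1}

# Counts consistent with the Random Forest testing row of Table 3
# (17 pathological and 17 healthy test subjects):
m = classification_metrics(tp=16, tn=12, fp=5, fn=1)
print({k: round(v * 100, 2) for k, v in m.items()})
# sensitivity 94.12, specificity 70.59, accuracy 82.35,
# precision 76.19, f1 84.21 — matching the reported values.
```

Working backwards from the percentages to integer counts like this is a quick sanity check that a published metrics row is internally consistent.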
Table 2

Results achieved on the training set for the vowels /a/, /e/ and /o/

Algorithm | Sensitivity (%) | Specificity (%) | Accuracy (%) | Precision (%) | F1-score (%) | AUC
Naive Bayes [25] | 96.97 | 53.03 | 75.00 | 67.37 | 79.50 | 0.799
Bayes Net [25] | 93.94 | 78.79 | 86.36 | 81.58 | 87.32 | 0.953
SVM [6] | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 1.000
SGD [3] | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 1.000
Ibk [2] | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 1.000
LWL [14] | 68.18 | 83.33 | 75.76 | 80.36 | 73.77 | 0.941
Adaboost [12] | 92.42 | 89.39 | 90.91 | 89.71 | 91.04 | 0.966
Bagging [12] | 98.48 | 92.42 | 95.45 | 92.86 | 95.59 | 0.988
OneR [19] | 71.21 | 86.36 | 78.79 | 83.93 | 77.05 | 0.788
Decision Table [26] | 86.36 | 78.79 | 82.58 | 80.28 | 83.21 | 0.847
J48 [41] | 98.48 | 100.00 | 99.24 | 100.00 | 99.24 | 1.000
Random Forest [49] | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 1.000
Table 3

Results achieved on the testing set for the vowels /a/, /e/ and /o/

Algorithm | Sensitivity (%) | Specificity (%) | Accuracy (%) | Precision (%) | F1-score (%) | AUC
Naive Bayes [25] | 100.00 | 35.29 | 67.65 | 60.71 | 75.56 | 0.706
Bayes Net [25] | 88.24 | 29.41 | 58.82 | 55.56 | 68.18 | 0.671
SVM [6] | 94.12 | 52.94 | 73.53 | 66.67 | 78.05 | 0.735
SGD [3] | 94.12 | 47.06 | 70.59 | 64.00 | 76.19 | 0.706
Ibk [2] | 88.24 | 47.06 | 67.65 | 62.50 | 73.17 | 0.676
LWL [14] | 58.82 | 52.94 | 55.88 | 55.56 | 57.14 | 0.647
Adaboost [12] | 70.59 | 76.47 | 73.53 | 75.00 | 72.73 | 0.785
Bagging [12] | 76.47 | 64.71 | 70.59 | 68.42 | 72.22 | 0.747
OneR [19] | 29.41 | 76.47 | 52.94 | 55.56 | 38.46 | 0.529
Decision Table [26] | 64.71 | 47.06 | 55.88 | 55.00 | 59.46 | 0.554
J48 [41] | 64.71 | 52.94 | 58.82 | 57.89 | 61.11 | 0.588
Random Forest [49] | 94.12 | 70.59 | 82.35 | 76.19 | 84.21 | 0.901
Considering the performance of each vowel, we can observe that the recordings of the vowel /e/ identify the presence of COVID-19 better than those of the vowels /a/ and /o/. An accuracy and F1-score of about 85% and 84%, respectively, were obtained when using the features extracted from the vowel /e/ as inputs of the Random Forest algorithm on the testing set, as indicated in Table 7. When analysing the sounds of the vowels /a/ and /o/, instead, the classification accuracies and F1-scores were lower than those achieved for the vowel /e/ alone, as shown in Tables 5 and 9.
Table 7

Results achieved on the testing set for the vowel /e/

Algorithm | Sensitivity (%) | Specificity (%) | Accuracy (%) | Precision (%) | F1-score (%) | AUC
Naive Bayes [25] | 70.59 | 70.59 | 70.59 | 70.59 | 70.59 | 0.754
Bayes Net [25] | 52.94 | 82.35 | 67.65 | 75.00 | 62.07 | 0.749
SVM [6] | 84.62 | 71.43 | 76.47 | 64.71 | 73.33 | 0.765
SGD [3] | 64.71 | 76.47 | 70.59 | 73.33 | 68.75 | 0.706
Ibk [2] | 58.82 | 52.94 | 55.88 | 55.56 | 57.14 | 0.559
LWL [14] | 35.29 | 100.00 | 67.65 | 100.00 | 52.17 | 0.709
Adaboost [12] | 64.71 | 76.47 | 70.59 | 73.33 | 68.75 | 0.763
Bagging [12] | 82.35 | 70.59 | 76.47 | 73.68 | 77.78 | 0.820
OneR [19] | 76.47 | 29.41 | 52.94 | 52.00 | 61.90 | 0.529
Decision Table [26] | 58.82 | 64.71 | 61.76 | 62.50 | 60.61 | 0.606
J48 [41] | 58.82 | 70.59 | 64.71 | 66.67 | 62.50 | 0.652
Random Forest [49] | 76.47 | 94.12 | 85.29 | 92.86 | 83.87 | 0.867
Table 5

Results achieved on the testing set for the vowel /a/

Algorithm | Sensitivity (%) | Specificity (%) | Accuracy (%) | Precision (%) | F1-score (%) | AUC
Naive Bayes [25] | 82.35 | 23.53 | 52.94 | 51.85 | 63.64 | 0.713
Bayes Net [25] | 47.06 | 70.59 | 58.82 | 61.54 | 53.33 | 0.616
SVM [6] | 52.94 | 82.35 | 67.65 | 75.00 | 62.07 | 0.676
SGD [3] | 47.06 | 94.12 | 70.59 | 88.89 | 61.54 | 0.706
Ibk [2] | 52.94 | 70.59 | 61.76 | 64.29 | 58.06 | 0.618
LWL [14] | 47.06 | 58.82 | 52.94 | 53.33 | 50.00 | 0.626
Adaboost [12] | 41.18 | 70.59 | 55.88 | 58.33 | 48.28 | 0.683
Bagging [12] | 70.59 | 70.59 | 70.59 | 70.59 | 70.59 | 0.744
OneR [19] | 47.06 | 58.82 | 52.94 | 53.33 | 50.00 | 0.529
Decision Table [26] | 52.94 | 58.82 | 55.88 | 56.25 | 54.55 | 0.578
J48 [41] | 58.82 | 76.47 | 67.65 | 71.43 | 64.52 | 0.713
Random Forest [49] | 47.06 | 76.47 | 61.76 | 66.67 | 55.17 | 0.739
Table 9

Results achieved on the testing set for the vowel /o/

Algorithm | Sensitivity (%) | Specificity (%) | Accuracy (%) | Precision (%) | F1-score (%) | AUC
Naive Bayes [25] | 94.12 | 11.76 | 52.94 | 51.61 | 66.67 | 0.585
Bayes Net [25] | 64.71 | 35.29 | 50.00 | 50.00 | 56.41 | 0.533
SVM [6] | 70.59 | 58.82 | 64.71 | 63.16 | 66.67 | 0.647
SGD [3] | 70.59 | 35.29 | 52.94 | 52.17 | 60.00 | 0.529
Ibk [2] | 64.71 | 70.59 | 67.65 | 68.75 | 66.67 | 0.676
LWL [14] | 76.47 | 58.82 | 67.65 | 65.00 | 70.27 | 0.775
Adaboost [12] | 64.71 | 41.18 | 52.94 | 52.38 | 57.89 | 0.576
Bagging [12] | 64.71 | 35.29 | 50.00 | 50.00 | 56.41 | 0.474
OneR [19] | 47.06 | 35.29 | 41.18 | 42.11 | 44.44 | 0.412
Decision Table [26] | 76.47 | 41.18 | 58.82 | 56.52 | 65.00 | 0.592
J48 [41] | 58.82 | 41.18 | 50.00 | 50.00 | 54.05 | 0.517
Random Forest [49] | 70.59 | 52.94 | 61.76 | 60.00 | 64.86 | 0.666
It is important to note that data collection for the considered database is still in progress. This means that the amount of data changes constantly, making it difficult to build a common database for the entire scientific community until the collection is finished, and thus preventing a comparison of the results obtained by the different studies existing in the literature.

Conclusions

Nowadays, wearable sensors contribute to disease detection and to patient monitoring and rehabilitation. The physiological signals coming from these sensors constitute, in fact, a fundamental resource to support specific healthcare applications. The increasingly common practice of collecting healthcare data and the rapid development of artificial intelligence algorithms have contributed to promoting an increase in the successful application of these techniques in the healthcare sector. Artificial intelligence methods offer an opportunity to process and interpret healthcare data in a fast and reliable way, detecting, at times, clinically relevant information hidden in a vast amount of data, thereby assisting medical decision making. In this paper, we have presented an overview of the artificial intelligence algorithms most frequently used for voice signal analysis in relation to the early detection of disorders caused by COVID-19. The performance of several ML techniques is presented, with the aim of identifying the most reliable ML technique and the most informative vowel sound for distinguishing between healthy and COVID-19 positive subjects. The objective is to embed this ML technique within a mobile health solution capable of acquiring a vowel sound, analysing it and classifying it as healthy or COVID-19 positive. The utility of such a mobile health solution is indisputable, in that it would support the early detection of COVID-19 in a faster, cheaper and more reliable way. The analyses have shown that the Random Forest algorithm achieves the best performance, obtaining an accuracy of about 82% when the sounds of the three vowels (/a/, /e/ and /o/) are analysed for each subject. This performance improves when only the sound of the vowel /e/ is analysed (the accuracy rises to about 85%). It is important to note that all the experimental tests were performed on a dataset selected from a crowd-sourced database.
Currently, all available voice databases containing samples from subjects suffering from COVID-19 are crowd-sourced, with volunteers independently recording all the samples without any supervision by an expert. Therefore, in order to validate an approach able to support the early detection of the disease, the currently available recordings need to be improved. It is fundamental to obtain samples labelled by a medical specialist during a controlled clinical trial. Additionally, it is necessary to enhance the quality of the collected data by reducing the effects of noise added during recording, as well as by ensuring a correct execution of the vocalization. Moreover, in this preliminary study, only the effects of COVID-19 were evaluated. Due to the recent and rapid diffusion of COVID-19, there is still very little information about the causes and development of the disease, as well as about its association with patient data. In our future research, it will be interesting to analyse data about the etiopathogenesis of the disease together with clinical data such as age and pre-existing pathologies, considering the effects of these factors on voice quality and combining this information with that extracted from an analysis of the voice parameters. It would also be expedient to complement the data already obtained with an analysis of the other two sounds provided by the database, the coughing and respiratory samples.
Table 4

Results achieved on the training set for the vowel /a/

Algorithm            Sensitivity (%)  Specificity (%)  Accuracy (%)  Precision (%)  F1-score (%)  AUC
Naive Bayes [25]     92.42            28.79            60.61         56.48          70.11         0.801
Bayes Net [25]       78.79            68.18            73.48         71.23          74.82         0.775
SVM [6]              95.45            93.94            94.70         94.03          94.74         0.947
SGD [3]              95.45            100.00           97.73         100.00         97.67         0.977
IBk [2]              100.00           100.00           100.00        100.00         100.00        1.000
LWL [14]             80.30            65.15            72.73         69.74          74.65         0.908
AdaBoost [12]        78.79            93.94            86.36         92.86          85.25         0.952
Bagging [12]         89.39            87.88            88.64         88.06          88.72         0.958
OneR [19]            75.00            86.36            81.15         82.35          78.50         0.750
Decision Table [26]  75.76            71.21            73.48         72.46          74.07         0.753
J48 [41]             98.48            96.97            97.73         97.01          97.74         0.993
Random Forest [49]   100.00           100.00           100.00        100.00         100.00        1.000
Table 6

Results achieved on the training set for the vowel /e/

Algorithm            Sensitivity (%)  Specificity (%)  Accuracy (%)  Precision (%)  F1-score (%)  AUC
Naive Bayes [25]     66.67            84.85            75.76         81.48          73.33         0.861
Bayes Net [25]       60.61            91.30            76.30         86.96          71.43         0.880
SVM [6]              96.97            95.45            96.21         95.52          96.24         0.960
SGD [3]              100.00           100.00           100.00        100.00         100.00        1.000
IBk [2]              100.00           100.00           100.00        100.00         100.00        1.000
LWL [14]             78.79            98.48            88.64         98.11          87.39         0.941
AdaBoost [12]        80.30            96.97            88.64         96.36          87.60         0.965
Bagging [12]         89.39            87.88            88.64         88.06          88.72         0.961
OneR [19]            86.36            71.21            78.79         75.00          80.28         0.788
Decision Table [26]  89.16            80.30            86.64         91.93          90.52         0.800
J48 [41]             96.97            98.48            97.73         98.46          97.71         0.991
Random Forest [49]   100.00           100.00           100.00        100.00         100.00        1.000
Table 8

Results achieved on the training set for the vowel /o/

Algorithm            Sensitivity (%)  Specificity (%)  Accuracy (%)  Precision (%)  F1-score (%)  AUC
Naive Bayes [25]     96.97            28.79            62.88         57.66          72.32         0.647
Bayes Net [25]       95.45            72.73            84.09         77.78          85.71         0.900
SVM [6]              89.39            86.36            87.88         86.76          88.06         0.879
SGD [3]              96.97            95.45            96.21         95.52          96.24         0.962
IBk [2]              100.00           100.00           100.00        100.00         100.00        1.000
LWL [14]             83.33            69.70            76.52         73.33          78.01         0.919
AdaBoost [12]        89.39            80.30            84.85         81.94          85.51         0.923
Bagging [12]         93.94            91.18            92.54         91.18          92.54         0.993
OneR [19]            83.33            72.73            78.03         75.34          79.14         0.780
Decision Table [26]  95.45            54.55            75.00         67.74          79.25         0.750
J48 [41]             100.00           98.48            99.24         98.51          99.25         1.000
Random Forest [49]   100.00           100.00           100.00        100.00         100.00        1.000
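All of the per-algorithm metrics in these tables, except AUC (which additionally requires the classifier's ranked scores), can be derived from a single binary confusion matrix. As a worked check, the counts below are an inferred example, not figures reported by the authors: assuming 66 training samples per class (which the percentages suggest), they reproduce the Bagging row of Table 4.

```python
# Hypothetical confusion-matrix counts (66 COVID-19 and 66 healthy samples).
tp, fn = 59, 7   # COVID-19 samples classified correctly / missed
tn, fp = 58, 8   # healthy samples classified correctly / misclassified

sensitivity = tp / (tp + fn)                    # true-positive rate
specificity = tn / (tn + fp)                    # true-negative rate
accuracy = (tp + tn) / (tp + fn + tn + fp)
precision = tp / (tp + fp)
f1 = 2 * precision * sensitivity / (precision + sensitivity)

for name, value in [("Sensitivity", sensitivity), ("Specificity", specificity),
                    ("Accuracy", accuracy), ("Precision", precision),
                    ("F1-score", f1)]:
    print(f"{name}: {value * 100:.2f}%")
# Prints 89.39%, 87.88%, 88.64%, 88.06%, 88.72% -- the Bagging row for /a/.
```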
References (15 in total)

Review 1.  Machine learning for detection and diagnosis of disease.

Authors:  Paul Sajda
Journal:  Annu Rev Biomed Eng       Date:  2006       Impact factor: 9.590

2.  The potential for artificial intelligence in healthcare.

Authors:  Thomas Davenport; Ravi Kalakota
Journal:  Future Healthc J       Date:  2019-06

3.  Acoustic measurement of overall voice quality: a meta-analysis.

Authors:  Youri Maryn; Nelson Roy; Marc De Bodt; Paul Van Cauwenberge; Paul Corthals
Journal:  J Acoust Soc Am       Date:  2009-11       Impact factor: 1.840

4.  Sustained vowels and continuous speech in the auditory-perceptual evaluation of dysphonia severity.

Authors:  Youri Maryn; Nelson Roy
Journal:  J Soc Bras Fonoaudiol       Date:  2012

Review 5.  Transforming healthcare with big data analytics and artificial intelligence: A systematic mapping study.

Authors:  Nishita Mehta; Anil Pandit; Sharvari Shukla
Journal:  J Biomed Inform       Date:  2019-10-17       Impact factor: 6.317

6.  COVID-19 Artificial Intelligence Diagnosis Using Only Cough Recordings.

Authors:  Jordi Laguarta; Ferran Hueto; Brian Subirana
Journal:  IEEE Open J Eng Med Biol       Date:  2020-09-29

7.  AI4COVID-19: AI enabled preliminary diagnosis for COVID-19 from cough samples via an app.

Authors:  Ali Imran; Iryna Posokhova; Haneya N Qureshi; Usama Masood; Muhammad Sajid Riaz; Kamran Ali; Charles N John; Md Iftikhar Hussain; Muhammad Nabeel
Journal:  Inform Med Unlocked       Date:  2020-06-26

8.  Detection of COVID-19 from speech signal using bio-inspired based cepstral features.

Authors:  Tusar Kanti Dash; Soumya Mishra; Ganapati Panda; Suresh Chandra Satapathy
Journal:  Pattern Recognit       Date:  2021-04-24       Impact factor: 7.740

9.  COVID-19 cough classification using machine learning and global smartphone recordings.

Authors:  Madhurananda Pahar; Marisa Klopper; Robin Warren; Thomas Niesler
Journal:  Comput Biol Med       Date:  2021-06-17       Impact factor: 4.589

Cited by (2 in total)

1.  Voice Sequelae Following Recovery From COVID-19.

Authors:  Tatiana Romero Arias; Moisés Betancort Montesinos
Journal:  J Voice       Date:  2022-07-04       Impact factor: 2.300

2.  Social Media Data for Omicron Detection from Unscripted Voice Samples.

Authors:  James Anibal; Adam Landa; Hang Nguyen; Alec Peltekian; Andrew Shin; Anna Christou; Lindsey Hazen; Miranda Song; Jocelyne Rivera; Robert Morhard; Ulas Bagci; Ming Li; David Clifton; Bradford Wood
Journal:  medRxiv       Date:  2022-09-18
