Literature DB >> 33301502

Early screening of autism spectrum disorder using cry features.

Aida Khozaei1, Hadi Moradi1,2, Reshad Hosseini1, Hamidreza Pouretemad3, Bahareh Eskandari3.   

Abstract

The increase in the number of children with autism and the importance of early autism intervention have prompted researchers to pursue automatic and early autism screening. Consequently, in the present paper, a cry-based screening approach for children with Autism Spectrum Disorder (ASD) is introduced which provides both early and automatic screening. During the study, we realized that ASD-specific features are not necessarily observable in all children with ASD, nor in all instances collected from each child. Therefore, we proposed a new classification approach able to determine such features and their corresponding instances. To test the proposed approach, a dataset of children between 18 and 53 months of age, recorded using high-quality voice recording devices and typical smartphones at various locations such as homes and daycares, was studied. After preprocessing, the approach was used to train a classifier on the data of 10 boys with ASD and 10 Typically Developed (TD) boys. The trained classifier was tested on the data of 14 boys and 7 girls with ASD and 14 TD boys and 7 TD girls. The sensitivity, specificity, and precision of the proposed approach for boys were 85.71%, 100%, and 92.85%, respectively. These measures were 71.42%, 100%, and 85.71% for girls, respectively. The proposed approach was shown to outperform common classification methods, and it demonstrated better results than previous studies that used voice features for ASD screening. To pilot the practicality of the proposed approach for early autism screening, the trained classifier was tested on 57 participants between 10 and 18 months of age, consisting of 28 boys and 29 girls; the results were very encouraging for the use of the approach in early ASD screening.

Entities:  

Year:  2020        PMID: 33301502      PMCID: PMC7728261          DOI: 10.1371/journal.pone.0241690

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Children with Autism Spectrum Disorder (ASD) are characterized by abnormal or impaired development in social interaction and communication, as well as restricted and repetitive behaviors, interests, or activities [1]. The rapid growth in ASD prevalence over the past 20 years has inspired many research efforts toward the diagnosis and rehabilitation of ASD [2-5]. In the field of diagnosis, there are several well-established manual methods to diagnose children over 18 months of age [6]. However, the practical average age of diagnosis is over 3 years, due to the lack of knowledge about ASD and the lack of expertise for diagnosing autism [7, 8]. Early diagnosis/screening is of the utmost importance for providing early intervention, which is more effective in the first few years of life than later on [7, 9–11]. It has been shown that early intervention improves developmental performance in children with ASD [12]. It has also been reported that early interventions are cost-saving for families and treatment service systems [13, 14]. Consequently, there are two main questions: 1) can autism be screened earlier than 18 months to reduce the typical diagnosis or intervention age, and 2) is it possible to employ intelligent methods for the screening of autism to eliminate the widespread need for experts? It should be mentioned that our goal was to answer these questions with respect to screening all children, including those who may not have clear symptoms. Screened children should then go through a diagnostic procedure for confirmation and/or be worked with cautiously. Fortunately, there are studies in the literature showing that the age of diagnosis can be lower than 18 months. For example, Thabtah and Peebles [15] reviewed several questionnaire-based approaches that may be able to screen ASD above 6 months of age.
However, those approaches that have been clinically proven effective and adequate, like the Autism Diagnostic Interview-Revised (ADI-R) [16] and the Autism Diagnostic Observation Schedule (ADOS) [17], are time-consuming instruments [15] and need trained practitioners. To reduce the dependency on the human expertise needed to use such questionnaires [8], several studies proposed machine learning methods to classify children with ASD [18, 19] based on questionnaires. Their goal was to automate the process and/or find an optimal subset of questions or features. For instance, Abbas et al. [20] proposed a multi-modular assessment system comprising three modules: a parent questionnaire, a clinician questionnaire, and a video assessment module. Although the authors used machine learning to automate and improve the classification process, human involvement is still needed to answer the questions or assess the videos. On the other hand, Emerson et al. [21] showed that fMRI can be used to predict the diagnosis of autism at the age of 2 in high-risk 6-month-old infants. Denisova and Zhao [22] used movement data from rs-fMRI in 1–2-month-old infants to predict future atypical developmental trajectories as biological features. Furthermore, Bosl, Tager-Flusberg, and Nelson [23] suggested that useful biomarkers can be extracted from EEG signals for early detection of autism. Blood-based markers [24, 25] and prenatal immune markers [26], which can be used right after birth, have also been proposed for diagnosing ASD. Although these approaches suggest new directions toward early ASD diagnosis/screening, they are costly and require expertise and dedicated equipment, which limits their usage. Furthermore, these methods are still in the early stages of research and require further validation.
Finally, approaches that involve methods such as fMRI or EEG are difficult to use on children, especially children with autism, who may have trouble following instructions appropriately [27], show atypical behaviors [28], or have excessive head movements [29, 30]. There are studies that used vocalization-based analysis to screen children with autism. For instance, Brisson et al. [31] showed differences in voice features between children with ASD and Typically Developing (TD) children. Several studies, like [32], used speech-related features for the screening of children older than 2. To reach the goal of early ASD screening, vocalizations of infants under 2 years of age have been investigated [33-35]. Santos et al. [33] used vocalizations, such as babbling, to screen children for ASD at the age of 18 months. They collected data from 23 children with ASD and 20 TD children, and reported a high accuracy of around 97%, which may be due to the fact that they used k-fold cross-validation without subject-wise holdout, i.e. without keeping subjects unseen in the test fold [36]. Oller et al. [34] proposed another vocalization-based classification method in which they included age and excluded crying. They applied the method to 106 TD children and 77 children with ASD between 16 and 48 months and reached 86% accuracy. Pokorny et al. [35] extracted the eGeMAPS parameter set [37], which includes 88 acoustic parameters, from 10-month-old children. This set consists of statistics calculated over 25 frequency-related, energy-related, and spectral low-level descriptors. They reached 75% accuracy on a population of 10 TD children and 10 children with ASD. Esposito, Hiroi, and Scattoni [38] showed that crying is a promising biomarker for ASD screening. Sheinkopf et al. [39] and Orlandi et al. [40] have shown that there are differences between the cries of children with ASD and those of TD children.
To the best of our knowledge, our own group’s preliminary study [41] was the only research that had used cry sounds for the screening of children with ASD. We used a dataset of 5 children with ASD and 4 TD children older than two years. The accuracy of the proposed method was 96.17% using k-fold cross-validation without subject-wise holdout, which is a shortcoming of that study. In other words, the method was overfitted to the available data and may fail to correctly classify new samples. Hence, a thorough examination of cry features using an unseen test set is necessary to evaluate the results. It should be noted that the data from our previous study [41] could not be used in the study presented in this paper due to differences in the data collection procedures. In all the above studies, it was assumed that the specific sound features distinguishing children with ASD from TD children are common among all ASD cases. However, this may not hold for all features. For instance, tiptoe walking, which is one of the repetitive behaviors of children with ASD, appears in approximately 25% of these children [42]. Consequently, in the current study, we propose a new cry-based approach for screening children with ASD. Our screening approach builds on the assumption that not all discriminative characteristics of autism appear in all children with ASD. This is in contrast with the assumption underlying ordinary instance-based machine learning methods, namely that all instances of a class include all discriminative features needed for classification. In our proposed method, discriminable cry instances, which exist in subsets of children with ASD, are found first. These instances are then used to select features that distinguish these ASD instances from TD instances. It should be mentioned that the final selected features in this study are common among our set of children with ASD between 18 and 53 months of age.
These selected features support the experiential knowledge of our experts, who state that the variations in the cries of children with ASD are greater than those in the cries of TD children. This approach differs from other approaches that either used a dataset of children of a specific age [33, 35] or used age information for classification [34]. The proposed approach has been implemented and tested on 62 participants. The results show the effectiveness of the approach with respect to accuracy, sensitivity, and specificity.

Method

Since this study was performed on human subjects, first, it was approved by the ethics committee at Shahid Beheshti University of Medical Sciences and Health Services. All the parents of the participants were informed about the study and signed an agreement form before being included in the study.

Participants

There were 62 participants aged between 18 and 53 months, divided into two groups, i.e. 31 ASD and 31 TD, with 24 boys and 7 girls in each group. Since we expected different vocalization characteristics for boys and girls, the training set was composed of boys only, including 10 TD and 10 ASD participants. In other words, we wanted to eliminate gender effects on feature extraction and model training. Unfortunately, due to the lower prevalence of ASD among girls, not enough data from girls with ASD could be collected. Nonetheless, the model was also tested on the girls to see how well it would generalize to them. The inclusion criteria for the ASD participants were: a) being very recently diagnosed with ASD based on DSM-5, with no other neurodevelopmental, mental, or intellectual disorder; b) having no other known medical or genetic conditions, or environmental factors; and c) not having received any treatment or medication, or having received treatment for less than a month. Only two girls did not meet these criteria, since they had been diagnosed more than a year earlier. The participants’ average language development at the time of participation, assessed based on [43-46], was equal to that of children between 6 and 12 months old. The autism diagnosis procedure started with the Gilliam Autism Rating Scale-Second Edition (GARS-2) questionnaire [47], which was answered by the parents. Then the parents were interviewed based on DSM-5, while the participants were evaluated and observed by two child clinical psychologists with Ph.D. degrees. In addition, the diagnosis of ASD was separately confirmed by at least one child psychiatrist in a different setting. It should be noted that ADOS, which is a very common diagnostic tool, is not administered widely in Iran since there is no official translation of ADOS into Farsi.
TD children in an age range similar to that of the ASD participants were recruited from volunteer families at their homes and at health centers. They had no evidence or official diagnosis of any neurological or psychological disorder at the time their voices were recorded. The children with ASD were older than 20 months, with a mean, standard deviation, and range of 35.6, 8.8, and 33 months, respectively. The TD children were younger than 51 months, with a mean, standard deviation, and range of about 30.8, 10.3, and 33 months, respectively. It should be mentioned that the diagnosis of children under 3 years was based mainly on the experts’ evaluation, not the GARS score. Furthermore, all TD participants under 3 years of age had a follow-up study after they passed the age of 3, to make sure the initial TD assignment was correct or still valid. To do so, we used a set of expert-selected questions based on [48] to assess them through interviews with the parents. Tables 1 and 2 show the details of the participants in the training and test sets, respectively. In each table, the number of voice instances from each participant and the total duration of those instances in seconds are shown in columns 3 and 4, respectively. The recording device category, i.e. a high-quality recorder (HQR) or a typical cell phone (CP), is given in the device column. The next two columns give the GARS-2 score and the language developmental milestone of the participants with ASD at the time of recording. In six cases, no GARS score was available at the time of the study, denoted by ND (No Data). The column labeled ‘Place’ shows the location of the recording, which could be a home (H), an autism center (C1, C2, or C3), or a health center (C4, C5, or C6). There was a total of 359 samples for all children; 53.44% of the samples were from ASD participants and 46.56% from TD participants.
Table 1

The training set data of participants.

Group  ID     Age (mo)  # of instances  Total duration (s)  Device  GARS score  Language milestone (mo)  Place  Reason for crying
ASD    ASD1   20        9               7.8                 CP      104         0–6                      C1     Annoyed/Uncomfortable
       ASD2   24        3               1.5                 HQR     83          0–6                      C2     Unwilling
       ASD3   26        5               2.1                 HQR     120         0–6                      C1     Annoyed/Uncomfortable
       ASD4   28        13              9.1                 HQR     121         0–6                      C2     Annoyed/Uncomfortable
       ASD5   29        14              26                  HQR     89          6–12                     C2     Unwilling/Complaining
       ASD6   31        4               2.4                 HQR     87          0–6                      C2     Unwilling/Complaining
       ASD7   36        11              11                  HQR     87          6–12                     C2     Unwilling/Complaining
       ASD8   43        2               0.7                 CP      ND          ND                       C2     Unwilling
       ASD9   45        3               2.6                 CP      72          6–12                     C2     Complaining
       ASD10  45        4               3.4                 CP      ND          ND                       H      Sleepy
TD     TD1    21        11              14                  HQR     NA          NA                       H      Complaining
       TD2    24        12              12                  HQR     NA          NA                       C4     Scared/Unwilling
       TD3    26        2               2.3                 HQR     NA          NA                       C5     Unwilling
       TD4    28        6               13                  CP      NA          NA                       C5     Scared/Unwilling
       TD5    36        3               2.6                 CP      NA          NA                       H      Unwilling/Complaining
       TD6    38        3               1.5                 HQR     NA          NA                       C6     Complaining
       TD7    41        3               2.4                 HQR     NA          NA                       H      Unwilling
       TD8    43        3               2.2                 CP      NA          NA                       H      Sleepy
       TD9    44        2               1.2                 CP      NA          NA                       H      Complaining
       TD10   51        2               1.7                 CP      NA          NA                       H      Complaining
Table 2

The test set information.

Group       ID     Age (mo)  # of instances  Total duration (s)  Device  GARS score  Language milestone (mo)  Place  Reason for crying
ASD, boys   ASD11  28        12              7.2                 HQR     102         0–6                      C2     Unwilling/Uncomfortable
            ASD12  30        18              17.1                HQR     ND          ND                       C3     Separation from mother
            ASD13  30        3               2.9                 CP      ND          ND                       H      Unwilling/Sleepy
            ASD14  31        5               2.3                 HQR     73          0–6                      C2/H   Separation from mother/Hungriness
            ASD15  33        3               2.5                 HQR     91          0–6                      C2     Unwilling
            ASD16  33        2               2.5                 HQR     104         0–6                      C1     Annoyed/Uncomfortable
            ASD17  34        1               0.6                 HQR     91          0–6                      C2     Unwilling/Complaining
            ASD18  35        2               1.7                 HQR     81          ND                       C1     Annoyed/Uncomfortable
            ASD19  37        1               0.6                 HQR     94          12–18                    C2     Unwilling/Complaining
            ASD20  40        19              14                  HQR     91          0–6                      C1     Annoyed
            ASD21  45        1               0.3                 HQR     81          6–12                     C2     Unwilling/Complaining
            ASD22  48        2               1.6                 HQR     100         6–12                     C2     Annoyed/Complaining
            ASD23  52        6               3.1                 HQR     113         12–18                    C2     Unwilling/Complaining
            ASD24  53        7               5.2                 HQR     78          6–12                     C1     Annoyed/Uncomfortable
ASD, girls  ASD25  25        12              14                  HQR     85          0–6                      C2     Unwilling/Complaining
            ASD26  26        5               2                   CP      102         0–6                      C1     Scared
            ASD27  31        3               1.7                 HQR     94          0–6                      C2     Unwilling/Complaining
            ASD28  32        2               1.3                 HQR     100         0–6                      C2     Unwilling/Complaining
            ASD29  41        8               3                   HQR     102         0–6                      C2     Unwilling/Complaining
            ASD30  45        2               1.2                 CP      ND          ND                       H      Thirsty
            ASD31  49        7               12                  CP      ND          ND                       H      Unwilling/Complaining
TD, boys    TD11   18        4               2                   HQR     NA          NA                       C4     Scared
            TD12   18        7               5.1                 HQR     NA          NA                       C4     Scared/Unwilling
            TD13   19        7               4.2                 HQR     NA          NA                       C5     Unwilling
            TD14   20        9               8                   HQR     NA          NA                       C5     Unwilling/Complaining
            TD15   21        4               1.2                 HQR     NA          NA                       H      Complaining
            TD16   24        3               2.7                 HQR     NA          NA                       C5     Scared/Unwilling
            TD17   24        2               1.5                 HQR     NA          NA                       C5     Scared/Unwilling
            TD18   24        6               5.1                 HQR     NA          NA                       C4     Unwilling/Complaining
            TD19   24        4               2.4                 HQR     NA          NA                       C5     Unwilling/Complaining
            TD20   24        5               4.2                 HQR     NA          NA                       C5     Unwilling/Complaining
            TD21   29        11              10                  HQR     NA          NA                       H      Unwilling/Complaining
            TD22   30        4               2                   HQR     NA          NA                       C5     Scared/Unwilling
            TD23   30        4               2                   CP      NA          NA                       H      Unwilling
            TD24   43        12              11                  HQR     NA          NA                       H      Complaining
TD, girls   TD25   24        5               6                   HQR     NA          NA                       C4     Unwilling/Complaining
            TD26   25        2               4.4                 HQR     NA          NA                       C5     Scared
            TD27   29        5               5                   HQR     NA          NA                       C5     Scared
            TD28   33        2               2.1                 CP      NA          NA                       H      Complaining
            TD29   45        16              11                  HQR     NA          NA                       H      Unwilling/Complaining
            TD30   50        6               7                   HQR     NA          NA                       H      Complaining
            TD31   51        2               0.7                 CP      NA          NA                       H      Unwilling
Two groups of 10 TD and 10 ASD children were selected for training the classifiers such that the two groups were as balanced as possible with respect to age and recording device. Thus, each child in the TD group had a corresponding child in the ASD group of around the same age. As a result of this data balancing, the training participants were between 20 and 51 months old. The mean ages in the training set were 32.7 and 35.2 months for the ASD and TD participants, respectively. The standard deviations were 9 and 9.9 months, with ranges of 25 and 30 months, for the ASD and TD participants, respectively. Although the approach was trained and tested on children older than 18 months, we also tested it on 57 participants between 10 and 18 months to investigate how it works on children under 18 months. These 57 participants consisted of 28 boys and 29 girls, with a mean age of 15.2 months for both and standard deviations of 2.8 and 2.9 months, respectively. All these participants were evaluated at a later date, at the age of 3 or older, using the same follow-up procedure with our expert-selected questionnaire. At the time of the initial voice collection, 55 of these participants had no evident or diagnosed disorder. Two of them were referred to our experts due to positive screening results from our method. The diagnoses or concerns about these two participants, as well as the participants with any evidence of abnormality in the developmental milestones during the follow-up procedure, are summarized in Table 3. The summary of disorders given in the last column of Table 3 is based on the parental interviews and our experts’ evaluations. Unfortunately, the parents of Child5, Child6, and Child7 did not cooperate in obtaining an expert evaluation.
Table 3

The participants with an abnormality in the follow-up.

ID      Gender  Age at recording (mo)  Age at follow-up (mo)  Disorder
Child1  M       11                     11                     Developmental delay (a), signs of genetic diseases
Child2  M       17                     17                     UNDD (b)
Child3  M       12                     40                     ASD (b)
Child4  M       12                     36                     Sensory processing disorder (c), several ADHD symptoms (b)
Child5  M       18                     40                     Language delay
Child6  M       15                     46                     Developmental delay symptoms
Child7  M       12                     43                     Developmental delay symptoms

UNDD, Unspecified Neurodevelopmental Disorder.

a Clinical observation by our expert based on [48].

b Clinical observation by our expert based on [1]

c Clinical observation by our expert based on [49].


Data collection and preprocessing

As mentioned earlier, the data was recorded using high-quality devices and typical smartphones. The high-quality devices were a Sony UX560 and a Sony UX512F voice recorder. For typical smartphones, a voice-recording and archiving application was developed and used on various types of smartphones. All voices, whether recorded through the application or the high-quality recorders, were recorded in wav format, at 16 bits, with a sampling rate of 44.1 kHz. Various devices were used to avoid biasing the approach toward a specific device. Similarly, the place of recording was not restricted to one location in order to make the results applicable everywhere. The parents and trained voice collectors were asked to record the voices in a quiet environment. Furthermore, they were asked to keep the recorders or smartphones about 25 cm from the participant’s mouth. Despite these two recommendations, some recordings did not follow them and lacked the required quality; those recordings were eliminated from the study. Also, all cry sounds due to pain were removed from the study, since they are similar between the TD and ASD groups. After data collection, there was a preprocessing phase in which only pure crying parts of the recordings, with no other types of vocalization, were selected. More specifically, the parts of cry sounds accompanied by screaming, words or other vocalizations, or produced with a closed or non-empty mouth were eliminated. All segmentation and elimination were done manually using Sound Forge Pro 11.0. From the selected cries, the beginning and the end, which contain voice rises and fades, were removed in order to keep only the steady parts of the cries; this prevents excessive variation in the voice, which can lead to unsuitable statistics. The uvular/guttural parts of the cries were also removed, because we believe these parts distort the feature values of the steady parts of a voice.
Each remaining continuous segment of the cries was used as a sample (instance) in this study. Finally, since the basic voice features were extracted from 20-millisecond frames [50], to generate statistical features from the basic features, the minimum length of a cry segment was set to 15 frames, i.e. 300 milliseconds. Thus, any cry sample shorter than 300 milliseconds was eliminated from the study. The final prepared samples were between 320 milliseconds and 3 seconds long.
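The minimum-length rule above can be expressed as a short sketch (the constant and function names here are illustrative, not the authors' code):

```python
# Segments shorter than 15 frames of 20 ms (i.e. 300 ms) are dropped,
# since too few frames yield unstable segment-level statistics.

FRAME_MS = 20       # basic features are extracted from 20 ms frames
MIN_FRAMES = 15     # minimum segment length: 15 frames = 300 ms

def keep_segment(duration_ms: float) -> bool:
    """Return True if a cry segment is long enough to be kept as a sample."""
    return duration_ms >= MIN_FRAMES * FRAME_MS

durations_ms = [250, 320, 3000, 180]   # example segment lengths
kept = [d for d in durations_ms if keep_segment(d)]
print(kept)  # [320, 3000]
```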

Feature extraction

Previous studies on voice features for discriminating children with ASD used different sets of features. These methods share several common features, like F0, i.e. the fundamental frequency of a voice, and Mel-Frequency Cepstral Coefficients (MFCC), i.e. coefficients representing the short-term power spectrum of a sound [51]. F0 has been one of the most commonly used features [31, 32, 39]. However, since age is an important factor affecting F0 [52], this feature is useful only when participants are of a similar age. On the other hand, MFCC coefficients and several related statistics have been reported to be useful features in several studies [35, 41, 53]. Considering the useful features reported in previous studies and the specifications of the current study, several features were selected for this work, as explained in the following. Each instance was divided into 20-millisecond frames from which the basic voice features were extracted. We used several features proposed by Motlagh, Moradi, and Pouretemad [41] and by Belalcázar-Bolaños et al. [54]. The features used by Motlagh, Moradi, and Pouretemad [41] include statistics, such as the mean and covariance, of the frame-wise basic features (e.g. MFCC coefficients) over a voice segment. They also used the mean and variance of the frame-wise temporal derivative [55, 56] of the basic features. The frame-wise temporal derivative is the difference between two consecutive frames, which in a sense is the rate of change of a feature value in one frame step. We modified the spectral flatness features by including the 125–250 Hz range beside the 250–500 Hz range. This range was added to cover a frequency range wider than the normal children's frequency range, which proved necessary during feature extraction and selection. Each range is divided into 4 octaves, and the spectral flatness is computed for each octave.
We removed all uninformative and noisy features from these sets, as explained in the following. The mean of the frame-wise temporal derivative of the basic features was removed because it is not a meaningful feature: it is equal to the difference between the values of the last and the first frames, divided by the number of frame steps. The means of the energy-related features, such as the audio power, total loudness, SONE, and the first MFCC coefficient, were removed to make the classifier independent of the loudness/power of children's voices. The zero crossing rate (ZCR) was omitted too, due to its dependency on environmental noise. The second set of features used in this study was from Belalcázar-Bolaños et al. [54], because it includes phonation features like jitter and shimmer. Jitter and shimmer, which have been reported to be discriminative for ASD, are linked to perceptions of breathiness, hoarseness, and roughness [57]. Other features used from Belalcázar-Bolaños et al. [54] include glottal features related to vocal quality and the closing velocity of the vocal folds [33]. The mean of the logarithmic energy feature was omitted for the same reason as the other energy-related features. A summary of the features added to or removed from the sets of [41] and [54] is presented in Table 4.
Table 4

The features and statistics which were added or removed to the two feature sets.

Set         Feature                              Change                                           Reason
Second set  Logarithmic energy                   Mean statistic removed                           Classification dependency on loudness/power of cries
First set   Audio power, total loudness, SONE,   Mean statistic removed                           Classification dependency on loudness/power of cries
            first MFCC coefficient
First set   ZCR                                  Basic feature removed                            Feature's dependency on environmental noise
Both sets   All applicable basic features        Mean of frame-wise temporal derivative removed   No meaning for the feature
First set   MFCC                                 Coefficients 14–24 added                         Higher-order coefficients carry vocal-cord as well as vocal-tract information
First set   Spectral flatness                    Range of 125–250 Hz added                        Covering the low-frequency range of the human voice
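The segment-level statistics described above (means and variances of the frame-wise basic features, and the variance of their frame-wise temporal derivative) can be sketched with NumPy. The random matrix below is a stand-in for a segment's MFCC frames; the paper's actual feature extractor is not reproduced here:

```python
# Minimal sketch of turning frame-wise basic features into one
# segment-level feature vector. Stand-in data: 15 frames x 24 features.

import numpy as np

rng = np.random.default_rng(0)
frames = rng.normal(size=(15, 24))   # frame-wise basic features (e.g. MFCCs)

seg_mean = frames.mean(axis=0)       # per-feature mean over the segment
seg_var = frames.var(axis=0)         # per-feature variance over the segment
delta = np.diff(frames, axis=0)      # frame-wise temporal derivative
delta_var = delta.var(axis=0)        # variance of the temporal derivative
# The *mean* of the derivative is excluded, as in the paper: it collapses
# to (last frame - first frame) / (n - 1) and adds no information.

feature_vector = np.concatenate([seg_mean, seg_var, delta_var])
print(feature_vector.shape)  # (72,)
```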

The proposed subset instance classifier

To explain the proposed classifier, assume there is a target group of participants that we want to distinguish from the remaining participants, called the rest. Each participant in the target group may have several instances that can be used to distinguish the target group from the rest. Fig 1A shows a situation in which all instances of all participants of the target group are differentiable using common classifiers, which we call Whole Set Instance (WSI) classifiers. In this figure, the circles represent the target group and the triangles represent the rest. Color coding is used to differentiate the instances of each participant within each group. In contrast, in Fig 1B the target group cannot easily be distinguished from the rest. In this situation, there are instances of two participants in the target group, i.e. the red and brown circles, that are not easily separable from the instances of the rest (Case 1). Furthermore, there is a participant none of whose instances, i.e. the orange circles, are easily separable from the rest (Case 2). An example of Case 1 is tiptoe walking in children with ASD, which is common in about 25% of these children [42], who do it most of the time. An example of Case 2 is children with ASD who do not tiptoe walk. In other words, there are children with ASD who cannot be distinguished from TD children using the tiptoe-walking behavior.
Fig 1

Two different hypothetical types of two-dimensional data of the target group and the rest.

The instances shown by the warm-colored circles and the cool-colored triangles are for the target group and the rest, respectively. All instances belonging to a participant have the same color. In (a), all the target group participants’ instances are distinguishable using a classifier. In (b), only some instances of the target group participants are separable from the other instances by a classifier.

Applying any WSI classifier may fail for the data type shown in Fig 1B. Consequently, we propose the SubSet Instance (SSI) classifier, which first finds differentiable instances and then trains a classifier on them. As an example, the proposed SSI classifier first tries to find the circles on the left of the line in Fig 1B using a clustering method. Then it uses these circles, as exclusive instances having a specific feature common to a subset of the target group, to train a classifier separating that subset of the target group. The steps of common WSI classifiers are shown in Fig 2A, and the steps of our proposed SSI classifier in Fig 2B. In the SSI classification approach, after the feature extraction and clustering steps, a classifier is trained for each cluster to separate its exclusive instances from the instances of the rest of the participants. In the testing phase, any participant with even one instance classified in the target group (a positive instance) is classified as a target-group participant. The pseudo-code for the proposed approach is given in Algorithms 1 and 2.
Fig 2

An overall view of WSI and SSI methods.

(a) In the WSI method, after feature extraction, a classifier is trained on all instances, and majority pooling (MP) is usually used in the testing phase. In this study, Best-chance threshold Pooling (BP), a threshold-based pooling whose threshold gives the best accuracy on the test set, is also used to give the WSI classifier the best chance. (b) In the proposed SSI classifier, after feature extraction, clustering is applied to find and select exclusive instances, i.e. instances belonging to target-group participants only. Classifiers are then trained using these exclusive instances, and in the testing phase a participant is classified into the target group if any classifier detects a positive instance for it.

Algorithm 1. Training SSI classifiers

T: set of all target-group instances
R: set of all instances of the rest
F: set of all trained classifiers, initially F = ∅
ρ: threshold on the number of samples in a cluster
s: minimum number of samples needed in a cluster to train a classifier for it
Cj: the jth cluster
n: number of clusters, initially n = 1

1: While ∃j: |Cj| > ρ or n = 1; while there is a cluster bigger than the threshold
2:  n = n + 1; increase the number of clusters
3:  Cluster T ∪ R into n clusters Cj, j = 1, …, n
4:  EC = {Cj ⊂ T}; the set of clusters with only target-group instances, i.e. exclusive clusters
5:  If EC ≠ ∅; check if there is any exclusive cluster
6:   For all Cj in EC with |Cj| > s
7:    Train a classifier using positive labels c ∈ Cj and negative labels r ∈ R
8:    Add the classifier to F
9:   T = T \ ⋃EC; remove the instances of the exclusive clusters from the target-group instances
10:   n = 1; set n to 1 to restart clustering into two groups on the remaining instances

Algorithm 2. Testing SSI classifiers

F: set of trained classifiers
A: set of the subject's instances
1: For all instances a of A
2:     P = {a ∈ A | ∃f ∈ F that classifies a as a positive instance}
3: If P ≠ ∅
4:     The participant is from the target group
5: Else
6:     The participant is from the rest

In the proposed training algorithm of the SSI approach, the goal is to find clusters containing ASD instances only. A classifier is then trained on the instances of each such cluster and added to the list of all trained classifiers (lines 7 and 8 of Algorithm 1). As shown in the loop starting at line 1, the data is first clustered into two clusters, and the number of clusters is increased until a cluster containing only target-group instances emerges. The exclusive instances in such a cluster are removed from the set of all target-group instances, and the loop is restarted. Before restarting, if the number of instances in this cluster exceeds a threshold, a new classifier is trained on these instances and added to the set of trained classifiers. The loop stops when every cluster holds fewer instances than a threshold. For testing, all instances of each participant are classified one by one by all the trained classifiers (line 2 of Algorithm 2). A subject is assigned to the target group if at least one of its instances is classified as a target-group instance by at least one classifier (lines 3 and 4 of Algorithm 2); otherwise, if no instance is classified into the target group, the participant is classified as the rest (lines 5 and 6).
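As a concrete illustration, the training loop and the any-positive test rule can be sketched in Python. This is a minimal sketch, not the paper's exact implementation: scikit-learn's agglomerative clustering and a depth-1 decision tree (a single-feature threshold) stand in for the clustering and classification steps, the ρ and s defaults and the synthetic demo data are assumptions, and one exclusive cluster is consumed per restart.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.tree import DecisionTreeClassifier

def train_ssi(T, R, rho=8, s=10):
    """T: target-group (ASD) instances, R: rest (TD) instances, as 2-D arrays."""
    classifiers = []
    while len(T) > s:
        X = np.vstack([T, R])
        is_target = np.arange(len(X)) < len(T)
        n, found = 1, False
        while not found:
            n += 1
            if n >= len(X):
                break
            labels = AgglomerativeClustering(n_clusters=n).fit_predict(X)
            sizes = np.bincount(labels, minlength=n)
            for c in range(n):
                members = labels == c
                if sizes[c] > s and is_target[members].all():
                    # Exclusive cluster found: train a classifier on it vs. all R.
                    pos = X[members]
                    clf = DecisionTreeClassifier(max_depth=1).fit(
                        np.vstack([pos, R]),
                        np.r_[np.ones(len(pos)), np.zeros(len(R))])
                    classifiers.append(clf)
                    T = T[~members[:len(T)]]   # drop the used target instances
                    found = True
                    break
            if not found and (sizes <= rho).all():
                break   # every cluster is below the size threshold; stop
        if not found:
            break
    return classifiers

def classify_participant(classifiers, A):
    """Target group if any trained classifier flags any instance (Algorithm 2)."""
    return any(clf.predict(A).any() for clf in classifiers)

# Synthetic demo: a tight target-only pattern far from the shared region.
rng = np.random.default_rng(0)
T = np.vstack([rng.normal(5.0, 0.1, (15, 2)), rng.normal(0.0, 1.0, (10, 2))])
R = rng.normal(0.0, 1.0, (25, 2))
classifiers = train_ssi(T, R)
```

On this toy data the tight blob forms an exclusive cluster, yields one threshold classifier, and any participant with an instance near it is flagged, mirroring how one cry instance suffices for an ASD decision.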

Details of the implementations

The classifiers were implemented in Python using the scikit-learn library.

WSI classifiers

We tested several common WSI classifiers but report only the results of an SVM with RBF kernel and no feature selection, which gave the best average accuracy. Several feature selection approaches, such as L1-SVM and backward elimination, were also tested, but they only reduced accuracy. We used group 5-fold cross-validation for tuning hyper-parameters. Group K-fold places all instances of each participant in exactly one fold, preventing the same participant's instances from appearing in the training and validation folds simultaneously. Each fold contained two ASD and two TD participants. Before applying the algorithms, we balanced the number of instances of the two groups by upsampling. Two approaches were used to combine the decisions on the different instances of a participant in the WSI approach: majority pooling, which classifies a participant as ASD if more than 50 percent of the instances are classified as ASD, and threshold-based pooling, which is identical except that a threshold other than 50 percent is used.
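The WSI setup above can be sketched with scikit-learn's GroupKFold, which enforces the one-participant-per-fold constraint; the data arrays, parameter grid, and pooling thresholds here are illustrative assumptions, not the paper's fitted values.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.svm import SVC

# Synthetic stand-in data: 20 participants, 4 cry instances each, 5 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 5))
y = np.repeat([0, 1], 40)               # 0 = TD, 1 = ASD (10 + 10 children)
groups = np.repeat(np.arange(20), 4)    # participant id for every instance

# Group 5-fold CV keeps all of a participant's instances in a single fold.
search = GridSearchCV(SVC(kernel="rbf"),
                      param_grid={"C": [0.1, 1, 10]},
                      cv=GroupKFold(n_splits=5))
search.fit(X, y, groups=groups)

def majority_pool(instance_preds):
    """MP: ASD if more than 50% of a participant's instances are flagged."""
    return int(np.mean(instance_preds) > 0.5)

def threshold_pool(instance_preds, t=0.2):
    """BP: ASD if the flagged fraction reaches a tuned threshold (20% here)."""
    return int(np.mean(instance_preds) >= t)

decision = majority_pool(search.best_estimator_.predict(X[:4]))
```

Passing `groups=` to `fit` routes the participant ids to the GroupKFold splitter, so tuning never sees the same child on both sides of a split.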

SSI classifiers

Before applying the algorithm, we balanced the number of instances of the two groups by upsampling. The threshold for the minimum number of samples needed in a cluster to train a classifier was set to 10. Agglomerative clustering and decision trees were used for the clustering and classification steps of Algorithm 1, respectively.
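One common way to perform the balancing step is to upsample the minority group with replacement; this sketch uses scikit-learn's resample utility, and the sizes and feature arrays are placeholders.

```python
import numpy as np
from sklearn.utils import resample

def balance_by_upsampling(minority, majority, seed=0):
    """Resample the smaller instance set (with replacement) to the larger size."""
    return resample(minority, replace=True,
                    n_samples=len(majority), random_state=seed)

td_instances = np.ones((60, 3))    # placeholder feature rows
asd_instances = np.zeros((41, 3))
asd_balanced = balance_by_upsampling(asd_instances, td_instances)
```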

Training the SSI classifiers

After running Algorithm 1 on our data, two exclusive clusters with enough instances (at least 10 in our study) were found, and a classifier was trained for each cluster. One of these exclusive clusters held 11 instances from 4 ASD participants (Table 1): 6 of the 9 instances of ASD1, 2 of the 4 instances of ASD10, 1 of the 2 instances of ASD8, and 2 of the 4 instances of ASD6. As explained in the algorithm, for each cluster a decision tree classifier was trained using the ASD instances in the cluster versus all TD instances. Interestingly, a single feature was enough to discriminate the instances in the cluster from all TD instances. Among the features that could discriminate the cluster's instances, we selected the Variance of the Frame-wise Temporal Derivative (VFTD) of the 7th MFCC coefficient, the feature that discriminates the most ASD participants from the set of all participants with a simple threshold. The classifier obtained by setting a threshold on this feature was the first classifier. This feature supports our expert's report of greater variation in the cry sounds of children with ASD than in those of TD children; 8 of the 10 ASD children can be discriminated using it. For each participant, the number of instances found by this classifier is shown in the 2nd column of Table 5.
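The VFTD feature named above reduces to a one-line computation: the variance of the frame-to-frame difference of a per-frame coefficient track. The tracks and the threshold below are stand-ins for illustration (the real track would be, e.g., the 7th MFCC coefficient over the frames of a cry instance).

```python
import numpy as np

def vftd(track):
    """Variance of the Frame-wise Temporal Derivative of a coefficient track."""
    return float(np.var(np.diff(track)))

steady = np.linspace(0.0, 5.0, 50)                  # changes linearly frame to frame
erratic = np.array([0.0, 3.0, -2.0, 4.0, -1.0, 5.0])

# A constant or linearly changing track has (near-)zero VFTD; an unpredictable
# one has a large VFTD, which is the property used to flag ASD instances.
is_flagged = vftd(erratic) > 1.0    # 1.0 is a placeholder threshold
```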
Table 5

The number of instances of each participant in the training set that are classified as ASD using each trained SSI classifier.

ID       First SSI classifier   Second SSI classifier
ASD1              8                      3
ASD2              1                      2
ASD3              3                      1
ASD4             10                      9
ASD5              0                      0
ASD6              1                      3
ASD7              1                      0
ASD8              1                      2
ASD9              0                      1
ASD10             2                      4
After excluding the ASD samples covered by the first classifier, the second classifier was trained on the second exclusive cluster, which included all instances of participant ASD4. The only feature used for classifying this cluster was the VFTD of the 6th SONE coefficient. SONE is a unit of loudness, a subjective perception of sound pressure [58]. A higher VFTD of the 6th SONE coefficient again matches the experiential knowledge of our experts mentioned earlier. Among all the ASD participants, eight had instances with a VFTD of the 6th SONE coefficient above the threshold (shown in the 3rd column of Table 5). The results of classification based on these two features are depicted in Fig 3. As described in the proposed method section, participants with at least one instance classified into this cluster are considered participants with ASD.
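Together, the two trained classifiers amount to an OR over two single-feature thresholds. The sketch below makes that decision rule explicit; the threshold values are placeholders, not the paper's fitted cut-offs.

```python
import numpy as np

# Placeholder thresholds standing in for the two fitted cut-offs learned
# on the exclusive clusters.
C1_THRESHOLD = 1.0   # on VFTD of the 7th MFCC coefficient
C2_THRESHOLD = 1.0   # on VFTD of the 6th SONE coefficient

def ssi_decision(mfcc7_vftd, sone6_vftd):
    """One VFTD value per cry instance; ASD if any instance crosses either cut-off."""
    hits = (np.asarray(mfcc7_vftd) > C1_THRESHOLD) | \
           (np.asarray(sone6_vftd) > C2_THRESHOLD)
    return "ASD" if hits.any() else "TD"
```

A participant with even one instance above either threshold is labelled ASD; a participant whose instances all stay below both thresholds is labelled TD.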
Fig 3

Two classifiers trained on the two exclusive clusters found during the SSI classifier training phase.

(a) The Variance of Frame-wise Temporal Derivative (VFTD) of the 7th MFCC coefficient separates 27 instances of 8 ASD subjects from all TD instances of the training set. (b) VFTD of the 6th SONE coefficient separates 17 instances of 7 ASD participants from all TD instances of the training set.


Results

In this section, the performance of the proposed SSI classifier is evaluated against a common WSI classifier on our test set of ASD and TD participants. Each participant has multiple instances, which were cleaned using the criteria explained in the data collection and preprocessing section. Participants with at least one accepted instance were used in the training and testing phases, shown in Tables 1 and 2. The output of the SSI approach was two classifiers, each of which works by thresholding a single feature. The numbers of instances of ASD participants in the training set correctly detected by the first and second classifiers are shown in the second and third columns of Table 5, respectively. For the WSI approach, the best-performing classifier was a Radial Basis Function Support Vector Machine (RBF-SVM) [59]. The classification results on the test set for the different classifiers are shown in Table 6. The portion of each participant's instances correctly classified by each classifier is given as a percentage under the name of the classifier, and the decision made by the WSI and SSI classifiers for each participant is shown as ASD or TD. To classify each subject with the WSI classifier, the Majority Pooling (MP) and Best-chance threshold Pooling (BP) approaches were used; BP is a threshold-based pooling with the threshold chosen to give the best accuracy on the test set for male participants. For the boys, MP achieved specificity, sensitivity, and precision of 100%, 35.71%, and 67.85%, respectively, while BP achieved 85.71%, 71.42%, and 78.57%. The BP threshold was set to 20%, meaning that a participant was classified as having ASD if 20% of the participant's instances were classified as ASD instances.
The percentages of instances correctly classified by the two classifiers of the SSI approach are shown as C1 (the first SSI classifier) and C2 (the second SSI classifier) in Table 6. The final decision of the SSI classifier aggregates the decisions of C1 and C2 and is shown in the decision column under the SSI classification section. The proposed method achieved specificity, sensitivity, and precision of 100%, 85.71%, and 92.85% for the boys, respectively.
Table 6

The results of classifiers on the instances of each participant in the test set.

For TD children, the percentages give the portion of instances classified as TD; for children with ASD, the portion classified as ASD. The WSI columns list the SVM instance percentage and the MP and BP decisions; the SSI columns list the C1 and C2 instance percentages and the SSI decision.

Boys
ID      SVM%   MP    BP     C1%   C2%   SSI   |  ID      SVM%   MP    BP     C1%   C2%   SSI
TD11    100    TD    TD     100   100   TD    |  ASD11    50    ASD   ASD     17    50   ASD
TD12    100    TD    TD     100   100   TD    |  ASD12    33    TD    ASD     11    28   ASD
TD13    100    TD    TD     100   100   TD    |  ASD13    33    TD    ASD     33     0   ASD
TD14    100    TD    TD     100   100   TD    |  ASD14    20    TD    ASD     20    20   ASD
TD15    100    TD    TD     100   100   TD    |  ASD15     0    TD    TD       0    40   ASD
TD16    100    TD    TD     100   100   TD    |  ASD16    50    ASD   ASD    100     0   ASD
TD17    100    TD    TD     100   100   TD    |  ASD17     0    TD    TD       0   100   ASD
TD18     83    TD    TD     100   100   TD    |  ASD18    50    ASD   ASD     50    50   ASD
TD19    100    TD    TD     100   100   TD    |  ASD19     0    TD    TD       0     0   TD
TD20     80    TD    ASD    100   100   TD    |  ASD20    42    TD    ASD     42    16   ASD
TD21    100    TD    TD     100   100   TD    |  ASD21   100    ASD   ASD      0     0   TD
TD22    100    TD    TD     100   100   TD    |  ASD22     0    TD    TD       0    50   ASD
TD23     75    TD    ASD    100   100   TD    |  ASD23    33    TD    ASD     33    17   ASD
TD24     92    TD    TD     100   100   TD    |  ASD24    86    ASD   ASD     86    86   ASD
Acc. %         100   85.71              100   |                 35.71 71.42             85.71

Girls
TD25    100    TD    TD     100   100   TD    |  ASD25    42    TD    ASD     17     0   ASD
TD26    100    TD    TD     100   100   TD    |  ASD26    60    ASD   ASD     60    20   ASD
TD27    100    TD    TD     100   100   TD    |  ASD27    50    ASD   ASD      0     0   TD
TD28    100    TD    TD     100   100   TD    |  ASD28   100    ASD   ASD      0    50   ASD
TD29    100    TD    TD     100   100   TD    |  ASD29    62    ASD   ASD     50    50   ASD
TD30     67    TD    ASD    100   100   TD    |  ASD30   100    ASD   ASD     50    50   ASD
TD31    100    TD    TD     100   100   TD    |  ASD31     0    TD    TD       0     0   TD
Acc. %         100   85.71              100   |                 71.42 85.71             71.42

Each classifier result on a participant’s instances is reported as a percentage.

Dec., Decision; MP, Majority Pooling; BP, Best-chance threshold Pooling; C1, Classifier1; C2, Classifier2; Acc., Accuracy.

To further examine the applicability of the proposed approach to girls, we applied the classifiers trained on the boys to the test set of girls. The results, in the last rows of Table 6, show that the MP approach has specificity, sensitivity, and precision of 100%, 71.42%, and 85.71%, respectively, while the BP approach gives specificity, sensitivity, and precision all equal to 85.71%. The proposed SSI classifier achieves 100% specificity, 71.42% sensitivity, and 85.71% precision. A two-dimensional scatter plot of the two features used in the C1 and C2 classifiers is shown in Fig 4. As the figure shows, the instances of a participant with ASD are scattered across the area containing instances of both TD and ASD participants; nevertheless, some of this participant's instances are uniquely distinguishable using the two selected features.
Fig 4

Instances of several ASD and TD participants scattered in the space of two features given by the proposed SSI method.

The instances of a chosen ASD participant are shown in green to illustrate that a participant may have instances in the area shared with TD instances, in addition to the two areas separated as ASD by the selected thresholds. This participant (with green instances) is tagged as ASD because at least one of its instances has a value greater than at least one of the thresholds on the two features.

We compared the results of our proposed method with those of the only method in the literature trained using cry features alone [41], evaluated on our data. The results (Table 7) show the superiority of our method over the previously proposed one.
Table 7

Comparison of the results on the test set using the two methods; SSI approach and a baseline approach.

                  Sensitivity   Specificity   Precision
Boys   SSI           85.71%        100%         92.85%
       Baseline      50.58%         81%            65%
Girls  SSI           71.42%        100%         85.71%
       Baseline         21%       86.48%           53%

Investigating the trained classifier on participants under 18 months

The SSI classifier trained on the training set in Table 1 was also tested on data from children younger than 18 months. Of the 57 participants under 18 months, two boys (Child1 and Child2 in Table 3) were classified as ASD by the trained classifier. These participants were referred to our experts for diagnosis, and both were suspected of having neurodevelopmental problems. All other boys were classified as TD; however, among them, Child3 was diagnosed with ASD at the age of 2, Child4 showed symptoms of ADHD and sensory processing disorder at the age of 3, and three other children had symptoms suggesting that they are not TD children. Two of the girls, who were 18 months old, were classified as ASD by the trained classifier; the other girls were classified as TD. The results of testing the trained SSI classifier on this data set are summarized in Table 8.
Table 8

Classification of the participants under 18 months using our trained SSI classifier.

                     Boys                      Girls
                     ASD   TD   Others(a)      ASD   TD   Others(a)
Classified as ASD     0     0      2            0     1      0
Classified as TD      1    22      4            0    27      0

(a) Other developmental or mental disorders

The original and cleaned voices, their extracted features (the data set), and the implementation code of the proposed method are deposited in the following repositories: CodeOcean: 10.24433/CO.0622770.v1; Harvard Dataverse (contains only a rar file of the sounds): 10.7910/DVN/LSTBQW.

Discussion and conclusion

In this paper, we presented a novel cry-based screening method to distinguish between children with autism and typically developing children. The proposed method can determine groups of children with autism who share specific features in their cry sounds. It is based on a new classification approach called the SubSet Instance (SSI) classifier. An appealing property of the proposed SSI classifier, in the case of voice-based autism screening, is its high specificity, such that a typically developing child is detected with no error. We applied the proposed method to a group of participants consisting of 24 boys with ASD between 20 and 53 months of age and 24 TD boys between 18 and 51 months of age. The two features found in this study were used to train a classifier on 10 boys with ASD and 10 TD boys; the classifier was then used to distinguish 14 boys with ASD from 14 TD boys, reaching 92.8% accuracy. Because girls are less likely to have autism, and it is consequently harder to collect enough data from girls than from boys, the number of girls with ASD was not sufficient to train a separate classifier for this gender. We nevertheless tested the trained system on 7 girls with ASD and 7 TD girls and found that the trained classifier screens girls with 7% lower accuracy than the boys of the test set. In other words, it seems that gender differences should be considered in training the system. In testing the data from participants under 18 months, one TD girl was classified as ASD, which was not the case for any TD boy; this result also supports the point about the gender effect. In future work, we plan to collect more data on girls to train a system that accurately screens girls, and to try training a single classifier for boys and girls to determine whether it can serve both.
It should be mentioned that our training and test data were completely separate, to make the trained model more general. The features found in this study are applicable across the age range of our participants, 18 to 53 months. This contrasts with other approaches that either used a dataset of children of a specific age [33, 35] or used age information for classification [34]. Given the age-invariant features found in this study, it can be claimed that there are markers in the voices of children with ASD that are sustained at least over a range of ages. The two discriminative features found in this study were an MFCC coefficient and a SONE coefficient. MFCC and SONE are related to the power spectrum of a speech signal: SONE measures loudness in specific Bark bands [56], while MFCC, the inverse DFT of the log-spectrum on the Mel scale, is related to the timbre of the voice [60]. Therefore, MFCC and SONE can be interpreted as relating to the timbre and loudness of a tone. Furthermore, based on feedback from our experts, there is an unpredictability in the crying sound of children with autism that is not present in TD children. Consequently, we used the variance of the temporal difference as a feature suitable for screening children with autism: if a signal is constant or changes linearly over time, the variance of its temporal difference is zero, so this variance can be seen as a measure of the ambiguity or unpredictability of a sound. The heightened variability in these two features for children with ASD is also significant in light of other studies [22, 61] reporting increased variability of biological signals in children with ASD and in infants at high risk for autism, compared with TD children. These features are statistical features of the cry instances that hold, at least, across the age range studied in this research.
To the best of our knowledge, [34] and [35] were the only studies on screening children with autism using voice features in children younger than 2 years of age. Our proposed method, using only cry features, has higher precision than both, i.e. 6% more than [34] and 17% more than [35]. The suitability of cry features as biomarkers for autism screening matches the claims in [38]. In the present study, only children with ASD and TD children were tested; other developmental disorders or health issues were not tested to see how children with such disorders would be classified, which could reduce the 100% specificity. However, this approach is intended as a screening tool, with the final diagnosis made under experts' supervision, so it can be applied as a general screener for autism spectrum disorder. The trained classifier was also tested on 57 participants between 10 and 18 months of age. The classifier separated two boys from the rest, i.e. Child1 and Child2 (Table 3): Child1 showed evidence of a genetic disease and was diagnosed with developmental delay, and Child2 received a UNDD classification from our experts. This suggests that a) the system can be used for children under 2 years of age, and b) it may be able to flag other neurodevelopmental disorders. On the other hand, five boys, Child3 to Child7 (Table 3), had no evidence of mental or developmental disorders at the time of their recording, and our approach did not flag them as children with ASD either; however, after the age of 3 they showed symptoms of neurodevelopmental disorders. Of these children, we managed to collect new recordings from Child3 and Child4, which were classified as belonging to children with ASD by our approach. Unfortunately, Child5, Child6, and Child7 did not cooperate and could not be evaluated by an expert to validate the results of our expert-selected questionnaire.
Furthermore, their parents declined to send us their children's recent cry sounds. The results from these 57 children under 18 months may suggest that: a) there can be symptoms in the crying sounds of children with neurodevelopmental disorders under 18 months (Child1 and Child2); b) the approach may fail to screen a participant with a neurodevelopmental disorder under 18 months because: 1) the participant is among those children with neurodevelopmental disorders who do not have the proposed specific features in their crying sounds, 2) the participant's recorded cry samples did not include the specific features, and/or 3) the disorder and its features had not yet developed in the child at the time of the initial recording. The reason Child3 and Child4 were not classified as children with ASD under the age of 18 months could be b.2 or b.3; further investigation is needed to determine the cause of this phenomenon. We believe that this approach can be used for early autism screening under 18 months of age; thus, in the future, we need to collect and test on more data from children under 18 months to validate these results with more confidence. We also have to evaluate the proposed approach and the extracted features on other neurodevelopmental disorders, such as ADHD, to assess its capability to distinguish children with these disorders from TD children. Without comparing the cry sounds of children with ASD to those of children without ASD but with another disorder, we do not really know whether these findings are specific to autism or to atypical brain development in general. Thus, we should collect cry sounds of children with other neurodevelopmental disorders and compare them with those of children with ASD to see whether these features can separate them.
It has been demonstrated that crying involves intricate motor activity [62]. It has also been shown that children with ASD have problems in the motor domain and in coordinating their motor capabilities with other modalities [63]. Consequently, it is possible that the extracted features in the crying sounds of children with ASD arise from this motor-domain deficiency, which requires further investigation. Finally, automating the preprocessing step is a technical issue that should be addressed if the cry-based screening is to be fully automated. This matters because such a screening system could be deployed on platforms such as Amazon Alexa [64] to automatically screen problematic cry sounds.
For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. We look forward to receiving your revised manuscript. Kind regards, Zhishun Wang, Ph.D. Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements: 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.plosone.org/attachments/PLOSOne_formatting_sample_main_body.pdf and http://www.plosone.org/attachments/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. We suggest you thoroughly copyedit your manuscript for language usage, spelling, and grammar. If you do not know anyone who can help you do this, you may wish to consider employing a professional scientific editing service. Whilst you may use any professional scientific editing service of your choice, PLOS has partnered with both American Journal Experts (AJE) and Editage to provide discounted services to PLOS authors. 
Both organizations have experience helping authors meet PLOS guidelines and can provide language editing, translation, manuscript formatting, and figure formatting to ensure your manuscript meets our submission guidelines. To take advantage of our partnership with AJE, visit the AJE website (http://learn.aje.com/plos/) for a 15% discount off AJE services. To take advantage of our partnership with Editage, visit the Editage website (www.editage.com) and enter referral code PLOSEDIT for a 15% discount off Editage services.  If the PLOS editorial team finds any language issues in text that either AJE or Editage has edited, the service provider will re-edit the text for free. Upon resubmission, please provide the following: The name of the colleague or the details of the professional service that edited your manuscript A copy of your manuscript showing your changes by either highlighting them or using track changes (uploaded as a *supporting information* file) A clean copy of the edited manuscript (uploaded as the new *manuscript* file) 3. We note from your ethics statement that 'The study has been approved by the ethics committee at Shahid Beheshti University of Medical Sciences and Health Services. All the parents of the subjects were informed about the study and signed an agreement to be included in the study.' Please include this information in the methods section of the manuscript. 4. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide. 5. 
Please amend either the abstract on the online submission form (via Edit Submission) or the abstract in the manuscript so that they are identical. 6. Your ethics statement must appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please move it to the Methods section and delete it from any other section. Please also ensure that your ethics statement is included in your manuscript, as the ethics section of your online submission will not be published alongside your manuscript. 7. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: No Reviewer #2: Partly Reviewer #3: No ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: I Don't Know Reviewer #3: No ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). 
The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: No Reviewer #2: Yes Reviewer #3: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: No Reviewer #3: No ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The research reported in this article follows a recent trend towards the development of objective and easily implemented biomarkers that can be used to distinguish individuals with vs. without autism and in some cases, identify autism in the first year of life. In this study machine learning and other AI-based tools were applied to children’s cries and were found to have high sensitivity and specificity in distinguishing children with and without autism. The authors are to be applauded for undertaking a problem of such great public health importance. However, there are several limitations of the work that detracts from its overall impact. 
These are listed below: a) Autism is a notoriously heterogeneous disorder, and as a result, the sample size used in this work is likely not representative of autism generally. If the algorithms developed by this team are as robust as they purport, then they should be applied to publicly available data on a much larger scale. b) No mention is made of whether the children in this study had IQ or language in the typical range or whether some had an underlying intellectual disability or speech/language problem, both of which could contribute to atypical cries having nothing to do with autism per se. c) Similarly, no mention is made of whether any of the children contributing data had an underlying genetic disorder, which again could lead to atypical (and distinct) cries. d) Without comparing the cries of children with ASD to those without ASD but another disorder, we don’t really know if these findings are specific to autism or to atypical brain development generally. e) Little attention is seemingly paid to the context for children’s cries, and this would seem important to understand; thus, were the cries related to distress, to pain, to hunger, to fatigue, etc.? f) I was a little confused about the gender difference the authors report. My sense is that the training set only involved male cries but the test set involved both males and females? If that is correct, I don’t know why a female training set was not developed. g) Finally, in several places in the text the authors appear to conflate screening for autism with early identification of autism; these are two very different problems. Reviewer #2: Thank you for the opportunity to review the manuscript “Early screening of autism spectrum disorder using cry features”. This study developed a new method for distinguishing children with autism from typically developing children, based on two different features of their cries.
The method involved training a model on cry features from one set of children, and then testing the model on a new set of cries from different children. The results (e.g., high sensitivity and specificity) are impressive, and the implications are intriguing. However, there are several details that are unclear and I have some concerns that might limit the applicability of this research, unless the authors can address them. These major concerns are described below, followed by some minor suggestions that will improve the readability of the manuscript. Major concerns: 1. After reading the paper, I still do not have a clear understanding of the cry features that were useful for screening for ASD. As I understood it, MFCC and SONE are both related to power/amplitude/loudness of the cry, and the features that you used are based on the variance in MFCC and SONE from one temporal window to the next. So is this something that would be perceptible to a human ear? Is it a quality that can be described in terms of vocal quality (e.g. “shrillness”, “hoarseness”, “raspiness”)? Is the quality specific to cries, or would you also expect to see similar characteristics in the vocal quality of children who are talking rather than crying? You do not need to answer each of these specific questions, but if it is possible to describe the results or implications in layman’s terms, I think many readers would appreciate it. On the other hand, if it is something that only a computer can detect with no discernable difference to a human, then it might be informative to say that (e.g., a single sentence in the discussion). 2. If children were crying for different reasons, wouldn’t that affect cry quality? For example, do you know if a hunger cry has the same qualities as a cry of pain? 
Unless you can verify that children with ASD produce the characteristic cry qualities regardless of the reason for crying, then it would be important to know which type of cry would be most accurate when actually screening a child. 3. If children in the training sample produced different numbers of cries, wouldn’t the children with more cry instances be weighted more heavily in the model than children with fewer cry instances? 4. The children in the TD group were not assessed to verify that they were all truly typically developing. So this group could have included children with ASD that had not yet been diagnosed, or children with some other non-ASD disorder. This is not a fatal flaw, but it should be mentioned as a possible limitation. Specifically, it limits your ability to say whether this method distinguishes children with ASD from all other non-ASD children (i.e., including TD children and children with other disorders) or if it simply distinguishes all non-TD children from TD children. 5. Related to the last point, the children were not followed up later, so you do not know if some of the TD children were diagnosed with ASD (or some other disorder) at a later age. If any of the TD children were diagnosed at a later age, this would decrease the sensitivity. In general, the method seems to be only as sensitive as the person who was providing the evaluations (rather than more sensitive, so it can detect children with ASD before they have obvious symptoms). 6. The current sensitivity & specificity are impressive, but this may be because you included only TD and ASD groups. In real-world settings (e.g., pediatrician offices), this method would need to be able to classify other groups of children too, such as those with health issues (e.g., asthma) or other developmental disorders (e.g., apraxia). It is unclear how these other groups (especially groups with atypical speech, like apraxia of speech or a stutter) would be classified by this method. 
If these children were classified as “not TD” that would drastically reduce the specificity of this method as an ASD screener, but might make it useful as a general screener that would then lead to a more comprehensive evaluation. 7. The goal is to diagnose children earlier than is currently possible, but the model is trained on children that are approximately 3 years of age (which is around the current average age of diagnosis). It would be important to know how accurate this method would be if used at 12 months, 18 months, or 24 months. You have begun to address this by testing the model on 61 children between 10 and 18 months of age, but the real test of sensitivity would require you to evaluate these 61 children after 3 years of age, so that you could say that the one child that was detected was truly the only one that was not TD. Minor suggestions: Pg. 1 & Pg. 8: The abstract on page 1 is completely different from the abstract on page 8 of the PDF. Line 21: “correspondign” is misspelled Line 27: “feamale” is misspelled Line 34: “ignited” is metaphorical. A literal word might be better, such as “inspired” or “instigated” Line 35: instead of “many researches”, try “much research” Line 38: I would change “about 3 years” to “over 3 years”, because the reported average age of 3.1 years was just for the most severe cases. Less severe cases of ASD (e.g., PDD-NOS and Asperger’s) were diagnosed at much later ages (3.9 and 7.2 years, respectively). Line 53: “features differences” could be changed to “feature differences” Lines 58-59 (also 73-74): This method is unfamiliar to me. I feel confident that I understand what you mean, but it might be useful to include a citation. Lines 94-97: I find this section to be unnecessary. All of this is redundant with the sub-headings below. This might be helpful if the method was very long or complicated, but I think yours is fairly straightforward. 
Line 103: “set” should be plural, “sets” Line 104: instead of “only a model was trained for screening male subjects”, try “the model was only trained for screening male subjects” Line 112: “psychologist” should be plural, “psychologists” Line 113: “was” should be added, to make it “diagnosis of ASD was established” Lines 123-132: some of the column headers are self-explanatory and do not need to be described in the text (e.g., subject IDs, ages). The only time you need to explain further is when there is information that is not explained in the table, such as the reason for two recording device types, or the 4 participants without GARS scores. Line 129: Does “GRAS grades” mean “GARS scores”? Line 149: “balancing on” should be “balancing of” Line 162: “there were asked” should be “they were asked” Line 174: Why were the uvular/guttural parts of the cries removed? Is there evidence that these parts are not informative? If so, you could cite that evidence. Otherwise, you could add “because we believed [whatever the reason was]” Line 177: what does it mean “to be comprehended by our audition”? Line 229: “till” should be “until” Line 230: “is appeared” should just be “appeared” Line 240: In my opinion, this whole “Feature extraction” section would have been helpful if it was presented before the section about classifying based on features. Line 255-256: I do not understand why you modified the spectral flatness features. If this is common practice, you could either just say that or provide a citation. If it was your own decision, you could say why you decided that it was necessary. Line 270: “omitted by” should be “omitted for” Line 271: I think this should have been said earlier, such as after line 249. While I was reading lines 253-255, I had been wondering how long a frame was. Line 279: “details)” has a parenthesis attached Lines 280-281 (and line 284, as well as lines 300-301): These are technical terms that I am not familiar with. 
Would it be useful to provide a citation? Line 336: what is meant by “the only method”? Line 353: “was used to train” should be “were used to train” Line 358: “subject” should be plural, “subjects” Line 368: I am not certain, but I think “claimed” should be “claims” Lines 372-373: I think this is good, if it is intended to screen for atypical development in general. But if this method is meant to be specific for ASD screening, then this would result in an overall decrease in specificity. Lines 480 (and elsewhere in the Appendix): you used some technical terms but no citations are provided. Reviewer #3: This manuscript addresses an important question: identifying children with ASD manifestations using biological information, specifically, auditory signals from crying. One compelling result is heightened variability in ASD cries relative to TD children. The interpretation (though now on P. 24 in the Appendix, and needs to be more fully elaborated and presented sooner in the ms) is that the findings are not due to cry per se, but due to the specific statistical features of the cry instances that capture heightened variability of cries. The authors propose that because the increased variability may be an enduring, fundamental feature of ASD, for this reason the pattern holds constant across age. I agree that heightened variability in biological signals is a clear and emerging trend in ASD and I think that this manuscript does add important new information for our understanding of ASD in toddlers. There are several major technical/scientific and conceptual concerns that should be addressed though. One example (there are many, see below) of a technical concern is a lack of uniform processing strategy for cry instances, resulting in a huge range of cry durations (1/3 to 3 seconds).
In addition, this manuscript requires substantial re-drafting to incorporate more recent references and rationale for certain points made, and other places require additional justification and elaboration. The text is somewhat sloppy and not well-prepared (however, this issue could be related to the language use concern noted below). Limitations should be articulated. Please have a colleague or a student who is a native British English or American English speaker carefully review and edit the entire manuscript, including captions and appendices. If this is not possible, please use an English-language editing fee-based service. The language use issues (e.g. awkward language and word use; grammar) are significant, numerous, and should be addressed adequately prior to re-submission. Despite these points, I think that the work has important strengths. I also think that given the importance of the question of early ASD detection, the authors may be given an opportunity to revise the ms, provided that they carefully consider and address all points raised (for any points not addressed, please provide suitable rebuttals). abstract -there are currently 2 versions of the Abstract text – one provided via the submission system and a second version included as part of the manuscript file. 1) Please review language usage in both and 2) please choose one version (not clear which is the final version?) “The approach has been tested on a dataset including 14 male and 7 female children with ASD and 14 male and 7 female TD children, between 18 to 53 months of age.” -but the algorithm was developed with male subjects’ data only? Why? What is the result when you test on the male-only subset? Introduction “On the other hand, it is shown that fMRI [10] or EEG [11] can give discriminative features helping to diagnose ASD at earlier ages.” This statement is not accurate, as ref [10] predicted future ASD diagnoses made by traditional means (ADOS, ADI-R), by using rs-fMRI data from 6 mo.
The diagnoses themselves are still performed using observational instruments. This approach is not the same as one to be used for “screening” for ASD symptoms in the absence of a diagnosis and also it is not the same as one to be used for using biobehavioral information in lieu of conventional diagnoses. Please be careful in your writing and distinguish (1) work reporting results that predict future ASD diagnoses/cases, or (2) work that is able to detect concurrent cases of ASD (i.e. diagnoses are made at the same time as classification), or (3) work that aims to establish biological features of ASD that can themselves be used for ASD diagnoses in lieu of traditional instruments. Here it may be helpful to add ref Denisova & Zhao, 2017 who used movement data from rs-fMRI from 1 mo to predict future atypical developmental trajectories in general (this report provides the earliest age of detection using rs-fMRI data in an unconventional way). Please also add more recent references from various groups who report ASD classification/screening using items from the ADOS: Kuepper et al., 2020; Abbas et al., 2020. Also add more references for younger ages since that is your target population. “Furthermore, approaches which involves methods such as fMRI or EEG, are hard to be used on children, especially children with autism.” This statement requires support from the literature – what makes them “hard” or unsuitable? (excessive head movements, etc. Please provide specific reasons and supporting references for each point made) (e.g. Denisova 2019). The Oller et al. 2010 paper should be discussed in the Introduction. P. 2 Introduction “To the best of our knowledge, there is only one work that proposed a method for identifying ASD children using only cry [24]. They used a dataset of 5 children with ASD and 4 TD children older than two years of age.
They extracted 187 sound features from which 55 features were selected, using forward selection.” This reference #24 seems to be work from the same group as the authors of the current manuscript. Please refer to this reference appropriately (“We” instead of “They”, or “Work from our group has shown that …”) Is there an overlap of children from the study described in Reference #24 with the current work? Please clarify. P. 3 Introduction “It should be mentioned that the extracted features are age invariant and have been used to screen children with ASD between 18 to 53 months of age.” This point about the current method’s age-invariance is emphasized repeatedly as being a strength of the proposed approach, but as written, it is unclear why or even if this might be the case. It may be problematic in terms of specific behaviors not present at all ages. For an approach to be scientifically valid, it must have ecological validity with respect to natural human developmental behaviors. For instance, you may encounter fewer crying instances at older ages, even if the older child is undergoing therapy (these children may adopt a different response instead of crying, and throw temper tantrums and present loud yelling and vocalizations). The link to the fact that the statistical feature used (e.g. capturing heightened variability) is or could be a feature of ASD that may be age-invariant is not really articulated, until the last page in the Appendix. Crying (and voice/speech) is a motor act. Research from various groups has now established that infants, children, and adults with ASD have problems in the motor domain and in coordination of the motor domain with other modalities. Please cite references to this highly relevant work and please discuss in the Discussion. P. 4 Subjects Please review ascertainment of and description of ASD diagnoses. Please re-write this section in more succinct, formal prose.
“The inclusion criteria of the ASD subjects were those who had been just diagnosed as autistic, based on DSM V . . .” This statement is not correct. -DSM-5 provides for diagnoses of Autism Spectrum Disorder only. “Autism” is not one of the diagnoses in DSM-5. -also, please note: the correct name for the fifth DSM version is DSM-5, not “DSMV”. -Please provide justification for why the ADOS (Toddler Module or Module 1), and/or ADI-R was not given. Is there a valid/official translation of the ADOS in Farsi that can be administered in Iran? Is there a research version available that you can request permission to translate from WPS, the publisher? If not, this information should be incorporated, as we clearly need a version of the ADOS (and/or ADI-R) for Islamic countries. However, as this is a limitation of the work, you should provide a more thorough explanation of why the assessments are missing from your study and what the plans are to address this important issue in the future. -I just checked for available official/published translations on the ADOS website and I did not see a Farsi/Persian version. https://www.wpspublish.com/published-translations -Right now from the text as written it is not clear why these instruments were not given to children in this study. “For full descriptions of these procedures, see our previous work [27].” Please do not refer the readers to these key details elsewhere. You may write this section in a more succinct manner but crucial information and justification must be given here in this ms, especially as there are no strict word limitations in PLOS ONE. p. 6 Table 1. Typically Developed –> “Typically Developing”. The name of this normal control children’s cohort should be revised to state “Typically Developing”. “Subjects” – ‘participants’, ‘participating children’. P. 7, end of the Subject section This section is not well-prepared and some information (on the devices) is explained in the following section.
Please re-draft and re-organize this material in these adjacent sections. “Furthermore, the number of subjects with high-quality recordings was 6 ASD and 4 TD.” This statement indicates that the main portion of the data was acquired with typical cell phones. Please re-run the main analysis, including only participants’ data obtained with cell phones. P. 7, 8 Data collection and preprocessing Need to provide more details about how the preprocessing step was implemented. Raw data handling: need to clearly state which stages were automatic and which were manual. Data samples: What are they being equated upon? Equal duration? “The reason for using various devices was to have a device independent model.” This statement holds if and only if you have already validated your algorithm on some standard data acquisition device (i.e. a high-quality recording device or a cell-phone with high quality audio recording). You need to be confident that the data are acquired in a robust and consistent way. After this step is demonstrated and findings are robust, then you may include a variety of recording devices. For this reason, it would be important to run an analysis using cell-phone data only, as cell-phone acquisitions represent the majority of your data. P. 8. “In this study, the final samples were between 320 milliseconds to 3 seconds.” It is not acceptable to have such a wide range of data length (1/3 s to a full 3 s). Have you run an analysis that rules out data length as a contributing factor? What about the number of instances per each subject/participant? You need to have a similar number of instances across subjects. Please provide clear justification for all analytic choices made. Please select one or more criteria, such as equal length of a cry instance, or the number of discrete crying instances, etc. P. 9 Caption of Fig1: “All instances of a subject have the same color.” – Do you mean to say: all instances *belonging* to a subject/participant? P. 
9 “An example of case 1 is tip toe walking in children with ASD, which is common in about 25% of these children who do it most of the times.” -Please provide a reference from the literature for the tip toe walking example. -“most of the times” – “most of the time”. P. 13 Please review this technical description for mathematical accuracy: “Frame-wise temporal derivative means subtracting the value of a frame from the value of the next frame.” Subtracting the value of one frame from the next frame produces a difference of values. “Finally, to compute the features, each instance was divided into 20 milliseconds frames.” What is the rationale for the 20 ms resampling / each frame’s duration? Ideally you need to follow the Nyquist theorem (if you did, this needs to be stated) for determining the proper sampling rate for a given phenomenon of interest. However, you seem to have a fixed sampling rate of 44.1 kHz for all of the devices. All of this information needs to be reconciled in a technically and mathematically precise fashion to produce correct and concise descriptions for all methods. “Also, we added a few other promising features.” Please create a new Table summarizing fundamental descriptors and rationale for inclusion/exclusion of specific properties in this study. P. 20 (2nd page of Discussion and conclusion) “Finally, the proposed approach was also tested on 57 subjects, 56 TD children and one who diagnosed as Unspecified Neuro Developmental Disorder (UNDD), between 10 to 18 months of age. The classifier distinguished the UNDD child from the rest. This suggests that a) the system can be used for children under 2 years of age, and b) it may be able to distinguish more general UNDD children from TD children.” Please see P. 51 of DSM-5 on assigning DSM-IV diagnoses using DSM-5 format. The Discussion section is the first time this 2nd dataset is mentioned (no mention in the methods, results, etc.).
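The frame-wise arithmetic queried in this comment is simple enough to state concretely. A minimal pure-Python sketch (the per-frame feature values below are hypothetical, chosen only for illustration) shows the frame-wise temporal derivative and the frame length implied by 20 ms frames at a 44.1 kHz sampling rate:

```python
from statistics import pvariance

# Frame-wise temporal derivative: subtract each frame's feature value
# from the next frame's value (hypothetical per-frame values below).
frame_values = [0.8, 1.1, 0.9, 1.4, 1.0]
derivative = [b - a for a, b in zip(frame_values, frame_values[1:])]
# Note: the derivative has one fewer element than the number of frames.

# Variance of the frame-to-frame derivative: the kind of variability
# statistic discussed in these reviews.
variability = pvariance(derivative)

# Frame length in samples for a 20 ms frame at 44.1 kHz.
sampling_rate_hz = 44_100
frame_ms = 20
samples_per_frame = sampling_rate_hz * frame_ms // 1000  # 882 samples
```

This also makes the reviewer's point explicit: the derivative is a sequence of differences, not a single value, and its variance is one way a "heightened variability" feature could be summarized per cry instance.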
Please provide adequate details on this analysis in earlier sections as relevant. Appendix. This section contains key methods information necessary to understand the analytical approach and is short enough to be in the main body of the ms. The ms is currently missing these details. Please move this information into the main body under the Methods section. P. 23, 24, 25 Please address the following queries related to preprocessing of the data: A1) Are these data continuous? i.e. have you retained original continuity of the data (i.e. as the data were acquired?). If not, how instances were strung together needs to be explained. A2) Pipeline/preprocessing: A 2a) How are technical issues of data preprocessing addressed here? (how are known problems addressed?) A 2b) illustration of raw data of interest (cry instances of ASD and TD participants) A 2c) equal length of samples/instances? All of this information is missing. A3a) What software/scripts were used to handle the raw data (.wav files)? (What is the software used for input/output? Open source?) A3b) What software was used to program the classifiers (MATLAB, Python, etc.; which toolboxes or libraries were used if any?) A4) Appendix – P. 24 “This feature supports our expert’s report saying variations in the cry of ASD children is more than TD children.” A4a) Please present this report / anecdotal variations in cry sooner in the ms. A4b) This information touching upon variability parameters in ASD cry data is highly significant and should be elaborated upon in the Discussion. There are now many papers in the ASD field reporting that autism biological signals (from children and adults with ASD and work with infants at high risk for autism) are characterized by increased/heightened levels of variability (e.g. Denisova & Zhao, 2017). Your result is in line with these other reports from many groups worldwide and uniquely adds to the growing research indicating that a marker of higher variability is an important feature in ASD.
Perhaps that is why you find it age-invariant: it is not cry per se, but statistical features of the cry instances that hold constant across age. Please be sure to discuss this point in the Discussion. ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No Reviewer #3: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step. 21 Jun 2020 The detailed response is given in the file prepared for it. We would like to express our great appreciation to the reviewers for their constructive and important feedback. We are sure that the paper is in much better shape now than before and that readers can benefit more from it.
Best regards, Submitted filename: Response to Reviewers.docx Click here for additional data file. 12 Aug 2020 PONE-D-19-32813R1 Early screening of autism spectrum disorder using cry features PLOS ONE Dear Dr. Moradi, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Sep 26 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. 
For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols We look forward to receiving your revised manuscript. Kind regards, Zhishun Wang, Ph.D. Academic Editor PLOS ONE [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #1: All comments have been addressed Reviewer #3: (No Response) ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #3: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #3: Yes ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. 
participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #3: (No Response) ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: No Reviewer #3: No ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The authors have been very responsive to the last round of reviews and I'm happy with the manuscript as is; no suggested edits on my part Reviewer #3: I thank the authors for carefully addressing previous points made. I am marking this as a major revision because I would like to see the authors carefully and thoroughly address the following new and remaining points. Abstract in the current revised version “Due to the importance of automatic and early autism screening, in this paper, a cry- based screening approach for children with Autism Spectrum Disorder (ASD) is introduced. During the study, we realized that the ASD specific features are not necessarily observable among all children with ASD and among all instances of each child. Therefore, we proposed a new classification approach to be able to find such features and their corresponding instances. We tested the proposed approach and found two features that can be used to distinguish groups of children with ASD from Typically Developing (TD) children. In other words, these features are present in subsets of children with ASD not all of them. 
The approach has been tested on a dataset including 14 boys and 7 girls with ASD and 14 TD boys and 7 TD girls, between 18 to 53 months old. The sensitivity, specificity, and precision of the proposed approach for boys were 85.71%, 100%, and 92.85%, respectively. These measures were 71.42%, 100%, and 85.71% for girls, respectively.”

1) The abstract, the final version of which was not available in the previous submission, requires substantial changes and must be fully re-written to include rationale for the current work. More details are needed about the analytic methods (including acquisition methods) and conclusion. First of all, the implications/conclusion of the work is completely missing and must be added. Overall, it should reflect all of the changes and edits in the current ms.

2) “Due to the importance of automatic and early autism screening” Please consider providing additional rationale for the importance of early ASD screening. It is mentioned in passing in the ms on P. 1 but I would like to see some more elaboration for why it is important to detect ASD (or ASD signs) in the main body of the ms / in the Introduction.

3) There remain many instances of awkward language use, which often leads to confusion about the intended meaning. I would like to see a version of this ms after appropriate language edits have been implemented, because I do not think that the Authors have adequately dealt with the language issues in this version. I would suggest the use of a professional editing service, or have a colleague who is a native British English or American English speaker help edit the ms.

4) Some examples of awkward language use, but there are others. Also please be very clear about the intended meaning. There is confusion about screening for ASD manifestations in children below 18 months *who have not yet been diagnosed with an ASD* and confirming ASD in children below 18 mo – please clarify what is meant by using clear language.
“53 As mentioned above, there are studies tried to screen children with ASD under 18 months,”

Should read: “As mentioned above, multiple studies attempted to screen children for ASD below 18 months of age”

“widespread expertness for autism diagnosis”

Should read: “expertise for diagnosing autism”

5) Some references are missing in the ms – For example, it does not seem that this reference is in the main text: Paliwal KK, Lyons JG, Wójcicki KK, editors. Preference for 20-40 ms window duration in speech analysis. 2010 4th International Conference on Signal Processing and Communication Systems; 2010: IEEE.

6) The fact that ADOS is not available at all in Farsi is still not mentioned in the body of the ms. I think it is important to mention this fact, in case some readers wonder why you did not administer the ADOS to your participants. Please address in the main body of the ms why participants were not administered the ADOS.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements.
To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

3 Oct 2020

The comments by the 3rd reviewer, especially the issue regarding the use of a professional editing service, have been addressed. The response to reviewers document includes all the corrections/comments regarding the reviewer's comments.

Submitted filename: Response to reviewers.docx

20 Oct 2020

Early screening of autism spectrum disorder using cry features
PONE-D-19-32813R2

Dear Dr. Moradi,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance.

To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance.
Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Zhishun Wang, Ph.D.
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available.
If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #3: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #3: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #3: No

2 Dec 2020

PONE-D-19-32813R2
Early screening of autism spectrum disorder using cry features

Dear Dr. Moradi:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours.
Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Zhishun Wang
Academic Editor
PLOS ONE
Related articles: 38 in total

1.  Age attenuates noise and increases symmetry of head movements during sleep resting-state fMRI in healthy neonates, infants, and toddlers.

Authors:  Kristina Denisova
Journal:  Infant Behav Dev       Date:  2019-05-15

2.  Practice parameters for the assessment and treatment of children, adolescents, and adults with autism and other pervasive developmental disorders. American Academy of Child and Adolescent Psychiatry Working Group on Quality Issues. (Review)

Authors:  F Volkmar; E H Cook; J Pomeroy; G Realmuto; P Tanguay
Journal:  J Am Acad Child Adolesc Psychiatry       Date:  1999-12       Impact factor: 8.829

3.  Alexa, Siri, Cortana, and More: An Introduction to Voice Assistants.

Authors:  Matthew B Hoy
Journal:  Med Ref Serv Q       Date:  2018 Jan-Mar

4.  Vocational Rehabilitation Service Patterns and Outcomes for Individuals with Autism of Different Ages.

Authors:  June L Chen; Connie Sung; Sukyeong Pi
Journal:  J Autism Dev Disord       Date:  2015-09

5.  The relationship of motor skills and adaptive behavior skills in young children with autism spectrum disorders.

Authors:  Megan MacDonald; Catherine Lord; Dale Ulrich
Journal:  Res Autism Spectr Disord       Date:  2013-11-01

6.  Guidelines and best practices for electrophysiological data collection, analysis and reporting in autism.

Authors:  Sara Jane Webb; Raphael Bernier; Heather A Henderson; Mark H Johnson; Emily J H Jones; Matthew D Lerner; James C McPartland; Charles A Nelson; Donald C Rojas; Jeanne Townsend; Marissa Westerfield
Journal:  J Autism Dev Disord       Date:  2015-02

7.  Cry, baby, cry: Expression of Distress as a Biomarker and Modulator in Autism Spectrum Disorder.

Authors:  Gianluca Esposito; Noboru Hiroi; Maria Luisa Scattoni
Journal:  Int J Neuropsychopharmacol       Date:  2017-02-15       Impact factor: 5.176

8.  Identifying predictive features of autism spectrum disorders in a clinical sample of adolescents and adults using machine learning.

Authors:  Charlotte Küpper; Sanna Stroth; Nicole Wolff; Florian Hauck; Natalia Kliewer; Tanja Schad-Hansjosten; Inge Kamp-Becker; Luise Poustka; Veit Roessner; Katharina Schultebraucks; Stefan Roepke
Journal:  Sci Rep       Date:  2020-03-18       Impact factor: 4.379

9.  EEG Analytics for Early Detection of Autism Spectrum Disorder: A data-driven approach.

Authors:  William J Bosl; Helen Tager-Flusberg; Charles A Nelson
Journal:  Sci Rep       Date:  2018-05-01       Impact factor: 4.379

10.  Multi-modular AI Approach to Streamline Autism Diagnosis in Young Children.

Authors:  Halim Abbas; Ford Garberson; Stuart Liu-Mayo; Eric Glover; Dennis P Wall
Journal:  Sci Rep       Date:  2020-03-19       Impact factor: 4.379

