Literature DB >> 35486577

Underwater acoustic target recognition method based on a joint neural network.

Xing Cheng Han1,2, Chenxi Ren1,2, Liming Wang1,2, Yunjiao Bai3.   

Abstract

To improve the recognition accuracy of underwater acoustic targets by artificial neural networks, this study presents a new recognition method that integrates a one-dimensional convolutional neural network and a long short-term memory network. This new network framework is constructed and applied to underwater acoustic target recognition for the first time. Ship acoustic data are used as input to evaluate the network performance, and a visual analysis of the recognition results is performed. The results show that this method can realize the recognition and classification of underwater acoustic targets. Compared with a single neural network, the relevant indices of the joint network, such as the recognition accuracy, are considerably higher. This provides a new direction for the application of deep learning in the field of underwater acoustic target recognition.

Entities:  

Mesh:

Year:  2022        PMID: 35486577      PMCID: PMC9053803          DOI: 10.1371/journal.pone.0266425

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.752


1. Introduction

In recent years, with the development of science and technology, underwater acoustic target recognition has attracted increasing attention because it is a vital problem in underwater acoustic signal processing. The problem is extremely complex: the multifaceted underwater environment distorts radiated noise [1], making underwater acoustic target recognition more difficult than conventional speech recognition. Because technological progress in this area is slow, more accurate underwater acoustic target recognition methods must be investigated. The task of underwater acoustic target recognition is to analyze the underwater acoustic signals received by a sonar system and extract the features of the targets.

At present, deep learning is a popular technology in many industries. Owing to its strong feature extraction and optimization capabilities, deep learning has opened a new development direction for underwater acoustic target recognition technology [2-9]. Many researchers apply convolutional neural networks (CNNs) to underwater acoustic target recognition [10-13]. The long short-term memory (LSTM) architecture is suitable for processing and forecasting events with long intervals in time series. The analysis of ship-radiated noise depends largely on local time-frequency information and time-series related information; therefore, LSTM can be utilized for underwater acoustic target recognition [14-17].

Because a recognition framework based on a single neural network makes it challenging to extract all the features of underwater acoustic signals [18-20], research usually focuses on developing deeper and more complex networks [21-29], which are, however, more difficult to train in terms of training data size and labeling requirements. Therefore, building a new network model by combining different network structures may be a good solution. Studies on joint neural networks are mostly based on the traditional two-dimensional (2D) CNN and LSTM. However, owing to the characteristics of the network model, the 2D CNN seems to perform better in the field of image recognition. Conversely, 1D CNNs are usually employed in speech-processing fields such as sequence modeling and natural language processing, where the use of a 1D CNN reduces the amount of computation required. Therefore, in this study, a 1D CNN and an LSTM network are combined to identify underwater acoustic targets, with the aim of obtaining a network with a higher training speed and recognition rate.

Based on the related research on acoustic target recognition technology, a network recognition framework is built and a new type of neural network is established by combining the advantages of the 1D CNN and the LSTM. The network is trained using the extracted characteristics of ship acoustic signals as input.

2. Recognition principle

2.1 Convolutional neural network

CNNs usually include three types of network layers: convolutional, pooling, and fully connected. The pooling layer is also called the down-sampling layer. The convolutional and pooling layers usually contain multiple feature matrices, which are generated by different convolution kernels. Dimension reduction of the data can be achieved through multiple convolutional and pooling layers. Finally, the predicted category labels are obtained through the fully connected output layer. As the task in this study is to recognize one-dimensional underwater acoustic signals, a one-dimensional convolutional neural network (1D CNN) is used [11, 12]. The 1D CNN model is shown in Fig 1.
Fig 1

1D-Convolutional neural network model.

Convolution distinguishes convolutional neural networks from other networks and is their most critical operation. Through the convolution kernel, the convolutional layer extracts important features from the input and forms feature vectors. Its operation is expressed as

$X^{l} = f(W^{l} * X^{l-1} + b^{l}),$

where $X^{l}$ and $X^{l-1}$ are the feature vectors of layers $l$ and $l-1$, respectively, $W^{l}$ is the convolution kernel, $b^{l}$ is a bias vector, $f(\cdot)$ is the activation function, and $*$ denotes the convolution operation. Discriminative features are extracted from the input data through the linear transformation of the convolution operation, and features more suitable for classification are then obtained through the nonlinear transformation of the activation operation. The activation operation is performed by an activation function. In this study, the common rectified linear unit (ReLU) activation function is used:

$f(x) = \max(0, x),$

where $x$ is the input value. Because features are extracted from high-dimensional input data, the neural network easily overfits the training dataset. Therefore, pooling layers are usually added to improve the operation speed, reduce the training time, and effectively prevent overfitting to the training data [7]. Pooling layers are computed by sliding a kernel over the input matrix; however, the pooling kernel contains no trainable parameters. Pooling layers are usually divided into maximum pooling and average pooling: the maximum or average value of the matrix elements within the specified range of the previous layer is taken as the output of this layer. The output of the pooling layer is

$X^{l} = S(X^{l-1}),$

where $S(\cdot)$ is the down-sampling rule. Maximum pooling is used in this study, and the maximum pooling expression is

$p^{l}_{n} = \max_{(n-1)w < i \le nw} x^{l-1}_{i},$

where $x^{l-1}_{i}$ represents the $i$th neuron in the feature vector output by convolutional layer $l-1$, and $w$ is the pooling size. After multiple convolutional and pooling layers, the classification layer completes the classification and recognition tasks.
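To make these operations concrete, the short sketch below (an illustrative NumPy example, not code from the paper) applies a 1D convolution with a single kernel, the ReLU activation, and non-overlapping max pooling with pool size w = 2 to a toy signal.

```python
import numpy as np

def conv1d_valid(x, kernel, bias=0.0):
    """1D 'valid' convolution (cross-correlation, as in CNN layers)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) + bias
                     for i in range(len(x) - k + 1)])

def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def max_pool1d(x, w):
    """Non-overlapping max pooling with pool size w."""
    n = len(x) // w
    return x[:n * w].reshape(n, w).max(axis=1)

x = np.array([0.1, -0.4, 0.9, 0.3, -0.2, 0.7])     # toy input signal
kernel = np.array([0.5, -1.0, 0.25])               # one convolution kernel
feature = relu(conv1d_valid(x, kernel, bias=0.1))  # convolution + activation
pooled = max_pool1d(feature, w=2)                  # down-sampling
print(feature, pooled)
```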

2.2 Long short-term memory

Because the LSTM network can analyze and extract information from every element of a sequence, it is widely used to process sequence data and to model short-term and long-term dependencies between data [8]; therefore, the LSTM model is used in this study. The LSTM is a network structure whose cyclic links are connected to each other. As a whole, the LSTM network is still a recurrent neural network, but the network contains small loops formed by LSTM blocks. The difference from an ordinary recurrent neural network is that the neurons are replaced with LSTM blocks. Its biggest advantage is that it can link multiple nodes, connect the nodes of the same hidden layer in series, and share parameters among all nodes, which distinguishes it from other network architectures [9]. The LSTM block is shown in Fig 2.
Fig 2

LSTM block model diagram.

The update calculation implemented by the forget gate is as follows:

$f_{t} = \sigma(W_{f}[a_{t-1}, x_{t}] + b_{f}),$

where $W_{f}$ is the weight, $a_{t-1}$ is the activation passed from the previous cell, $x_{t}$ is the input of the current cell, $\sigma$ is the sigmoid function, and $b_{f}$ is the bias. The forget gate reads $a_{t-1}$ and $x_{t}$ and outputs a value between 0 and 1 to the cell state $c_{t-1}$, where 1 indicates that information is completely retained and 0 indicates that information is completely dropped. The input gate decides how much new information is added to the current cell state, and its specific process is expressed as

$i_{t} = \sigma(W_{i}[a_{t-1}, x_{t}] + b_{i}),$
$\tilde{c}_{t} = \tanh(W_{c}[a_{t-1}, x_{t}] + b_{c}),$
$c_{t} = f_{t} \odot c_{t-1} + i_{t} \odot \tilde{c}_{t},$

where $W_{i}$ and $W_{c}$ are the weights and $b_{i}$ and $b_{c}$ are the biases. The update first uses the sigmoid function to determine the information to be updated and then uses the tanh function to extract the updated content. The output gate determines the final output information of the cell. First, the sigmoid function selects which information needs to be output, and then the tanh layer outputs this information. The specific calculation process is as follows:

$o_{t} = \sigma(W_{o}[a_{t-1}, x_{t}] + b_{o}),$
$a_{t} = o_{t} \odot \tanh(c_{t}),$

where $W_{o}$ is the weight and $b_{o}$ is the bias. The computation of the LSTM network is more complicated than that of an ordinary recurrent neural network, but its ability to learn long-term dependence is better than that of any known recurrent network, and it performs well in sequence-processing tasks.
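The gate equations above can be illustrated with a minimal NumPy sketch of a single LSTM cell step (illustrative only, not the authors' implementation; the weight shapes and inputs are arbitrary toy values).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a_prev, c_prev, x_t, W, b):
    """One LSTM time step. W and b hold the forget/input/candidate/output
    gate parameters; [a_prev, x_t] is the concatenated gate input."""
    z = np.concatenate([a_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])        # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])        # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])    # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde        # new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])        # output gate
    a_t = o_t * np.tanh(c_t)                  # new hidden state
    return a_t, c_t

# Toy dimensions: hidden size 3, input size 2.
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((3, 5)) for k in "fico"}
b = {k: np.zeros(3) for k in "fico"}
a, c = lstm_step(np.zeros(3), np.zeros(3), rng.standard_normal(2), W, b)
```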

3. Experimental data

3.1 Dataset

The ship-radiated noise data used in this study come from the ShipsEar dataset [30], recorded in different areas of the Spanish coast between 2012 and 2013. This dataset consists of 90 acoustic records of 11 types of ships and environmental noise, with durations ranging from 15 seconds to 10 minutes. According to the annotation in the original dataset, the records are grouped into four ship categories, A, B, C, and D, plus category E for environmental noise. The ship types included in each category are listed in Table 1.
Table 1

Dataset classification.

Category    Ship types
A Fishing boats; Trawlers; Mussel boats; Tugboats; Dredgers
B Motorboats; Pilot boats; Sailboats
C Passenger ferries
D Ocean liner; Ro-Ro vessels
E Background noise recordings

3.2 Data processing

As the original data are all real recordings collected from the ocean, some records suffer from problems such as excessive noise and blank segments; therefore, the dataset of 90 acoustic signals needs to be preprocessed. First, acoustic signals with poor collection quality were removed. In the remaining acoustic signals, the blank segments left during collection were removed manually, and the acoustic signals were de-noised. Some acoustic signals with low sound levels were enhanced. To enlarge the dataset, the original 90 acoustic signals were split into 3-second fragments.

To characterize the features of the acoustic signals more comprehensively, we extracted as many features as possible and fused them as the network input to achieve a better recognition effect. In addition to traditional features such as the Mel spectrogram and the Mel-frequency cepstral coefficients (MFCCs), we used three features that are often applied in music theory, namely the chromagram, spectral contrast, and tonnetz. They are introduced separately below.

The first feature is the Mel spectrogram [31]: the Mel bank features are obtained on the Mel scale, and the length of the Mel spectrum is set to 128. The columns of the resulting matrix are then compressed by taking the average of each row, returning an eigenvector of size (128, 1). The second feature is the Mel-frequency cepstral coefficients [32], a set of coefficients obtained by exploiting the nonlinear human auditory system: the spectrum of the acoustic signal is nonlinearly mapped to the Mel scale and then transformed to the cepstrum. Here, the row dimension of the output is set to 40, and column compression of the resulting coefficient matrix yields an eigenvector with a final dimension of (40, 1). The third feature is the chromagram, calculated from the short-time Fourier transform of the acoustic signal [33]; because this feature reflects twelve different pitch classes, the resulting vector size is (12, 1). The fourth feature is spectral contrast: spectral contrast based on the octave scale can be used to extract the relative spectral characteristics of the acoustic signal [34], and the resulting eigenvector size is (6, 1). The fifth feature is the tonnetz, which is mainly used to analyze the chord relationships of sound [35]; in this step, the perfect fifth, the major third, and the minor third are used as two-dimensional coordinates to obtain a feature vector of size (6, 1).

After the five features are extracted, the resulting feature vectors are fused, and for each acoustic signal a feature vector with a dimension of (192, 1) is provided as the input of the network. The processing flow chart is shown in Fig 3.
Fig 3

Data processing flow chart.
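The feature-fusion step can be sketched as follows. This is an illustrative reconstruction using the librosa library, not the authors' code; the function arguments, in particular n_bands=5 for spectral contrast so that its output has 6 rows, are assumptions chosen to reproduce the stated dimensions (128 + 40 + 12 + 6 + 6 = 192). The file name is hypothetical.

```python
import numpy as np
import librosa

def extract_fusion_feature(path, sr=None):
    """Load a 3-second clip and return a fused (192,) feature vector:
    Mel spectrogram (128) + MFCC (40) + chroma (12) + spectral contrast (6)
    + tonnetz (6), each averaged over time frames."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)             # (128, T)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)                       # (40, T)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)                         # (12, T)
    # n_bands=5 is an assumption so that the output has 6 rows, as in the paper
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr, n_bands=5)      # (6, T)
    tonnetz = librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr)  # (6, T)
    feats = [mel, mfcc, chroma, contrast, tonnetz]
    return np.concatenate([f.mean(axis=1) for f in feats])                   # (192,)

# x = extract_fusion_feature("clip_0001.wav")   # hypothetical file name
# x = x.reshape(192, 1)                         # network input shape (192, 1)
```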

To assess how well the fused features represent the input acoustic signals, we performed t-SNE visualization of the Mel spectrum feature alone, the MFCC feature alone, and the fusion feature; the results are shown in Fig 4.
Fig 4

t-SNE visualization result of the Mel spectrum, MFCCs feature and fusion feature.

(a) Mel spectrum; (b) MFCCs; (c) fusion feature.

It can be seen from the above figures that the fusion feature is more separable than either single feature, mainly because the spectral contrast and the other music-theory features can capture the tonal characteristics of the acoustic signals more sensitively. Therefore, the subsequent research in this paper takes the fusion features as input. We manually screened the acoustic signals in accordance with the original annotation, removed some unprocessed acoustic signals with poor recording quality, and processed the remaining acoustic signals to obtain the actual dataset. To make it easier for other researchers to use the ShipsEar dataset, each acoustic signal piece in the dataset is assigned a number, and the serial numbers used are indicated in Table 2.
Table 2

Actual size of the dataset used.

Category    Acoustic signal serial number    Number of samples
A    13, 15, 28, 46–49, 66, 73–76, 80, 93–96    1040
B    26, 27, 29, 30, 33, 50–52, 56, 57, 68, 70, 72, 77, 79    790
C    6, 10, 40, 42, 43, 52–54, 59–65, 67    1340
D    18–20, 22, 24, 25, 58, 69, 71, 78    1135
E    81–92    595
Total        4900
To better verify the network, 4900 samples were randomly selected and divided into a training set and a test set in a ratio of four to one. The number of samples was 3920 for the training set and 980 for the test set.

3.3 Network construction

The 1D-CNN uses one-dimensional convolution to process a one-dimensional sequence and is widely used in acoustic signal recognition. Because a ship's voyage is a continuous process, its acoustic signal characteristics have continuity in time, so methods for processing time-series signals can be considered for identifying the ship target. The characteristics of a ship's underwater acoustic signal are time-varying, and the LSTM network can capture both the characteristics of the current moment and the historical information of previous moments. By combining a one-dimensional CNN and an LSTM network, the system can adapt quickly to signal changes and improve the recognition accuracy. Therefore, we build a joint model of the 1D-CNN and the LSTM network. The 1D CNN part of the network consists of two convolutional layers and two pooling layers arranged alternately; the pooling layers adopt maximum pooling and are followed by a dropout layer. The LSTM part consists of one LSTM layer and one dropout layer. Finally, the output is sent into a dense layer for classification. The network model is shown in Fig 5.
Fig 5

Joint network model.

Specific parameters of the network are shown in Table 3:
Table 3

Network parameter table.

Layer    Output shape    Parameters
Conv_1D    191×64    256
Maxpooling1D    63×64    0
Conv_1D    62×128    24704
Maxpooling1D    20×128    0
Dropout    20×128    0
LSTM    32×1    20608
Dropout    32×1    0
Dense    5×1    165
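A minimal Keras sketch of this joint architecture is given below for orientation. The layer order follows Table 3, but the kernel sizes, pooling sizes, and dropout rates are not stated in the paper, so the values used here are assumptions that only approximately reproduce the output shapes and parameter counts listed above.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_joint_model(input_dim=192, num_classes=5):
    """1D-CNN + LSTM joint model (layer order as in Table 3;
    kernel/pool sizes and dropout rates are assumed values)."""
    model = keras.Sequential([
        keras.Input(shape=(input_dim, 1)),
        layers.Conv1D(64, kernel_size=2, activation="relu"),
        layers.MaxPooling1D(pool_size=3),
        layers.Conv1D(128, kernel_size=2, activation="relu"),
        layers.MaxPooling1D(pool_size=3),
        layers.Dropout(0.3),
        layers.LSTM(32),    # recurrent activation is sigmoid by default
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model

model = build_joint_model()
model.summary()
```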

4. Experimental results

4.1 Training network

Based on the aforementioned dataset, we randomly divided all 4900 acoustic signal clips into training and test sets, and the test set accounted for 20% of the total data. After setting up the joint network model, we set the network training parameters as shown in Table 4:
Table 4

Network training parameter.

Parameter    Setting
Loss    Categorical_crossentropy
Optimizer    Adam
Metrics    Accuracy
Batch_size    64
Epochs    100
Activation function (CNN)    ReLU
Activation function (LSTM)    Sigmoid
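The settings in Table 4 correspond roughly to the following compile-and-fit calls (a sketch that reuses the model from the architecture sketch above; the placeholder feature matrix and one-hot labels stand in for the real fused features of shape (4900, 192, 1), and the 4:1 split uses scikit-learn's train_test_split).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the shapes used in the paper; replace with the real
# fused feature vectors (4900, 192, 1) and one-hot category labels (4900, 5).
X = np.random.rand(4900, 192, 1).astype("float32")
y = np.eye(5)[np.random.randint(0, 5, size=4900)]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model.compile(loss="categorical_crossentropy",    # Loss (Table 4)
              optimizer="adam",                   # Optimizer
              metrics=["accuracy"])               # Metrics
history = model.fit(X_train, y_train,
                    batch_size=64, epochs=100,    # Batch_size and Epochs (Table 4)
                    validation_data=(X_test, y_test))
```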

4.2 Training results

After 100 epochs, we obtained the loss and accuracy curves shown in Figs 6 and 7. The curve composed of blue points shows the training set, and the red curve shows the test set.
Fig 6

Variation of accuracy.

Fig 7

Variation of loss.

The classification accuracy of the joint network reached 96.73% on the training set and 92.14% on the test set. To verify the performance of the joint network proposed in this study, we compared its recognition accuracy with those of the 1D-CNN and LSTM networks; the results are shown in Table 5.
Table 5

Comparison of three kinds of network recognition results.

Network    Accuracy of training set    Accuracy of test set
LSTM    82.27%    76.10%
1D-CNN    85.98%    84.18%
Joint Network    96.73%    92.14%
By comparison, on the training set the recognition accuracy of the joint network was 14.46 percentage points higher than that of the LSTM network and 10.75 percentage points higher than that of the 1D-CNN; on the test set it was 16.04 percentage points higher than that of the LSTM network and 7.96 percentage points higher than that of the 1D-CNN. To show the recognition performance of the three networks on the ShipsEar dataset intuitively, we visualized the recognition results on the test set by drawing confusion matrices; the results are shown in Fig 8.
Fig 8

Confusion matrices for three networks.

(a) LSTM; (b) 1D-CNN; (c) Joint Network.

In the figure, the horizontal and vertical coordinates 0 to 4 represent labels A to E. Using the confusion matrices, we can calculate the recognition accuracy of the three networks for the five types of ship targets, as shown in Table 6.
Table 6

Various types of recognition.

Network    Accuracy of test set (A / B / C / D / E)
LSTM    79.00%    72.66%    69.34%    88.09%    71.43%
1D-CNN    77.00%    70.50%    91.63%    91.48%    98.32%
Joint Network    94.50%    76.26%    91.99%    96.60%    98.32%
The recognition accuracy of the joint network for the five types of targets is the highest among the three networks. Therefore, we can conclude that the joint network considerably improves the accuracy of underwater acoustic target recognition. From the confusion matrix, we can obtain four commonly used quantities for evaluating models, TP, FN, FP, and TN, where TP means that a positive-class sample is predicted as positive, FN means that a positive-class sample is predicted as negative, FP means that a negative-class sample is predicted as positive, and TN means that a negative-class sample is predicted as negative. From these, the precision, recall, and F1 score of the model are calculated as

$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.$

For each category, the precision, recall, and F1 score were calculated; the results are shown in Figs 9–11, respectively.
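For reference, these per-category metrics can be obtained directly from the predicted and true labels, for example with scikit-learn (an illustrative snippet, not the authors' code; the trained model, X_test, and y_test are assumed from the training sketch above).

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

y_pred = np.argmax(model.predict(X_test), axis=1)    # predicted class indices 0-4
y_true = np.argmax(y_test, axis=1)                   # true class indices 0-4

print(confusion_matrix(y_true, y_pred))              # rows: true A-E, columns: predicted A-E
print(classification_report(y_true, y_pred,          # per-class precision, recall, F1
                            target_names=["A", "B", "C", "D", "E"]))
```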
Fig 9

Precision about different categories.

Fig 11

F1 Score about different categories.

The figures reveal that the joint network proposed in this study performs better than the traditional single networks in all respects, especially in the F1 score. Because the F1 score considers both precision and recall, it evaluates the network more comprehensively, and the target recognition scores of the joint network are higher than those of the traditional single networks. To ensure repeatability, we conducted 30 training runs for each of the three network models and compared their recognition accuracy over the 30 runs. The 30 training results for the three networks are shown in Fig 12.
Fig 12

Comparison results of the recognition accuracy of the three networks.

(a) Class A; (b) Class B; (c) Class C; (d) Class D; (e) Class E; (f) Overall recognition accuracy.

Across the 30 experiments, the joint network proposed in this paper has a clear advantage in every category, and its overall recognition accuracy over the 30 runs is better than that of the single LSTM network and the 1D-CNN. However, the network does not recognize class B very well, which is presumed to be caused by insufficient training owing to the small number of class B samples. Class E, which also has a small number of samples, consists of environmental noise and is clearly differentiated from the other categories; thus, it is recognized well. The 30 experiments show that the performance of the joint network is robust, which encourages us to use the joint network for underwater acoustic target recognition in the future.

5. Conclusion

In this study, a new network structure combining a 1D CNN and an LSTM network is proposed and applied to underwater acoustic target recognition. The joint network combines the advantages of the two neural networks to extract features from the input data more comprehensively. The experimental results on the ShipsEar underwater vessel dataset show that the proposed joint network has a higher recognition rate than traditional neural networks. Compared with the 1D CNN and LSTM networks, the joint neural network has higher accuracy, precision, recall, and F1 score. The network also has a simple structure, fewer parameters, and a shorter training time. This provides a new development direction for underwater acoustic target recognition methods. The limitation of this study is that only one dataset was used in the experiments: both the training and test sets originate from the ShipsEar dataset, so the performance of the network has not been verified in an actual marine environment. Our future research direction is to expand the dataset, collect more measured ship-noise acoustic signals, and optimize the network parameters through continued training on larger datasets, so as to enhance the generality of the network.
  6 in total

1.  Effect of Spectral Contrast Enhancement on Speech-on-Speech Intelligibility and Voice Cue Sensitivity in Cochlear Implant Users.

Authors:  Nawal El Boghdady; Florian Langner; Etienne Gaudrain; Deniz Başkent; Waldo Nogueira
Journal:  Ear Hear       Date:  2021 Mar/Apr       Impact factor: 3.570

2.  Deep Learning Methods for Underwater Target Feature Extraction and Recognition.

Authors:  Gang Hu; Kejun Wang; Yuan Peng; Mengran Qiu; Jianfei Shi; Liangliang Liu
Journal:  Comput Intell Neurosci       Date:  2018-03-27

3.  Compression of a Deep Competitive Network Based on Mutual Information for Underwater Acoustic Targets Recognition.

Authors:  Sheng Shen; Honghui Yang; Meiping Sheng
Journal:  Entropy (Basel)       Date:  2018-04-02       Impact factor: 2.524

4.  Design and Performance Evaluation of a Deep Neural Network for Spectrum Recognition of Underwater Targets.

Authors:  Dali Liu; Xuchen Zhao; Wenjing Cao; Wei Wang; Yi Lu
Journal:  Comput Intell Neurosci       Date:  2020-08-01

5.  Underwater Acoustic Target Recognition Based on Depthwise Separable Convolution Neural Networks.

Authors:  Gang Hu; Kejun Wang; Liangliang Liu
Journal:  Sensors (Basel)       Date:  2021-02-18       Impact factor: 3.576

6.  Deep convolution stack for waveform in underwater acoustic target recognition.

Authors:  Shengzhao Tian; Duanbing Chen; Hang Wang; Jingfa Liu
Journal:  Sci Rep       Date:  2021-05-05       Impact factor: 4.379

