A hybrid neural network for driving behavior risk prediction based on distracted driving behavior data.

Xin Fu¹, Hongwei Meng¹, Xue Wang¹, Hao Yang², Jianwei Wang³.

Abstract

Distracted driving behavior is one of the main causes of road accidents, so accurately predicting driving behavior risk is of great significance to the active safety of road transportation. Recognition algorithms can extract distracted driving behavior data from the large amount of information collected by on-board sensors, and these data can be used to predict the driving behavior risk of vehicles and areas. In this paper, a new neural network named the Driving Behavior Risk Prediction Neural Network (DBRPNN) is developed to make such predictions from distracted driving behavior data. The network consists of three modules: the Feature Processing Module, the Memory Module, and the Prediction Module. Attribute data that reflect external factors and driver status (time of day, daily driving time, and daily driving mileage) are added to the network to increase the accuracy of the model. We predict the driving behavior risk of two kinds of objects: vehicles and areas. To improve the applicability of the model, we further divide distracted driving behaviors into categories, for which DBRPNN also provides accurate risk predictions. The results show that DBRPNN outperforms traditional models (Classification and Regression Tree, Support Vector Machine, Recurrent Neural Network, and Long Short-Term Memory). The proposed method has been verified on real provincial data and may be transplanted into active safety early warning systems for more accurate and flexible application.

Year: 2022 PMID: 35077521 PMCID: PMC8789134 DOI: 10.1371/journal.pone.0263030

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Driving behavior analysis is an important part of traffic safety research. It reflects the status of drivers and vehicles during vehicle operation. Distracted driving behavior refers to a series of operations conducted by drivers on public roads that may lead to abnormal traffic conditions and thus road accidents [1]. Analyzing driving behavior helps measure drivers' driving safety and prevent traffic accidents, because distracted driving behavior is closely connected with traffic accidents. With the advancement of Internet of Things technology, large-scale driving behavior data have become increasingly available, which is of great significance for the prevention of traffic accidents. Distracted driving is one of the most important factors leading to traffic accidents [2]. Effective prediction of distracted driving behavior makes it possible to remind drivers in time, or to let safety control devices take over the vehicle at critical moments, and thus to prevent traffic accidents. Simulating the driver's behavior is the most direct way to forecast distracted driving behavior. However, drivers differ in driving skill, driving style, emergency response ability, mood, mental status, educational background, and life experience [3-7], and environmental factors such as road conditions, weather, illumination, and time of day also make a large difference [8]. These uncertain factors make it difficult to simulate individual driving behavior objectively. Nevertheless, whatever factors affect the vehicle and the driver, distracted driving is ultimately reflected in the behavior of the vehicle and the driver. Based on this observation, this paper studies risk prediction using distracted driving behavior data.
Our contributions are fourfold. (1) We propose the Driving Behavior Risk Prediction Neural Network (DBRPNN), which consists of three parts: the Feature Processing Module, the Memory Module, and the Prediction Module. The model predicts driving behavior risk with high precision from distracted driving behavior data. Tested on a large, real provincial dataset, it achieves an Accuracy of 0.9146 and a Weighted-Precision of 0.9156 at a 30-minute interval. (2) We examine the impact of different time intervals on the prediction results; the 30-minute interval gives the highest Accuracy (0.9146). (3) The model predicts the risk level not only of vehicles but also of areas, so its results can support both drivers and road management. (4) We test different categories of distracted driving behavior, namely the behavior shown by the vehicle and the behavior shown by the driver, and the results show that DBRPNN can handle the risk prediction tasks for different categories.

Related work

At present, studies on distracted driving behavior, both in China and abroad, focus mainly either on drivers or on vehicles. N. Kuge collected steering wheel angle data on a simulator and established a lane change intention model using hidden Markov theory [9]. Andrew Liu proposed that driver control behavior can be predicted from vehicle movement behavior [10]. Omerustaoglu et al. combined in-car data and image data to study distracted driving behaviors through deep learning [11]. Jeong et al. used data collected by the vehicle's built-in 3-axis gyroscope to identify two driving behaviors with a support vector machine [12]. Other studies described various aggressive driving behaviors and formulated standards for them (Tasca [13], Abou-Zeid [14], Li [15], Yang [16]). Chen et al. proposed a graphical modeling method based on onboard GPS and OBD data and modeled individual driving behaviors with statistical methods [17]. Han and Yang collected vehicle speed, acceleration, and deflection speed through on-board equipment to identify four dangerous driving states [18]. In risk prediction, considerable progress has been made in both behavior prediction and traffic flow prediction. Tang proposed a forecasting framework named the spatiotemporal gated graph attention network to predict urban traffic flow from license plate recognition data [19]. In addition, Pu used historical data to predict road surface friction [20, 21]. Tang used a geographically weighted Poisson quantile regression model to study spatial heterogeneity and estimate the spatial impact on crash frequency [22]. Existing predictions of distracted driving behaviors and accidents fall into three categories: (1) linear theoretical models based on time series and Kalman filtering [23, 24]; (2) nonlinear statistical models based on nonparametric regression and chaos theory [25-28]; and (3) machine learning prediction models based on neural networks and support vector machines [26, 29–32].
In sum, existing studies, especially those using neural networks, have made great progress in predicting distracted driving behaviors and accidents, but further exploration is still needed in the following aspects. Most studies consider only vehicles or only drivers, and few consider both perspectives [30]. Some studies consider only the temporal perspective and ignore the spatial one [23]. Data used in some studies are obtained through simulation experiments [32], whereas real data are necessary to understand the actual situation of distracted driving behavior. With the continuous development of neural network technology, neural network models can uncover deeper patterns in data. Neural networks have great advantages in traffic flow prediction [33] and traffic accident prediction [34]. Among neural network models, the Recurrent Neural Network (RNN) can model sequential information through its chain structure, internal memory, and recurrence [35], and it is widely used in traffic information prediction [36]. However, when the input sequence is long, an RNN suffers from the long-range dependency problem. Long Short-Term Memory (LSTM), a special form of RNN first proposed in 1997, effectively alleviates the vanishing gradient problem of RNNs and makes better use of long time-series data [37]. Compared with other neural networks, LSTM is better suited to processing sequence data and identifying trends [38]. LSTM models have been successfully applied to time series data in various fields, including traffic flow prediction in road transportation [38], speech recognition and machine translation in language processing [39], and protein structure sequence prediction in medicine [40].
In this paper, we propose a hybrid network based on the LSTM model, named the Driving Behavior Risk Prediction Neural Network (DBRPNN). The rest of the paper is organized as follows: the third part describes the distracted driving behavior data, the fourth part presents the DBRPNN structure, the fifth part presents the results and discussion, the sixth part discusses application and future implementation, and the last part concludes the paper.

Data description

Distracted driving behavior data

Information collected by the vehicle's camera sensors and radar sensors can be processed by image recognition and distance recognition to obtain distracted driving behavior data. The main purpose of collecting these data is to comprehensively record the distracted conditions of the driver and the vehicle and to issue timely reminders. A complete distracted driving behavior record is usually determined by the time of the behavior, its latitude and longitude, and the behavior code. The typical structure of distracted driving behavior data (taking Shaanxi Province, China, as an example) is shown in Table 1. Irrelevant fields, such as vehicle registration, owner, and speed, are not listed.

Table 1

Distracted driving behavior data structure.

Field Name	Field Type	Data Example	Remarks
Vehicle Id	Int	16254
Vehicle Trans Type	Int	10	‘10’- Passenger vehicles, ‘30’- Dangerous goods transport vehicle
Time	DateTime	2021-03-01 05:41:06
Code	Int	10404
Longitude	Float	107.203014
Latitude	Float	34.369448

For various reasons, the distracted driving behavior data contain abnormal records that can affect the results. To reduce interference, the following records are deleted in this paper. Missing data: records that cannot reflect the specific situation of the distracted driving behavior. Redundant data: multiple records that reflect the same distracted driving behavior. Anomalous data: records that violate normal travel rules, including records generated when the vehicle is not moving and records whose latitude and longitude are outside the normal range. The time records in the distracted driving behavior data are intermittent. To observe the prediction effects for different periods, and following similar research practice [41, 42], this paper groups the distracted driving behavior data by vehicle ID or area ID and aggregates them at four time intervals: 30, 60, 90, and 120 minutes. At intervals shorter than 30 minutes, the number of time units containing any distracted driving behavior becomes small. The model validation section also studies how model performance differs across time intervals.
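The cleaning and aggregation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the authors' implementation. Each record is a hypothetical tuple `(vehicle_id, time, code, longitude, latitude)` following Table 1. Two records are treated as redundant when vehicle, timestamp, and code coincide. The coordinate bounds are placeholders. The "vehicle not moving" check is omitted because Table 1 does not list the speed field.

```python
from collections import Counter
from datetime import datetime

# Hypothetical record layout following Table 1:
# (vehicle_id, time_str, code, longitude, latitude)


def clean(records, lon_range=(-180.0, 180.0), lat_range=(-90.0, 90.0)):
    """Drop missing, redundant, and out-of-range records (simplified, assumed rules)."""
    seen = set()
    kept = []
    for rec in records:
        if any(field is None for field in rec):  # missing data
            continue
        vehicle_id, time_str, code, lon, lat = rec
        in_range = (lon_range[0] <= lon <= lon_range[1]
                    and lat_range[0] <= lat <= lat_range[1])
        if not in_range:  # anomalous coordinates
            continue
        key = (vehicle_id, time_str, code)  # assumed redundancy criterion
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


def count_per_interval(records, interval_minutes=30):
    """Count behaviors per (vehicle_id, (date, period index)) for one interval."""
    counts = Counter()
    for vehicle_id, time_str, _code, _lon, _lat in records:
        t = datetime.strptime(time_str, "%Y-%m-%d %H:%M:%S")
        period = (t.date().isoformat(), (t.hour * 60 + t.minute) // interval_minutes)
        counts[(vehicle_id, period)] += 1
    return counts


records = [
    (16254, "2021-03-01 05:41:06", 10404, 107.203014, 34.369448),
    (16254, "2021-03-01 05:41:06", 10404, 107.203014, 34.369448),  # duplicate
    (16254, "2021-03-01 05:59:59", 10401, 107.2, 34.37),
    (16254, "2021-03-01 06:10:00", 10300, 107.2, 34.37),
    (16254, None, 10402, 107.2, 34.37),                   # missing time
    (16254, "2021-03-01 06:20:00", 10300, 999.0, 34.37),  # bad longitude
]
cleaned = clean(records)
counts = count_per_interval(cleaned, interval_minutes=30)
```

Because 30, 60, 90, and 120 all divide 1,440 minutes evenly, each day splits into 48, 24, 16, or 12 periods respectively. Grouping by area instead of vehicle would only change the first element of the counting key.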

Attributes data

Appropriate attributes help describe the factors related to distracted driving behaviors and thus benefit their prediction [30]. Many studies have shown that the occurrence of distracted driving behavior is related to the external environment and to the driver's mental state [6, 8]. In this paper, the time of day is used to describe the external environment. Visibility differs between day and night. The number of distracted driving behaviors also differs between morning and afternoon, and between the first and second halves of the night. Therefore, a day is divided into four 6-hour periods, quantified as shown in Table 2. The mental state of drivers is difficult to obtain, but individual fatigue is strongly affected by driving intensity [43]. This paper therefore uses daily driving time and daily driving mileage to measure this attribute. As inputs to the neural network, the time-of-day attribute is called DayID, and the other two attributes together form a set called DriveID. Sample data are shown in Table 3.

Table 2

Quantization of time.

Variable	Range	Code
Time of day	[0:00 ~ 6:00)	0
	[6:00 ~ 12:00)	1
	[12:00 ~ 18:00)	2
	[18:00 ~ 24:00)	3
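The Table 2 quantization amounts to integer division of the hour by 6. Here is a minimal Python sketch; the function name `day_id` is hypothetical.

```python
from datetime import datetime


def day_id(t: datetime) -> int:
    """Map a timestamp to the Table 2 code: [0,6)->0, [6,12)->1, [12,18)->2, [18,24)->3."""
    return t.hour // 6


print(day_id(datetime(2021, 3, 1, 5, 41, 6)))  # 0
print(day_id(datetime(2021, 3, 1, 18, 0, 0)))  # 3
```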

Table 3

The input data structure of DBRPNN.

Field Name	Field Type	Data Example
Risk Level	Int	0
The Time of a Day	Int	3
Driving Time	Float	8.77
Driving Mileage	Float	318.2

Network structure of DBRPNN

Definition, distracted driving behavior categories

The distracted driving behavior categories refer to the factors that threaten driving safety detected in the process of road transportation. There are eight distracted driving behavior codes used in this paper, as shown in Table 4. According to different behavior objects, distracted driving behaviors can be divided into two categories. Three codes of behaviors in category 103 are the distracted driving behavior shown by the vehicle, and five codes of behaviors in category 104 are the distracted driving behavior shown by the driver.

Table 4

Categories of distracted driving behavior.

Category	Code	Paraphrase
103	10300	About to hit the vehicle ahead while driving.
	10301	The vehicle deviates from the lane while driving.
	10302	Driving too close to the vehicle ahead.
104	10400	The driver is driving with physical fatigue.
	10401	The driver makes calls while driving.
	10402	The driver smokes while driving.
	10403	The driver closes his eyes while driving.
	10404	The driver yawns while driving.
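Splitting records by the two categories in Table 4 can be sketched as below. The code-to-category table is copied from Table 4. The function names and the tuple record layout `(vehicle_id, time, code, longitude, latitude)` are hypothetical.

```python
# Behavior code -> category, copied from Table 4.
CATEGORY_OF_CODE = {
    10300: 103, 10301: 103, 10302: 103,                          # shown by the vehicle
    10400: 104, 10401: 104, 10402: 104, 10403: 104, 10404: 104,  # shown by the driver
}


def category(code: int) -> int:
    """Return 103 or 104 for a known behavior code; reject unknown codes."""
    try:
        return CATEGORY_OF_CODE[code]
    except KeyError:
        raise ValueError(f"unknown behavior code: {code}") from None


def split_by_category(records):
    """Group hypothetical (vehicle_id, time, code, lon, lat) records by category."""
    groups = {103: [], 104: []}
    for rec in records:
        groups[category(rec[2])].append(rec)
    return groups
```

Each category's records can then be aggregated and graded separately, which is how the per-category experiments are set up.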

Network establishment

Because drivers' driving habits and vehicle types differ, the raw number of distracted driving behaviors is not a good measure of risk. Therefore, this paper predicts a risk level instead. The neural network used to predict the risk level is a modular, plug-in neural network, and its architecture is shown in Fig 1. It consists of three modules: the Feature Processing Module (FPM), the Memory Module (MM), and the Prediction Module (PM). The FPM grades, normalizes, and concatenates the features. The MM is based on LSTM and captures the temporal dependence of risk level changes. The PM converts the output of the network into a risk level.

Fig 1

Driving behavior risk prediction neural-network (DBRPNN) architecture.

Feature Processing Module (FPM)

Hierarchical classification, normalization, and concatenation

The risk level describes the degree of danger during driving, and this paper uses the K-means algorithm to determine it. K-means is a clustering algorithm that assigns each point of a feature set to the category of its nearest cluster center. This paper aggregates the historical distracted driving behavior data at each time interval and obtains the number of distracted driving behaviors of each vehicle in each period, where a period is the t-th time window obtained by dividing time at interval i. K-means with k = 3 is applied to these counts. This gives the risk level of each vehicle in each period: 0, 1, or 2. In addition, the longitude and latitude range of the distracted driving behavior points is divided into a 100 × 100 grid, and the average number of distracted driving behaviors per vehicle in each grid cell during each period is computed. K-means with k = 3 is likewise applied to these averages to obtain the risk level of each area: 0, 1, or 2. Level 0 indicates that the vehicle or area is currently at low risk and no action is required. Level 1 indicates medium risk, and measures should be taken depending on the situation. Level 2 indicates high risk, and immediate measures are required. The Silhouette Coefficient is used to evaluate the clustering quality of K-means for different values of k, as shown in Table 5; k = 3 gives the highest score (0.8200).
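The grading step can be sketched in plain Python. This is an illustrative one-dimensional K-means (Lloyd's algorithm), not the authors' code. Three choices are assumptions: a deterministic initialization spread over the sorted distinct values, ranking clusters by centroid so that level 0 is the lowest-count cluster, and the grid-cell convention (column from longitude, row from latitude, indices from 0).

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's K-means on scalars; returns the final cluster centers."""
    distinct = sorted(set(values))
    if k < 2 or len(distinct) < k:
        raise ValueError("need k >= 2 and at least k distinct values")
    # Assumed deterministic init: k values spread evenly over the sorted distinct values.
    centers = [float(distinct[i * (len(distinct) - 1) // (k - 1)]) for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        new_centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers


def risk_levels(values, k=3):
    """Label each value 0..k-1, where 0 is the cluster with the smallest center (assumed ordering)."""
    centers = kmeans_1d(values, k)
    rank = {j: r for r, j in enumerate(sorted(range(k), key=lambda j: centers[j]))}
    return [rank[min(range(k), key=lambda j: abs(v - centers[j]))] for v in values]


def grid_cell(lon, lat, lon_min, lon_max, lat_min, lat_max, n=100):
    """Hypothetical convention: map a point to (column, row) in an n x n grid."""
    col = min(int((lon - lon_min) / (lon_max - lon_min) * n), n - 1)
    row = min(int((lat - lat_min) / (lat_max - lat_min) * n), n - 1)
    return col, row
```

For vehicles, `risk_levels` is applied to the per-period counts. For areas, it is applied to the per-cell averages computed over the cells returned by `grid_cell`.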

Table 5

The clustering effect of K-means.

Number of Clusters	Silhouette Score
2	0.8193
3	0.8200
4	0.8033
5	0.8085
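Table 5 selects k by the mean silhouette coefficient. For point i, s(i) = (b(i) − a(i)) / max(a(i), b(i)). Here a(i) is the mean distance from i to the other members of its cluster, and b(i) is the smallest mean distance from i to the members of another cluster. Below is a plain-Python sketch for one-dimensional counts. Singleton clusters contribute 0, the same convention scikit-learn's `silhouette_score` uses. The function name is hypothetical, and the quadratic cost makes it suitable for illustration only.

```python
def silhouette_1d(values, labels):
    """Mean silhouette coefficient for scalar data with absolute-difference distance."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    n = len(values)
    total = 0.0
    for i in range(n):
        same = [abs(values[i] - values[j]) for j in range(n)
                if j != i and labels[j] == labels[i]]
        if not same:  # singleton cluster: s(i) = 0
            continue
        a = sum(same) / len(same)
        b = min(
            sum(abs(values[i] - values[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / n


# Choosing k from the scores reported in Table 5.
table5 = {2: 0.8193, 3: 0.8200, 4: 0.8033, 5: 0.8085}
best_k = max(table5, key=table5.get)
```

Applied to the Table 5 scores, `best_k` is 3, which matches the choice of k in the text.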

Daily driving time and daily driving mileage (DriveID) are required factors, but their raw values are not suitable for direct input to the neural network and must be normalized. This paper uses Z-score normalization to keep extreme values from dominating changes in the network weights and to speed up training. DriveID and DayID together constitute the attribute set. The Z-score of a value x is

$$z = \frac{x - \mu}{\sigma} \tag{1}$$

where μ and σ are the mean and standard deviation of the attribute. Concatenation then merges the risk level, DayID, and DriveID into a single vector before the temporal dependence is captured.
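The Z-score and concatenation steps can be sketched as follows, under several assumptions. μ and σ are estimated with the population standard deviation on the data they are fitted to. The feature order follows Table 3. DayID enters as a plain integer, because the paper does not say whether it is one-hot encoded. The helper names are hypothetical.

```python
from statistics import mean, pstdev


def fit_zscore(column):
    """Return (mu, sigma) for one attribute, e.g. daily driving time (population std assumed)."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        raise ValueError("constant attribute cannot be z-scored")
    return mu, sigma


def build_input(risk_level, day_id, drive_time, mileage, time_stats, mileage_stats):
    """Concatenate risk level, DayID and the z-scored DriveID into one vector (Table 3 order)."""
    z_time = (drive_time - time_stats[0]) / time_stats[1]
    z_mileage = (mileage - mileage_stats[0]) / mileage_stats[1]
    return [float(risk_level), float(day_id), z_time, z_mileage]


time_stats = fit_zscore([8.0, 12.0])        # mu = 10.0, sigma = 2.0
mileage_stats = fit_zscore([300.0, 340.0])  # mu = 320.0, sigma = 20.0
vector = build_input(0, 3, 12.0, 300.0, time_stats, mileage_stats)
```

In practice, the statistics would be fitted on the training days only and reused for validation and test data, so that no information leaks from later periods.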

Memory Module (MM)

In risk level prediction, the network output depends not only on the current input but also on the outputs over a past period. An RNN is a neural network with short-term memory: its neurons receive information not only from other neurons but also from their own outputs at the previous time step, forming a recurrent structure. LSTM is a variant of RNN that effectively alleviates the vanishing gradient problem of simple recurrent networks and better characterizes time series data. Compared with RNN, LSTM introduces two improvements: a new internal state and a gating mechanism. The internal state $c_t \in \mathbb{R}^n$ (an n-dimensional column vector) is dedicated to linear recurrent information transfer, while information is output nonlinearly to the external state $h_t \in \mathbb{R}^n$ of the hidden layer. The internal state $c_t$ records the historical information up to the current time step and is computed as

$$c_t = f_t \odot c_{t-1} + e_t \odot \tilde{c}_t \tag{2}$$
$$h_t = o_t \odot \tanh(c_t) \tag{3}$$

where $f_t \in [0,1]^n$ is the forget gate, which controls how much of the previous internal state $c_{t-1}$ is forgotten; $e_t \in [0,1]^n$ is the input gate, which controls how much of the candidate state is kept at the current time step; and $o_t \in [0,1]^n$ is the output gate, which controls how much of the internal state $c_t$ is output to the external state $h_t$. Together, the three gates control the paths of information flow. $\odot$ denotes the element-wise product, $c_{t-1}$ is the memory cell at the previous time step, and $\tilde{c}_t$ is the candidate state obtained by a nonlinear function:

$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \tag{4}$$

where W, U, and b are learnable network parameters. The gating mechanism controls the path of information transmission. The three gates are soft gates with values in (0, 1) that let information pass through in a certain proportion.
The three gates are computed as

$$e_t = \sigma(W_e x_t + U_e h_{t-1} + b_e) \tag{5}$$
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{6}$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \tag{7}$$

where σ is the logistic function, whose output lies in (0, 1); $x_t$ is the input at the current time step; and $h_{t-1}$ is the external state at the previous time step. The recurrent unit structure of LSTM is shown in Fig 2. The computation proceeds in three stages. First, the three gates and the candidate state are computed from $x_t$ and $h_{t-1}$ (formulas (4) to (7)). Then $f_t$ and $e_t$ are combined to update $c_t$ (formula (2)). Finally, $o_t$ passes information to $h_t$ (formula (3)).
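The LSTM update described above can be written out for a single time step in plain Python. This sketch only mirrors the equations. The paper builds its network with PyTorch, whose `torch.nn.LSTM` implements essentially the same recurrence internally. The parameter layout is an assumption for illustration: a dict mapping the hypothetical keys `"e"`, `"f"`, `"o"`, and `"c"` to `(W, U, b)` tuples.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def affine(W, U, b, x, h):
    """Compute W x_t + U h_{t-1} + b for list-of-lists matrices."""
    return [
        sum(w * xi for w, xi in zip(W_row, x)) + sum(u * hi for u, hi in zip(U_row, h)) + bi
        for W_row, U_row, bi in zip(W, U, b)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step: gates and candidate state, then c_t (formula 2) and h_t (formula 3)."""
    e = [sigmoid(z) for z in affine(*params["e"], x, h_prev)]          # input gate
    f = [sigmoid(z) for z in affine(*params["f"], x, h_prev)]          # forget gate
    o = [sigmoid(z) for z in affine(*params["o"], x, h_prev)]          # output gate
    c_tilde = [math.tanh(z) for z in affine(*params["c"], x, h_prev)]  # candidate state
    c = [fi * ci + ei * gi for fi, ci, ei, gi in zip(f, c_prev, e, c_tilde)]
    h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h, c


def zero_params(n_in, n_hidden):
    """Hypothetical all-zero weights: every gate is 0.5 and the candidate state is 0."""
    def block():
        return ([[0.0] * n_in for _ in range(n_hidden)],
                [[0.0] * n_hidden for _ in range(n_hidden)],
                [0.0] * n_hidden)
    return {name: block() for name in ("e", "f", "o", "c")}


params = zero_params(n_in=4, n_hidden=2)
h, c = lstm_step([0.0, 3.0, 1.0, -1.0], [0.0, 0.0], [1.0, -2.0], params)
```

With all-zero weights, each gate equals 0.5 and the candidate is 0. The forget gate therefore keeps half of the previous cell state, and nothing new is written: `c` becomes `[0.5, -1.0]`.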

Fig 2

LSTM network architecture.

Prediction module

This module converts the output of the LSTM into a risk level (for a vehicle or an area). A linear layer maps the neuron outputs to the three risk levels.
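Below is a sketch of the Prediction Module's linear mapping, under these assumptions: the three outputs are unnormalized scores, the predicted level is the index of the largest score, and the cross-entropy loss named in the experimental settings is computed from those scores with a softmax. `W` and `b` are placeholders for learned parameters. In PyTorch, `torch.nn.Linear(hidden_size, 3)` and `torch.nn.CrossEntropyLoss()` play these roles.

```python
import math


def linear(W, b, h):
    """Map hidden state h to one score per risk level: W h + b."""
    return [sum(w * hi for w, hi in zip(row, h)) + bi for row, bi in zip(W, b)]


def predict_level(scores):
    """Assumed rule: risk level = index of the largest score (0 low, 1 medium, 2 high)."""
    return max(range(len(scores)), key=scores.__getitem__)


def cross_entropy(scores, target):
    """-log softmax(scores)[target], using the log-sum-exp trick for stability."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[target]


W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # placeholder weights
b = [0.0, 0.0, 0.0]
scores = linear(W, b, [0.25, 0.5])
level = predict_level(scores)
```

Here `scores` is `[0.25, 0.5, 0.75]`, so the predicted level is 2 (high risk).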

Model training

This paper evaluates network performance with Accuracy and Weighted-Precision. Accuracy is the proportion of correctly predicted samples among all samples. In binary classification, Precision is the proportion of samples predicted as positive that are actually positive. Because this paper addresses a three-class problem, the Precision of each level is calculated and then weighted. The two metrics are calculated as

$$\text{Accuracy} = \frac{T}{T + F}$$
$$\text{Weighted-Precision} = \sum_{x=0}^{2} W_x \frac{TP_x}{TP_x + FP_x}$$

where T is the number of samples whose level is predicted correctly; F is the number of samples whose level is predicted incorrectly; $TP_x$ (x = 0, 1, 2) is the number of level-x samples predicted as level x; $FP_x$ is the number of samples of other levels predicted as level x; and $W_x$ is the proportion of level-x samples among all samples.
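The two metrics can be computed as below. This is a plain-Python sketch with hypothetical function names. The weights W_x are the class proportions in the true labels. A level that is never predicted contributes 0, which is also the outcome of scikit-learn's `precision_score(..., average="weighted")` by default, although that function additionally emits a warning.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """T / (T + F): share of samples whose risk level is predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_precision(y_true, y_pred, levels=(0, 1, 2)):
    """Sum over levels of W_x * TP_x / (TP_x + FP_x), with W_x the true-class proportion."""
    support = Counter(y_true)
    n = len(y_true)
    total = 0.0
    for x in levels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if p == x and t == x)
        fp = sum(1 for t, p in zip(y_true, y_pred) if p == x and t != x)
        if tp + fp == 0:  # level never predicted: precision treated as 0 (assumption)
            continue
        total += support[x] / n * tp / (tp + fp)
    return total


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc = accuracy(y_true, y_pred)          # 4/6
wp = weighted_precision(y_true, y_pred)
```

If per-level recall were weighted by the same class proportions, the result would always equal Accuracy exactly. A weighted score that differs from Accuracy is therefore consistent with weighted precision.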

Experiment establishment

Experimental data description

This paper uses the distracted driving behavior data of 74 passenger transport vehicles and 26 dangerous goods transport vehicles in Shaanxi Province for an empirical study. The dataset is collected and managed by the Department of Transport of Shaanxi Province and was provided by the Shaanxi Provincial Road Transport Development Center. Shaanxi Province is an inland province in northwestern China with 10 prefecture-level cities and 107 county-level administrative regions, and it is an important link between the northwest and other regions. Because most of these 100 vehicles operate on medium- and long-distance routes, the distracted driving behaviors occurred mainly in Shaanxi Province but are spread across many surrounding provinces. The coordinates of the distracted driving behaviors are shown in Fig 3. Eight distracted driving behavior codes are used in this paper, and the frequency of each code is shown in Fig 4. The data cover 92 days (March, April, and May 2020), with an average of 5,705 records per day. The first 72 days are used for training, the next 10 days for validation, and the last 10 days for testing. Before training and validation, abnormal data are deleted as described in the Data description section. Fig 5 shows the daily average number of distracted driving behaviors across the network.
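The 72/10/10 chronological split can be checked with a short calendar computation. This assumes the 92 days run from 1 March to 31 May 2020 inclusive (31 + 30 + 31 = 92). The function name is hypothetical.

```python
from datetime import date, timedelta


def chronological_split(start, n_days=92, n_train=72, n_val=10):
    """Split consecutive days into train/validation/test without shuffling."""
    days = [start + timedelta(days=i) for i in range(n_days)]
    return days[:n_train], days[n_train:n_train + n_val], days[n_train + n_val:]


train, val, test = chronological_split(date(2020, 3, 1))
print(train[-1], val[0], val[-1], test[0], test[-1])
# 2020-05-11 2020-05-12 2020-05-21 2020-05-22 2020-05-31
```

Keeping the split in time order means the model is always evaluated on days after those it was trained on.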

Fig 3

Distracted driving behavior coordinate point.

The figure shows the coordinates of vehicles at the time of distracted driving behavior; they are mainly distributed in Shaanxi Province and some surrounding provinces.

Fig 4

Scale of each distracted driving behavior code.

Fig 5

Daily average number of distracted driving behaviors.

Experimental settings

DBRPNN is built with PyTorch on Python 3.9. During training, the loss function is cross-entropy, the optimizer is Adam with a learning rate of 0.01, and the batch size is 64. The maximum number of epochs is 100, and early stopping controls the number of iterations to prevent overfitting: training ends when the loss no longer improves or the number of epochs reaches 100. The training computer is equipped with Graphics 630 graphics and an Intel Core i5 CPU and runs Windows 10.
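The stopping rule can be sketched as a small helper. The paper does not report a patience value or say whether the monitored loss is the training or the validation loss, so `patience` and the example loss sequence below are hypothetical. In PyTorch, the stated settings would correspond to `torch.nn.CrossEntropyLoss()` and `torch.optim.Adam(model.parameters(), lr=0.01)`, with a `DataLoader` using `batch_size=64`.

```python
class EarlyStopping:
    """Stop when the loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience  # hypothetical value; not reported in the paper
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        """Record one epoch's loss; return True if training should stop."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(losses, max_epochs=100, patience=10):
    """Number of epochs executed under early stopping with an epoch cap."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(losses[:max_epochs], start=1):
        if stopper.step(loss):
            return epoch
    return min(len(losses), max_epochs)


print(epochs_run([1.0, 0.9, 0.95, 0.96, 0.97, 0.5], patience=3))  # 5
```

With a patience of 3, the three non-improving epochs after 0.9 end training at epoch 5, before the later drop to 0.5 is ever seen. This shows why the choice of patience matters.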

Performance evaluation

DBRPNN was used to predict the risk levels of vehicles and areas, and some of the results are shown in Fig 6. The figure shows that DBRPNN's predictions for different objects follow the same trend as the real data.

Fig 6

The prediction result of the risk level.

(a) Vehicle ID 6096. (b) Vehicle ID 21973. (c) Area ID (53,24). (d) Area ID (53,23). (e) Category 104 of Vehicle ID 20635.

To make the results more convincing, the average results over 10 vehicles are used to evaluate network performance, as shown in Table 6.

Table 6

Network performance comparison.

Model	Accuracy	Weighted-Precision
CART	0.7712	0.7724
SVM	0.7866	0.8135
RNN	0.7979	0.8137
1 Layer-LSTM	0.8214	0.8116
2 Layer-LSTM	0.8583	0.8479
Bi-LSTM	0.8604	0.8665
DBRPNN	0.9146	0.9156

The baselines are as follows. CART: a CART model predicts the risk level within a specific time interval. SVM: an SVM model predicts the risk level within a specific time interval. RNN: an RNN predicts the risk level within a specific time interval. 1 Layer-LSTM and 2 Layer-LSTM: one- and two-layer LSTM networks predict the risk level; the two-layer LSTM performs better than the one-layer LSTM. Bi-LSTM: a bidirectional LSTM (Bi-LSTM) network without the FPM; it performs better than the ordinary LSTMs. The results show that DBRPNN predicts the risk level effectively. With the help of the FPM, Accuracy increases by 5.42 percentage points (from 0.8604 to 0.9146) compared with the Bi-LSTM alone. DBRPNN was also trained and tested separately on data aggregated at each of the four time intervals, and the results are shown in Table 7. The 30-minute prediction has the highest Accuracy, 0.9146, and DBRPNN performs stably and well across the different time intervals.

Table 7

Prediction performance for different time intervals.

Time interval	Accuracy	Weighted-Precision
30 min	0.9146	0.9156
60 min	0.8854	0.8843
90 min	0.8833	0.8756
120 min	0.8542	0.8810

Different distracted driving behaviors can lead to large deviations in risk level prediction. To measure the performance of DBRPNN in different situations, we divided the distracted driving behavior codes into two categories: behavior shown by the vehicle (103) and behavior shown by the driver (104). The comparative performance is shown in Table 8.

Table 8

Prediction performance for different categories of distracted driving behaviors.

Predicted Object	Category	Accuracy	Weighted-Precision
Vehicle	103	0.9229	0.9258
Vehicle	104	0.9083	0.9105
Area	103	0.9042	0.9078
Area	104	0.8875	0.8975

The results show that the prediction Accuracy for Category 104 at the area level is relatively low. This is because driver behavior is more random and the number of vehicles passing through an area fluctuates greatly, which makes prediction difficult. In all other cases, DBRPNN achieves an Accuracy above 0.9. During prediction, we found that some vehicles had level 0 for every prediction. These vehicles rarely exhibited distracted driving behaviors, and their actual risk level was 0 throughout the test period. For such vehicles, predicted risk levels reveal little about their distracted driving behavior, so timely measures should be taken whenever a distracted driving behavior occurs.

Application and implementation

In this study, we use distracted driving behavior warning data and the corresponding attribute data to construct DBRPNN, which predicts risk levels at different time granularities. In terms of application, this kind of forecasting helps improve road driving safety. Because some drivers have bad driving habits, multiple distracted driving behaviors often occur during driving. Reminding the driver every time a distracted driving behavior occurs would consume considerable manpower and have limited effect, yet these behaviors should not be ignored. DBRPNN can accurately predict the risk level of each vehicle in each period. Reminders can be given in advance for periods with higher risk levels, and the driver can be contacted promptly when distracted driving behaviors occur during a high-risk period. This can effectively prevent accidents and improve the active safety of transportation. Because distracted driving behaviors in vehicles or areas are closely related to accidents, high-precision risk prediction at variable time granularity can contribute substantially to road accident prevention and to the active safety of road transportation. The proposed method predicts driving behavior risk from distracted driving behavior data within a certain range of precision and accuracy. Compared with the baseline methods in the experimental analysis, it is more practical and effective, and it can provide forecasts for a single vehicle or area. In most cases, vehicles are managed by local transportation departments, whose networks already include relatively complete distracted driving behavior data collection systems. Therefore, the proposed method can be transplanted to local active safety early warning systems to provide transportation safety predictions.
This will help local transportation departments improve transportation safety, including by reducing human resource investment and achieving active safety more efficiently. The method can also predict the risk level of an area and warn in advance the vehicles that will pass through the area during a high-risk period, thereby reducing traffic risk there. In addition, we believe another noteworthy aspect of this work is that it provides a network-based prediction solution for spatiotemporal data. The method may also be suitable for predictions on the same type of data, such as accident prediction. It may also help build other types of prediction networks based on spatiotemporal datasets and attribute data.

Conclusion

This paper proposes an LSTM-based Driving Behavior Risk Prediction Neural Network (DBRPNN) that improves prediction accuracy by combining time attributes and vehicle attributes. DBRPNN was trained and tested on a provincial-scale dataset. The results show that DBRPNN achieves stable and encouraging Accuracy: it predicts risk with relatively high Accuracy at different time intervals (30, 60, 90, and 120 minutes) and for different categories (Category 103 and Category 104). In addition, DBRPNN has broad prospects for implementation, which further demonstrates the potential of artificial intelligence in transportation. The researchers will continue to improve the Accuracy of DBRPNN and will use other advanced neural networks to study driving behavior risk prediction further.

17 Nov 2021

PONE-D-21-34056

A Hybrid Neural Network for Driving Behavior Risk Prediction Based on Distracted Driving Behavior Data

PLOS ONE Dear Dr. Meng, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Jan 01 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:

- A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
- A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
- An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Feng Chen
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following in the Acknowledgments Section of your manuscript: "The research is supported by Key R&D Project of the Ministry of Science and Technology of China (Granted No. 2020YFC1512004)." We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: "There are no financial conflicts of interest to disclose." Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

3. We note that Figure 3 in your submission contains [map/satellite] images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright. We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

a. You may seek permission from the original copyright holder of Figure 3 to publish the content specifically under the CC BY 4.0 license.
We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text: “I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.” Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission. In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

b. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.
The following resources for replacing copyrighted map figures may be helpful:

- USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/
- The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/
- Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html
- NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/
- Landsat: http://landsat.visibleearth.nasa.gov/
- USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#
- Natural Earth (public domain): http://www.naturalearthdata.com/

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly
Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository.
For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This study proposes a driving risk prediction algorithm based on a popular Recurrent Neural Network architecture, LSTM. The proposed algorithm is demonstrated by comparing it with several selected baseline models. The research idea is excellent, but some concerns need to be addressed before potential publication. The major comments are listed below.

1, In the study, driving risk is categorized into 0, 1, and 2. Please clearly introduce the definition of driving risk level in the manuscript.

2, Please introduce the input data of DBRPNN, and provide sample data as an illustration.

3, To my understanding, the purpose of predicting driving behavior risk is to provide timely warning information for avoiding potential safety issues. Thus, the predicted information is valid for a short-term period. However, in the study, the authors present the prediction for 30 mins, 60 mins, 90 mins, and 120 mins.
Please explain why the authors select these four time-intervals for performance demonstration.

4, Figure 3, “Shaanxi Province” should be a typo. Figure 1 should be replaced by a high-resolution figure to improve the readability.

5, The manuscript contains considerable language issues. Please conduct a proofreading and polish the manuscript accordingly.

Reviewer #2: This manuscript presents an interesting and meaningful piece of research: a method to predict distracted driving behavior is proposed. I think with the increasingly rich data of the Internet of Things, more risk collection data can provide better early warning support. The paper is well written and I have the following comments:

(1) The literature review is obviously not comprehensive enough. In terms of risk prediction, behavioral prediction and flow prediction have made abundant research progress, but they are not well reflected in the review section.

(2) Why are risk levels divided into three categories in clustering? The paper is not clear on this point.

(3) If the classification criteria of risk level are changed, will the accuracy of the results change?

(4) "The driving behavior of this type of vehicle is difficult to measure by predicting The results, And every time a driving behavior occurs, timely measures should be taken." Why is that?

(5) The references in this study need further expansion; several papers focusing on prediction using deep learning models may be useful for their further studies.
[1] “Spatiotemporal gated graph attention network for urban traffic flow prediction based on license plate recognition data”. Computer-Aided Civil and Infrastructure Engineering, 2021, 1-21. DOI: 10.1111/mice.12688.
[2] “Multi-community passenger demand prediction at region level based on spatio-temporal graph convolutional network”, Transportation Research Part C: Emerging Technologies, Vol.124, 102951, 2021, 1-18.

I hope these questions can contribute to the further improvement of this manuscript.
**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

24 Dec 2021

Reply Letter

Dear Editor,

We greatly appreciate your kind consideration and the reviewers’ insightful comments on our manuscript entitled “A Hybrid Neural Network for Driving Behavior Risk Prediction Based on Distracted Driving Behavior Data”. These comments are very valuable for improving the quality and readability of our paper and provide important guidance for our future research. The modifications are marked in blue (Reviewer #1) and red (Reviewer #2) in the revised manuscript and in this reply letter.
Moreover, detailed point-by-point responses to all of the reviewers' comments are provided. In brief, the manuscript has been thoroughly revised based on the comments from the reviewers and the editor. We hope this revised manuscript has addressed your concerns, and we look forward to hearing from you.

Sincerely,
Hongwei Meng

Response to Editor’s Comments

Academic Editor’s Comments: Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Response: Thank you very much for the comments and the chance to improve our manuscript. We have checked the manuscript and corrected the content, the grammar, and the figures. The modifications corresponding to Reviewer 1’s comments are marked in blue, and those corresponding to Reviewer 2's comments are marked in red, in the revised manuscript and this reply letter. Detailed point-by-point responses to your comments are provided below.

1, Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming.

Response: We have revised our manuscript to meet the style requirements of PLOS ONE, including the file naming requirements.

2, Thank you for stating the following in the Acknowledgments Section of your manuscript: "The research is supported by Key R&D Project of the Ministry of Science and Technology of China (Granted No. 2020YFC1512004)." We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.
Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: "There are no financial conflicts of interest to disclose." Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

Response: Thank you very much. We have removed the funding information from the manuscript and have included our amended statement in the cover letter.

3, We note that Figure 3 in your submission contains [map/satellite] images which may be copyrighted.

Response: The map image in Figure 3 is generated from OpenStreetMap (OSM), which is open source: https://www.openstreetmap.org/

Response to Reviewer #1’s Comments

We are very grateful for your professional review of our article. As you pointed out, there are several problems that need to be addressed. The manuscript has been revised according to your comments, and the modifications are marked in blue. Detailed point-by-point responses to your comments are provided below.

1, In the study, driving risk is categorized into 0, 1, and 2. Please clearly introduce the definition of driving risk level in the manuscript.

Response: Thank you again for your positive comments and valuable suggestions to improve the quality of our manuscript. In our study, driving risk is divided into levels 0, 1, and 2 by K-means clustering. Level 0 indicates that the vehicle or area is in a low-risk state at this time and no action is required. Level 1 indicates that the vehicle or area is in a medium-risk state and measures should be taken according to the situation. Level 2 indicates that the vehicle or area is in a high-risk state and immediate measures are required. We have added this definition of the driving risk levels to the revised manuscript.

2, Please introduce the input data of DBRPNN, and provide sample data as an illustration.

Response: We are very grateful for your comments. Displaying the input of DBRPNN makes our paper clearer. The input data of DBRPNN include the risk level, the time of day, the driving time, and the driving mileage. The risk level takes the values 0, 1, and 2.
The time of day is divided into four periods, 0:00 to 6:00, 6:00 to 12:00, 12:00 to 18:00, and 18:00 to 24:00, coded as 0, 1, 2, and 3, respectively. The driving time is the length of time the driver drives the vehicle, and the driving mileage is the distance the driver drives the vehicle; together, they are used to measure the driver's fatigue state. We provide sample data as an illustration in the revised manuscript, as shown in Table 3.

Table 3. The Input Data Structure of DBRPNN.

Field Name | Field Type | Data Example
Risk Level | Int | 0
The Time of a Day | Int | 3
Driving Time | Float | 8.77
Driving Mileage | Float | 318.2

3, To my understanding, the purpose of predicting driving behavior risk is to provide timely warning information for avoiding potential safety issues. Thus, the predicted information is valid for a short-term period. However, in the study, the authors present the prediction for 30 mins, 60 mins, 90 mins, and 120 mins. Please explain why the authors select these four time-intervals for performance demonstration.

Response: Thank you for your comments. We think this is an excellent suggestion. Some similar studies summarize the data into four different time intervals: 15, 30, 45, and 60 minutes. When we divide our data into 15-minute intervals, the number of periods during which distracted driving behavior occurs is less than 30% of the total number of periods. The data of the five vehicles with the largest number of distracted driving behaviors are shown in the table below. When there are too few non-zero data, predictability is low, so we chose time intervals of 30 minutes and above.
Vehicle ID | Number of Periods During Which Distracted Driving Behavior Occurred | Total Number of Periods
16759 | 2513 | 8640
16924 | 2509 | 8640
33028 | 2412 | 8640
20635 | 2339 | 8640
33024 | 2281 | 8640

We have revised the corresponding content and supplemented the relevant references, as follows.

To observe the prediction effects under different periods, this paper refers to some similar research practices [41, 42], combines the distracted driving behavior data according to vehicle ID or area ID, and summarizes it into four different time intervals: 30 minutes, 60 minutes, 90 minutes, and 120 minutes. When the time interval is shorter than 30 minutes, the proportion of time units containing distracted driving behavior becomes small.

42. Ma X, Yu H, Wang Y, et al. Large-Scale Transportation Network Congestion Evolution Prediction Using Deep Learning Theory. PLoS One, 2015, 10.

4, Figure 3, “Shaanxi Province” should be a typo. Figure 1 should be replaced by a high-resolution figure to improve the readability.

Response: Thanks for your careful check. The spelling "aa" is indeed confusing, but according to the English naming standard for Chinese place names, the English name of this province is "Shaanxi Province". In the revised manuscript, Figure 1 has been replaced by a high-resolution figure.

Fig 1. Driving Behavior Risk Prediction Neural-Network (DBRPNN) Architecture.

5, The manuscript contains considerable language issues. Please conduct a proofreading and polish the manuscript accordingly.

Response: We sincerely thank you for your careful reading. We have tried our best to proofread and polish the revised manuscript. The changes are not listed here but are marked in blue in the revised paper.

Response to Reviewer #2’s Comments

Thank you again for your positive comments and valuable suggestions to improve the quality of our manuscript. The manuscript has been revised according to your comments, and the modifications are marked in red. Detailed point-by-point responses to your comments are provided below.

(1) The literature review is obviously not comprehensive enough. In terms of risk prediction, behavioral prediction and flow prediction have made abundant research progress, but they are not well reflected in the review section.
Response: Thank you for your helpful comments, which make our literature review more comprehensive. In Related Work, we introduce distracted driving behavior, methods for predicting distracted driving behavior and traffic accidents, and related neural networks. We have supplemented the Related Work section with relevant behavioral prediction and flow prediction studies, as follows.

In terms of risk prediction, behavioral prediction and flow prediction have made abundant research progress. Tang et al. proposed a forecasting framework named the spatiotemporal gated graph attention network to predict urban traffic flow based on license plate recognition data [19]. In addition, Pu et al. used historical data to predict road surface friction [20, 21]. Tang et al. used a geographically weighted Poisson quantile regression model to study spatial heterogeneity and estimated the spatial impact on crash frequency [22].

We have supplemented the relevant references, as follows.

19. Tang J, Zeng J. Spatiotemporal gated graph attention network for urban traffic flow prediction based on license plate recognition data. Computer-Aided Civil and Infrastructure Engineering, 2021(7).
20. Pu Z, Liu C, Shi X, et al. Road surface friction prediction using long short-term memory neural network based on historical data. Journal of Intelligent Transportation Systems, 2020(1):1-12.
21. Pu Z, Cui Z, Wang S, et al. Time-aware gated recurrent unit networks for forecasting road surface friction using historical data with missing values. IET Intelligent Transport Systems, 2020, 14(4):213-219.
22. Tang J, Gao F, Liu F, et al. Spatial heterogeneity analysis of macro-level crashes using geographically weighted Poisson quantile regression. Accident Analysis & Prevention, 2020, 148:105833.
29. Tang J, Liang J, Liu F, et al. Multi-community passenger demand prediction at region level based on spatio-temporal graph convolutional network. Transportation Research Part C: Emerging Technologies, 2021, 124(10):102951.

(2) Why are risk levels divided into three categories in clustering? The paper is not clear on this point.

Response: We think this is an excellent suggestion. Thank you for pointing out our omission. We used the silhouette coefficient to evaluate the effect of K-means clustering. The results are shown in Table 5: the clustering is best when the number of clusters is three.

Table 5. The Clustering Effect of K-means.

Number of Clusters | Silhouette Score
2 | 0.8193
3 | 0.8200
4 | 0.8033
5 | 0.8085

We have supplemented the revised manuscript with the following content: "This paper uses the silhouette coefficient to evaluate the clustering effect of K-means, and the results for different values of k are shown in Table 5."

(3) If the classification criteria of risk level are changed, will the accuracy of the results change?

Response: Thank you for your helpful comment. If the classification criteria of risk level are changed, the accuracy of the results changes, as shown in the table below.

Number of Risk Levels | Accuracy | Weighted Recall
2 | 0.9167 | 0.9169
3 | 0.9146 | 0.9156
4 | 0.8458 | 0.8355
5 | 0.7562 | 0.7566

(4) "The driving behavior of this type of vehicle is difficult to measure by predicting The results, And every time a driving behavior occurs, timely measures should be taken." Why is that?

Response: Thank you for your comment; our reply is as follows. Distracted driving behavior rarely occurs in this type of vehicle. The prediction results for some of these vehicles are shown in Fig 7. Because the distracted driving behavior of this type of vehicle is difficult to predict, these drivers should be reminded promptly whenever distracted driving occurs.

Fig 7. The Prediction Result of the Risk Level. (a) Vehicle ID 16975.
(b) Vehicle ID 16822.

(5) The references in this study need further expansion; several papers focusing on prediction using deep learning models may be useful for their further studies.
[1] “Spatiotemporal gated graph attention network for urban traffic flow prediction based on license plate recognition data”. Computer-Aided Civil and Infrastructure Engineering, 2021, 1-21. DOI: 10.1111/mice.12688.
[2] “Multi-community passenger demand prediction at region level based on spatio-temporal graph convolutional network”, Transportation Research Part C: Emerging Technologies, Vol.124, 102951, 2021, 1-18.

Response: We sincerely appreciate the valuable comments. These papers are very helpful for our current and further research. We have checked the literature carefully and added more references to the Related Work section of the revised manuscript (references 19-22 and 29, listed in the response to comment (1)).

Submitted filename: Response to Reviewers.docx
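The reply to Reviewer #2, comment (2), states that the three risk levels come from K-means clustering, with the number of clusters chosen by comparing silhouette scores (Table 5). The following is a minimal pure-Python sketch of that selection step under stated assumptions: each period is summarized by a single scalar (for example, its count of distracted driving events; the actual clustering features are not given here), the initialization is a simple deterministic quantile choice rather than k-means++, and all function names are hypothetical. Clusters are relabeled by centre so that level 0 is the lowest-risk group.

```python
from statistics import mean


def _assign(values, centres):
    """Assignment step: index of the nearest centre for every value."""
    return [min(range(len(centres)), key=lambda c: abs(v - centres[c])) for v in values]


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on scalar values. Returns (labels, centres).

    Assumption: values are 1-D (e.g. event counts per period). Initial
    centres are spread over the distinct sorted values.
    """
    distinct = sorted(set(values))
    if len(distinct) < k:
        raise ValueError("need at least k distinct values")
    n = len(distinct)
    centres = [distinct[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        labels = _assign(values, centres)
        # Update step: move each centre to the mean of its members.
        new_centres = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centres.append(mean(members) if members else centres[c])
        if new_centres == centres:
            break
        centres = new_centres
    return _assign(values, centres), centres


def silhouette_score_1d(values, labels):
    """Mean silhouette coefficient with |x - y| as distance.
    Points in singleton clusters score 0 (Rousseeuw's convention)."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (v, lab) in enumerate(zip(values, labels)):
        same = [abs(v - w) for j, (w, m) in enumerate(zip(values, labels)) if m == lab and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = mean(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            mean(abs(v - w) for w, m in zip(values, labels) if m == c)
            for c in clusters if c != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


def choose_k(values, candidates=(2, 3, 4, 5)):
    """Run k-means for each candidate k and keep the highest silhouette."""
    scores = {}
    for k in candidates:
        labels, _ = kmeans_1d(values, k)
        if len(set(labels)) >= 2:
            scores[k] = silhouette_score_1d(values, labels)
    return max(scores, key=scores.get), scores


def to_risk_levels(labels, centres):
    """Relabel clusters so 0 = lowest centre (low risk), k-1 = highest."""
    order = sorted(range(len(centres)), key=lambda c: centres[c])
    rank = {c: r for r, c in enumerate(order)}
    return [rank[lab] for lab in labels]
```

The pairwise silhouette computation is quadratic in the number of periods, which is fine for a sketch; a library implementation (for example scikit-learn's `silhouette_score`) would be the practical choice on a full data set.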
11 Jan 2022

A Hybrid Neural Network for Driving Behavior Risk Prediction Based on Distracted Driving Behavior Data

PONE-D-21-34056R1

Dear Dr. Meng,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Feng Chen
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.
Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: No
Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above.
You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)
Reviewer #2: In this revision, the authors responded to all of my comments, and I think the quality of the paper has improved greatly; it can be accepted in its current condition.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

14 Jan 2022

PONE-D-21-34056R1

A Hybrid Neural Network for Driving Behavior Risk Prediction Based on Distracted Driving Behavior Data

Dear Dr. Meng:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff
on behalf of
Dr. Feng Chen
Academic Editor
PLOS ONE

11 in total

1. Crash injury severity analysis using a two-layer Stacking framework.

Authors: Jinjun Tang; Jian Liang; Chunyang Han; Zhibin Li; Helai Huang
Journal: Accid Anal Prev Date: 2018-11-01

2. Explaining the road accident risk: weather effects.

Authors: Ruth Bergel-Hayat; Mohammed Debbarh; Constantinos Antoniou; George Yannis
Journal: Accid Anal Prev Date: 2013-04-01

3. Long short-term memory.

Authors: S Hochreiter; J Schmidhuber
Journal: Neural Comput Date: 1997-11-15 Impact factor: 2.026

4. Spatial heterogeneity analysis of macro-level crashes using geographically weighted Poisson quantile regression.

Authors: Jinjun Tang; Fan Gao; Fang Liu; Chunyang Han; Jaeyoung Lee
Journal: Accid Anal Prev Date: 2020-10-22

5. Improving protein disorder prediction by deep bidirectional long short-term memory recurrent neural networks.

Authors: Jack Hanson; Yuedong Yang; Kuldip Paliwal; Yaoqi Zhou
Journal: Bioinformatics Date: 2017-03-01 Impact factor: 6.937

6. Explicit and implicit self-enhancement biases in drivers and their relationship to driving violations and crash-risk optimism.

Authors: Niki Harré; Chris G Sibley
Journal: Accid Anal Prev Date: 2007-03-30

7. Prediction of Dangerous Driving Behavior Based on Vehicle Motion State and Passenger Feeling Using Cloud Model and Elman Neural Network.

Authors: Huaikun Xiang; Jiafeng Zhu; Guoyuan Liang; Yingjun Shen
Journal: Front Neurorobot Date: 2021-04-29 Impact factor: 2.650

8. Learning Traffic as Images: A Deep Convolutional Neural Network for Large-Scale Transportation Network Speed Prediction.

Authors: Xiaolei Ma; Zhuang Dai; Zhengbing He; Jihui Ma; Yong Wang; Yunpeng Wang
Journal: Sensors (Basel) Date: 2017-04-10 Impact factor: 3.576

9. A Driver's Physiology Sensor-Based Driving Risk Prediction Method for Lane-Changing Process Using Hidden Markov Model.

Authors: Yan Li; Fan Wang; Hui Ke; Li-Li Wang; Cheng-Cheng Xu
Journal: Sensors (Basel) Date: 2019-06-13 Impact factor: 3.576

10. Why do drivers become safer over the first three months of driving? A longitudinal qualitative study.

Authors: Marianne R Day; Andrew R Thompson; Damian R Poulter; Christopher B Stride; Richard Rowe
Journal: Accid Anal Prev Date: 2018-04-30