Yashar Tavakoli, Lourdes Peña-Castillo, Amilcar Soares.
Abstract
The classification of ships based on their trajectory descriptors is a common practice that is helpful in various contexts, such as maritime security and traffic management. For the most part, the descriptors are either geometric, which capture the shape of a ship's trajectory, or kinematic, which capture the motion properties of a ship's movement. Understanding the implications of the type of descriptor that is used in classification is important for feature engineering and model interpretation. However, this matter has not yet been deeply studied. This article contributes to feature engineering within this field by introducing proper similarity measures between the descriptors and defining sound benchmark classifiers, based on which we compared the predictive performance of geometric and kinematic descriptors. The performance profiles of geometric and kinematic descriptors, along with several standard tools in interpretable machine learning, helped us to provide an account of how different ships differ in movement. Our results indicated that the predictive performance of geometric and kinematic descriptors varied greatly, depending on the classification problem at hand. We also showed that the movement of certain ship classes solely differed geometrically while some other classes differed kinematically and that this difference could be formulated in simple terms. On the other hand, the movement characteristics of some other ship classes could not be delineated along these lines and were more complicated to express. Finally, this study verified the conjecture that the geometric-kinematic taxonomy could be further developed as a tool for more accessible feature selection.
Keywords: classification; descriptor; feature engineering; feature selection; knowledge discovery; model interpretation; ship; trajectory
Year: 2022 PMID: 35898098 PMCID: PMC9329964 DOI: 10.3390/s22155588
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. The trajectories that were contained in the raw data according to class, as portrayed on their respective regions (note the invalid/glitchy trajectories that were later removed): (a) cargo ships; (b) fishing ships; (c) passenger ships; (d) sailing ships; (e) tankers; (f) towing ships.
A breakdown of the number of AIS messages and the number of trajectories according to class.
| Ship Type | Number of AIS Messages | Number of Trajectories |
|---|---|---|
| Cargo | 5,778,529 | 933 |
| Tanker | 3,313,088 | 731 |
| Towing | 14,460,834 | 1010 |
| Fishing | 14,385,974 | 1025 |
| Passenger | 3,935,171 | 753 |
| Sailing | 1,343,775 | 1027 |
| Total | 43,217,371 | 5479 |
The list of geometric descriptors that were used in this study.
| Descriptor | Identifier | Comment |
|---|---|---|
| Sinuosity | | |
| Distance geometries | | A group of 1 + 2 + 3 + 4 + 5 = 15 descriptors that measure tortuosity as the effective distance (the ratio of the distance between the start and end points of a segment to the length of the segment); each term of the summation, which we call a signature, measures the tortuosity of the trajectory at progressively finer frequencies, and the signatures together define the shape of a trajectory; the first signature consists of one descriptor, which is the effective distance of the entire trajectory; the second signature consists of two descriptors (the first being the effective distance of the first segment and the second being the effective distance of the second segment), etc.; although the authors of [ |
| Distance | | Distance between the start and end points of trajectory |
| Maximum expected displacement of trajectory | | A dimensionless and scale-independent measure of trajectory straightness, as proposed by [ |
| Expected displacement of trajectory | | Values closer to 0 indicate a lower density of turning angles, while larger values (approaching infinity) indicate a higher density of turning angles [ |
| Length of trajectory | | The cumulative distance traveled along trajectory |
| Sum of absolute values of trajectory angles | | Values closer to 0 indicate small course variations, while larger values (approaching infinity) indicate larger course variations |
| Sum of trajectory angles | | The sum of trajectory angles at each step |
| Proportion of small angles to total number of angles | | Small angles constitute angles between |
| Proportion of medium angles to total number of angles | | Medium angles constitute angles between |
| Proportion of large angles to total number of angles | | Large angles constitute angles between |
| Proportion of reverse angles to total number of angles | | Reverse angles constitute angles between |
| Perimeter of convex hull of trajectory | | Larger values imply longer trajectories; the convex hull of a trajectory is the smallest convex polygon that contains the trajectory |
| Area of convex hull of trajectory | | Larger values imply that the trajectory deviates more from the shortest path between the start point and end point of the trajectory; the convex hull of a trajectory is the smallest convex polygon that contains the trajectory |
| Ratio of shortest to longest axis of convex hull of trajectory | | The shortest axis is the distance between the centroid of the convex hull and the nearest point on the convex hull, and the longest axis is the distance between the centroid of the convex hull and the farthest vertex of the convex hull; smaller values imply more stretched out convex hulls |
| Orientation of convex hull of trajectory (with reference to hull’s longest axis) | | The longest axis is the distance between the centroid of the convex hull and the farthest vertex of the convex hull; we regarded the supplementary angles (adding up to |
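
The distance geometries described in the table can be illustrated with a minimal Python sketch. It computes the effective distance (start-to-end distance divided by path length) for the whole trajectory, then for two segments, and so on up to five, giving 1 + 2 + 3 + 4 + 5 = 15 values. The function names, the planar (x, y) coordinates, the index-based splitting into near-equal segments, and the value 0.0 for a segment with zero length are all assumptions made for illustration; the table does not specify how the segments are cut.

```python
import math


def path_length(points):
    """Cumulative Euclidean length of a polyline given as (x, y) tuples."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def effective_distance(points):
    """Start-to-end distance divided by path length (1.0 means perfectly straight)."""
    length = path_length(points)
    if length == 0:
        return 0.0  # assumption: a segment without movement is scored as 0
    return math.dist(points[0], points[-1]) / length


def distance_geometry(points, max_level=5):
    """Return signatures 1..max_level; signature k holds k effective distances.

    Assumption: signature k splits the point sequence into k index-based
    segments of near-equal size, with neighbours sharing a boundary point.
    """
    n = len(points)
    signatures = []
    for k in range(1, max_level + 1):
        bounds = [round(i * (n - 1) / k) for i in range(k + 1)]
        signatures.append(
            [effective_distance(points[a:b + 1]) for a, b in zip(bounds, bounds[1:])]
        )
    return signatures


# Hypothetical L-shaped track: two straight legs joined by a right-angle turn.
track = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
signatures = distance_geometry(track)
```

For this hypothetical track, the first signature reports a bent path (about 0.71), while the second signature reports two perfectly straight halves, which is the sense in which finer signatures localize where the tortuosity occurs.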
The list of kinematic descriptors that were used in this study.
| Descriptor | Identifier | Comment |
|---|---|---|
| Average speed of turning | | Average speed of the ship when it is turning |
| Maximum speed of turning | | Maximum speed of the ship when it is turning |
| Average speed of straight sailing | | Average speed of the ship when it is sailing straight |
| Maximum speed of straight sailing | | Maximum speed of the ship when it is sailing straight |
| Maximum rate of turn | | Rate as degrees per minute |
| Average rate of turn | | Rate as degrees per minute |
| Proportion of trajectory in which the ship is turning with respect to entire trajectory | | A value between 0 and 1 that indicates the ratio of the cumulative duration of segments in which the ship is turning to the entire duration of the trajectory |
| Proportion of trajectory in which the ship is moving at up to 4 knots with respect to entire trajectory | | Based on [ |
| Proportion of trajectory in which the ship is moving at 4 to 10 knots with respect to entire trajectory | | Based on [ |
| Proportion of trajectory in which the ship is moving at 10 to 18 knots with respect to entire trajectory | | Based on [ |
| Proportion of trajectory in which the ship is moving at more than 18 knots | | Based on [ |
| Number of anchored off segments | | An integer greater than or equal to 0 that counts the number of trajectory segments that are classed as anchored off; this descriptor can appear to be geometric at first sight, but in the case of free-floating vessels, kinematic parameters determine whether the vessel is anchored off; in accordance with [ |
| Total time of anchored off segments | | A value greater than or equal to 0, which is the sum of the durations of all trajectory segments that are classed as anchored off; following the same argument as that presented for the previous descriptor, this descriptor is essentially kinematic in nature |
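
As an illustration of the four speed-proportion descriptors, the following Python sketch bins per-segment speeds into the bands listed in the table (up to 4, 4 to 10, 10 to 18, and more than 18 knots) and reports each band's share of the trajectory. Several details are assumptions rather than facts from the table: the shares are weighted by segment duration (as the turning proportion is), an edge value such as exactly 10 knots is assigned to the lower band, and the band labels and function name are hypothetical.

```python
from bisect import bisect_left

# Band edges in knots, as listed in the table of kinematic descriptors.
BAND_EDGES = (4.0, 10.0, 18.0)
# Hypothetical labels for the four bands, slowest to fastest.
BAND_NAMES = ("low", "medium_low", "medium_high", "high")


def speed_band_proportions(speeds_knots, durations_s):
    """Share of total duration spent in each speed band.

    Assumptions: one speed and one duration per trajectory segment, and a
    speed equal to a band edge counts towards the lower band.
    """
    totals = [0.0] * len(BAND_NAMES)
    for speed, duration in zip(speeds_knots, durations_s):
        totals[bisect_left(BAND_EDGES, speed)] += duration
    whole = sum(totals)
    if whole == 0:
        raise ValueError("trajectory has zero total duration")
    return {name: total / whole for name, total in zip(BAND_NAMES, totals)}


# Hypothetical trajectory: five segments with speeds (knots) and durations (seconds).
proportions = speed_band_proportions(
    [2.0, 4.0, 9.5, 12.0, 20.0],
    [60, 60, 120, 120, 40],
)
```

Because the four bands partition all non-negative speeds, the four proportions always sum to 1, so any three of them determine the fourth.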
Figure 2. The performance profiles of the random forest models that were based on the 3 different sets of predictors for each of the 57 classification problems. The problems were sorted based on the OOB errors of the models that were based on both geometric and kinematic predictors.
Figure 3. The numbers of classes that were present in the problems, which were labeled and sorted according to increasing levels of hardness. The classification problems that involved more classes tended to be harder, with some exceptions.
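
The figure of 57 classification problems is consistent with taking every subset of two or more of the six ship classes: 15 + 20 + 15 + 6 + 1 = 57. The short Python sketch below enumerates those subsets. Treating the problems as exactly these subsets is an assumption made here for illustration; the captions give only the count and the fact that the problems differ in their number of classes.

```python
from itertools import combinations

# The six ship classes listed in the data breakdown table.
SHIP_CLASSES = ("cargo", "fishing", "passenger", "sailing", "tanker", "towing")

# Assumption: one classification problem per subset of at least two classes.
problems = [
    subset
    for size in range(2, len(SHIP_CLASSES) + 1)
    for subset in combinations(SHIP_CLASSES, size)
]

# Number of problems for each number of classes (2 = binary problems).
counts_by_size = {
    size: sum(1 for problem in problems if len(problem) == size)
    for size in range(2, len(SHIP_CLASSES) + 1)
}
```

Under this reading, 15 of the 57 problems are binary, and a single problem involves all six classes.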
The descriptive statistics of the geometric descriptor-based, kinematic descriptor-based, and benchmark models, according to the classification problems.
| Row No. | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Description |
|---|---|---|---|---|---|---|---|
| 1 | 0.0086 | 0.0387 | 0.0493 | 0.0512 | 0.0667 | 0.0945 | The OOB errors produced by Geometric models minus those produced by the benchmark models. |
| 2 | 0.0015 | 0.0309 | 0.0456 | 0.0452 | 0.0630 | 0.0760 | The OOB errors produced by Kinematic models minus those produced by the benchmark models. |
| 3 | 0.0214 | 0.0418 | 0.0545 | 0.0558 | 0.0693 | 0.0945 | The OOB errors produced by the best alternative model (either Kinematic or Geometric) minus those produced by the benchmark models. |
| 4 | −0.0272 | −0.0055 | 0.0069 | 0.0060 | 0.0184 | 0.0395 | The OOB errors produced by the Geometric models minus those produced by the Kinematic models. |
Figure 4. The relative performances of the geometric and kinematic descriptor-based models with reference to the corresponding benchmark models for binary problems. The outer ring depicts the performance of the geometric descriptor-based models and the inner ring depicts the kinematic descriptor-based models.
Figure 5. (a) The silhouette scores for the different numbers of clusters from the similarity matrix that consisted of the geometric descriptors; (b) a kernel density plot of the sinuosity and length of the similarity matrix that consisted of the geometric descriptors.
Figure 6. The hierarchical clustering of the similarity matrix that consisted of the geometric descriptors. The similarity index for a pair of descriptors was between 0 and 1.
The descriptive statistics of the similarity between the distance geometries of the cargo ships vs. passenger ships problem.
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 0.5700 | 0.6736 | 0.7231 | 0.7308 | 0.7636 | 0.9443 |
Figure 7. (a) The PDP of distance_1_1 and length for geometric descriptor-based models; (b) the silhouette scores from the different numbers of clusters for the kinematic descriptor-based models.
Figure 8. The hierarchical clustering of the similarity matrix for the kinematic descriptor-based models. The similarity index for a pair of descriptors was between 0 and 1.
Figure 9. A portrayal (with a 5% smoothing) of the weak monotone relationships between the pairs of speed proportion descriptors in the kinematic descriptor-based model.
Figure 10. (a) The PDP of maximum_speed_straight and medium_low_speed_proportion for the kinematic descriptor-based model; (b) the silhouette scores from the different numbers of clusters for the hybrid problem.
Figure 11. The hierarchical clustering of the similarity matrix for the hybrid problem: geometric predictors; kinematic predictors. The similarity index for a pair of descriptors was between 0 and 1.
The descriptive statistics of the similarity bonds within the clusters of the hybrid problem.
| Row No. | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Description |
|---|---|---|---|---|---|---|---|
| 1 | 0.7764 | 0.8159 | 0.8837 | 0.8763 | 0.9325 | 0.9754 | Similarities pertaining to the first cluster of the hybrid problem. |
| 2 | 0.6791 | 0.6799 | 0.7036 | 0.7286 | 0.7758 | 0.8296 | Similarities pertaining to the second cluster of the hybrid problem. |
| 3 | 0.4843 | 0.5600 | 0.6201 | 0.6511 | 0.7270 | 0.9556 | Similarities pertaining to the third cluster of the hybrid problem. |
| 4 | 0.4842 | 0.6628 | 0.6907 | 0.7221 | 0.8246 | 0.9678 | Similarities pertaining to the fourth cluster of the hybrid problem. |
| 5 | 0.2795 | 0.6116 | 0.6800 | 0.6560 | 0.7422 | 0.9425 | Similarities pertaining to the fifth cluster of the hybrid problem. |
Figure 12. (a) The H test results for the predictors across all classes; (b) the mean decrease in accuracy (MDA) values for the predictors across all classes.
Figure 13. (a) The two-way H test results for chull_perimeter across all classes; (b) the two-way H test results for maximum_speed_straight across all classes.
Figure 14. The 2D multidimensional scaling of the descriptors using data points that belonged to each: (a) cargo ships; (b) fishing ships; (c) passenger ships; (d) sailing ships; (e) tankers; (f) towing ships.
Figure 15. The verification of the conjecture regarding universal similarity. Each dot shows the average strength of the similarity bonds between several groups of predictors for each of the 57 problems.