Hui Yuan Tan1, Zhi Yun Goh1, Kar-Hoe Loh2, Amy Yee-Hui Then3, Hasmahzaiti Omar3,4, Siow-Wee Chang1.
Abstract
BACKGROUND: Despite their high commercial fisheries value and ecological importance as prey for higher marine predators, very limited taxonomic work has been done on cephalopods in Malaysia. Because cephalopods are soft-bodied, identifying species from the hard parts of the beak can be more reliable and useful than relying on conventional body morphology. Since the traditional method for species classification is time-consuming, this study aimed to develop an automated identification model that can identify cephalopod species based on beak images.
Keywords: Beaks; Cephalopod; Deep features; Deep learning; Machine learning; Species identification; Traditional morphometric features
Year: 2021 PMID: 34434645 PMCID: PMC8359798 DOI: 10.7717/peerj.11825
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. The framework for cephalopod species identification using integrated analysis of machine learning and deep learning.
Figure 2. Setup for image acquisition of beak samples of the studied cephalopods: (A) lightbox; (B) smartphone used to photograph the beaks.
List of morphological features.
| Features | Definition | Formula |
|---|---|---|
| Area | Size of the beak | – |
| Perimeter | The length of the contour of the beak | – |
| Aspect Ratio | The ratio of major axis length to minor axis length | |
| Extent | The proportion of pixels in the bounding box that also belong to the beak | |
| Solidity / convexity | The proportion of pixels in the convex hull that also belong to the beak | |
| Equivalent Diameter | The diameter of a circle with the same area as the beak | |
| Circularity | The ratio of the area of the beak to the convex circle | |
| Rectangularity | The ratio of the area of the beak to the area of the minimum bounding rectangle | |
| Form Factor | The ratio of the area of the beak to the circle | |
| Narrow Factor | The ratio of the diameter of the beak to the height of the beak | |
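The formula column of the table above did not survive extraction. As a minimal sketch, the ratio-based features can be derived from raw measurements as follows. The function name and its arguments are hypothetical, and the form factor uses the common isoperimetric convention 4πA/P², which is an assumption rather than the authors' stated formula. Circularity is left out because its definition above is ambiguous.

```python
import math


def shape_features(area, perimeter, major_axis, minor_axis,
                   bbox_width, bbox_height, hull_area,
                   min_rect_area, diameter, height):
    """Derive ratio-based shape descriptors from raw measurements (in pixels)."""
    return {
        # Major axis length over minor axis length.
        "aspect_ratio": major_axis / minor_axis,
        # Share of bounding-box pixels that belong to the object.
        "extent": area / (bbox_width * bbox_height),
        # Share of convex-hull pixels that belong to the object.
        "solidity": area / hull_area,
        # Diameter of a circle with the same area as the object.
        "equivalent_diameter": math.sqrt(4.0 * area / math.pi),
        # Object area over the area of its minimum bounding rectangle.
        "rectangularity": area / min_rect_area,
        # ASSUMPTION: common convention 4*pi*A / P^2 (equals 1 for a circle).
        "form_factor": 4.0 * math.pi * area / perimeter ** 2,
        # Diameter over height, as defined in the table.
        "narrow_factor": diameter / height,
    }


# Sanity check on an ideal circle of radius 10 (not a real beak measurement).
r = 10.0
circle = shape_features(
    area=math.pi * r ** 2, perimeter=2 * math.pi * r,
    major_axis=2 * r, minor_axis=2 * r,
    bbox_width=2 * r, bbox_height=2 * r,
    hull_area=math.pi * r ** 2, min_rect_area=(2 * r) ** 2,
    diameter=2 * r, height=2 * r,
)
```

For a circle, the form factor and solidity come out as 1 and the extent as π/4, which gives a quick check that the ratios are oriented correctly.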
List of parameters adjusted for each classifier.
| Classifier | Parameters |
|---|---|
| ANN | Hidden layer sizes = 1 layer with 30 hidden neurons, Learning rate schedule for weight updates = 0.001, Maximum number of iterations = 200, Weight optimization = stochastic gradient-based optimizer |
| SVM | |
| RF | Number of trees = 100, Function for measuring the quality of a split = ‘Gini impurity’, Maximum number of features = sqrt(number of features) |
| DT | Function for measuring the quality of a split = Information gain, Maximum depth of the tree = 2 |
| kNN | Distance metric = minkowski (standard Euclidean metric), Number of neighbors = 8, Weights function = uniform weights |
| LR | |
| LDA | |
| GNB | Largest variance for calculation stability = 1e−09 |
Notes.
Regularization parameter based on the squared l2 penalty; the smaller the value, the stronger the regularization.
Tolerance for stopping criteria.
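The parameter descriptions in the table resemble scikit-learn estimator arguments, although the library is not named here. Assuming scikit-learn, the settings that survived extraction could be written as the configuration sketch below. The mapping of each description to a keyword argument is an assumption, in particular reading "stochastic gradient-based optimizer" as `solver="adam"` and the value 0.001 as `learning_rate_init`. SVM, LR and LDA are omitted because their parameter values are missing from the table.

```python
# Hypothetical mapping of the table's settings to scikit-learn keyword arguments.
CLASSIFIER_PARAMS = {
    "ANN": {
        "hidden_layer_sizes": (30,),   # one hidden layer with 30 neurons
        "learning_rate_init": 0.001,   # ASSUMPTION: 0.001 is the initial learning rate
        "max_iter": 200,
        "solver": "adam",              # ASSUMPTION: "stochastic gradient-based optimizer"
    },
    "RF": {"n_estimators": 100, "criterion": "gini", "max_features": "sqrt"},
    "DT": {"criterion": "entropy", "max_depth": 2},  # entropy = information gain
    "kNN": {"n_neighbors": 8, "metric": "minkowski", "p": 2, "weights": "uniform"},
    "GNB": {"var_smoothing": 1e-9},
}


def build_classifiers(params=CLASSIFIER_PARAMS):
    """Instantiate the estimators; scikit-learn is imported only when called."""
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier

    constructors = {
        "ANN": MLPClassifier,
        "RF": RandomForestClassifier,
        "DT": DecisionTreeClassifier,
        "kNN": KNeighborsClassifier,
        "GNB": GaussianNB,
    }
    return {name: constructors[name](**kwargs) for name, kwargs in params.items()}
```

Keeping the settings in a plain dictionary makes it easy to compare them against the table row by row before any model is built.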
Figure 3. Example of the stratified shuffle split cross-validation approach for one of the ANN models.
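As an illustration of the idea behind stratified shuffle split, the sketch below shuffles the sample indices within each class and holds out the same fraction of every class in each split. It is a simplified standard-library version, not the authors' implementation. The function name, the 20% test share, and the placeholder labels are assumptions.

```python
import random
from collections import defaultdict


def stratified_shuffle_split(labels, n_splits=5, test_size=0.2, seed=0):
    """Yield (train_indices, test_indices) pairs that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    for _ in range(n_splits):
        train, test = [], []
        for indices in by_class.values():
            shuffled = indices[:]          # copy so each split reshuffles afresh
            rng.shuffle(shuffled)
            # Hold out the same share of every class, at least one sample each.
            n_test = max(1, round(len(shuffled) * test_size))
            test.extend(shuffled[:n_test])
            train.extend(shuffled[n_test:])
        yield sorted(train), sorted(test)


# Placeholder labels: 10 samples of class "A" and 5 of class "B".
labels = ["A"] * 10 + ["B"] * 5
splits = list(stratified_shuffle_split(labels))
```

Unlike plain k-fold splitting, the test sets of different splits may overlap, because every split is drawn from a fresh shuffle.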
List of the Cephalopod Species Collected.
Notes.
Images are not to scale.
Seven cephalopod species with GenBank accession number.
| Species | Sample code | Sequence ID | GenBank accession number |
|---|---|---|---|
| | C2-1 | SeqC2-1 | MZ413930 |
| | C6-25 | SeqC6-25 | MZ413931 |
| | C3-1 | SeqC3-1 | MZ413932 |
| | S1-1 | SeqS1-1 | MZ413933 |
| | S3-1 | SeqS3-1 | MZ413934 |
| | S4-1 | SeqS4-1 | MZ413935 |
| | O2-6 | SeqO2-6 | MZ413936 |
Number of traditional features and deep features extracted.
| Feature descriptor | Number of features |
|---|---|
| Gray HOG | 108 |
| Colour HOG | 108 |
| MSD | 10 |
| Gray HOG + MSD | 118 |
| Colour HOG + MSD | 118 |
| VGG19 | 4096 |
| ResNet50 | 2048 |
| InceptionV3 | 2048 |
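The hybrid rows in the table are consistent with simple concatenation: 108 HOG features plus 10 MSD features give 118. A minimal sketch of that step is shown below, assuming the hybrid descriptor is formed by joining the two per-image vectors end to end. The function name and the zero-filled placeholder vectors are hypothetical.

```python
def hybrid_descriptor(hog_features, msd_features):
    """Concatenate two per-image feature vectors into one hybrid vector."""
    return list(hog_features) + list(msd_features)


# Placeholder vectors with the dimensions reported in the table.
hog = [0.0] * 108
msd = [0.0] * 10
hybrid = hybrid_descriptor(hog, msd)
```

Because HOG and MSD values can sit on very different scales, it may be worth standardizing each feature before feeding the hybrid vector to a distance-based classifier such as kNN.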
Performance for single and hybrid descriptor of traditional features.
| | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | 54.34(0.59) | 52.69(0.59) | 60.40(0.62) | | 55.43(0.56) | 44.91(0.51) | | 58.06(0.65) | 64.91(0.70) | |
| | 46.51(0.56) | 45.31(0.56) | 51.66(0.59) | 66.69(0.74) | 62.17(0.66) | 52.06(0.58) | 52.97(0.61) | 51.14(0.61) | 59.34(0.62) | 68.23(0.77) |
| | 57.77(0.64) | 51.20(0.57) | 66.06(0.74) | 71.03(0.76) | 64.57(0.69) | 63.66(0.68) | 77.03(0.84) | 65.03(0.72) | 79.14(0.87) | 76.91(0.81) |
| | | 48.74(0.55) | 53.77(0.62) | 71.14(0.75) | 61.20(0.66) | 50.63(0.58) | 59.31(0.67) | 50.86(0.58) | 62.69(0.69) | 70.40(0.74) |
| | 42.40(0.55) | 36.86(0.49) | 45.77(0.61) | 40.29(0.53) | 45.71(0.61) | 46.46(0.63) | 43.31(0.57) | 48.69(0.65) | 44.97(0.62) | 53.14(0.67) |
| | 50.91(0.58) | 50.63(0.58) | 55.94(0.61) | 65.26(0.75) | 42.00(0.55) | 41.03(0.51) | 60.06(0.66) | 56.63(0.64) | 62.34(0.69) | 69.89(0.78) |
| | 57.14(0.58) | 45.83(0.41) | 64.80(0.67) | 61.89(0.61) | 62.74(0.66) | 59.26(0.60) | 74.34(0.74) | 58.23(0.54) | 76.74(0.80) | 64.34(0.62) |
| | 41.71(0.51) | 44.97(0.48) | 48.17(0.57) | 53.54(0.57) | | 52.74(0.60) | 51.66(0.58) | 50.06(0.53) | 54.97(0.62) | 57.49(0.61) |
Notes.
Average testing accuracy from five-fold CV over 10 runs.
Upper beak
Lower beak
Average area under the precision–recall curve in one of the runs
Bolded text indicates the best results for each traditional feature model (the RF and LDA models showed overfitting, as testing accuracy was much lower than training accuracy).
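The notes describe two computations: averaging testing accuracy over repeated five-fold runs, and judging overfitting from the gap between training and testing accuracy. A minimal sketch of both is given below. The function name and the 10-point gap threshold are assumptions, since the source gives no numeric criterion, and the accuracies are made-up illustrative values, not results from the study.

```python
from statistics import mean


def summarize_runs(train_acc, test_acc, gap_threshold=10.0):
    """Average per-fold accuracies (%) over all runs and flag a large train/test gap.

    train_acc and test_acc are lists of runs, each run a list of per-fold accuracies.
    """
    mean_train = mean(acc for run in train_acc for acc in run)
    mean_test = mean(acc for run in test_acc for acc in run)
    gap = mean_train - mean_test
    return {
        "mean_train": mean_train,
        "mean_test": mean_test,
        "gap": gap,
        # ASSUMPTION: a gap above the threshold is treated as overfitting.
        "overfit": gap > gap_threshold,
    }


# Illustrative values only: 2 runs x 5 folds.
train_runs = [[100.0] * 5, [100.0] * 5]
test_runs = [[60.0, 70.0, 80.0, 70.0, 70.0], [70.0] * 5]
summary = summarize_runs(train_runs, test_runs)
```

In this illustration the training accuracy is perfect while the mean testing accuracy is 70%, so the 30-point gap is flagged.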
Figure 4. Performance evaluation from one of the runs of the ANN model with the hybrid descriptor (colour HOG + MSD) on lower beak images: (A) confusion matrix; (B) precision-recall curve.
For the confusion matrix, the precision and recall values of the identification model were computed from the testing set. Precision and recall were computed for each cephalopod species to show how model performance differs among species. The average precision–recall curve of the model was also calculated, and the area under it was measured. The larger the area under the curve, the better the model identifies cephalopod species from beak images.
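Per-species precision and recall can be read directly off a confusion matrix. The sketch below assumes that rows are true classes and columns are predicted classes. The function name is hypothetical and the 3×3 matrix is a made-up example, not data from the study.

```python
def per_class_precision_recall(confusion):
    """Return a (precision, recall) pair for each class.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    results = []
    for k in range(n):
        true_positive = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision = true_positive / predicted_k if predicted_k else 0.0
        recall = true_positive / actual_k if actual_k else 0.0
        results.append((precision, recall))
    return results


# Made-up confusion matrix for three classes.
cm = [
    [5, 1, 0],
    [0, 4, 1],
    [1, 0, 3],
]
scores = per_class_precision_recall(cm)
```

Precision divides the diagonal entry by its column sum (everything predicted as that class), while recall divides it by its row sum (everything that truly belongs to that class).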
Performance of eight classifiers with deep features extracted.
| | | | | | | |
|---|---|---|---|---|---|---|
| | 88.63(0.95) | | | 87.49(0.94) | | 85.77(0.91) |
| | 81.94(0.11) | 88.57(0.94) | 81.54(0.89) | 84.57(0.91) | 79.03(0.90) | 83.89(0.92) |
| | 85.31(0.93) | 89.83(0.95) | 84.29(0.90) | 82.97(0.90) | 85.43(0.93) | 83.31(0.90) |
| | 76.97(0.87) | 85.60(0.93) | 76.00(0.85) | 81.09(0.88) | 76.74(0.86) | 79.31(0.87) |
| | 58.63(0.73) | 56.63(0.71) | 49.37(0.66) | 50.63(0.65) | 48.11(0.66) | 53.20(0.69) |
| | 89.66(0.96) | 91.71(0.95) | 85.77(0.95) | 85.37(0.92) | 84.86(0.93) | 85.83(0.92) |
| | 83.14(0.88) | 86.34(0.91) | 88.86(0.95) | 89.43(0.95) | 82.11(0.90) | 82.34(0.89) |
| | 82.23(0.85) | 86.97(0.89) | 77.54(0.82) | 77.60(0.83) | 67.60(0.72) | 70.11(0.75) |
Notes.
Average testing accuracy from five-fold CV over 10 runs.
Upper beak
Lower beak
Average area under the precision–recall curve in one of the runs
Bolded text indicates the best results for each traditional feature model (the RF and LDA models showed overfitting, as testing accuracy was much lower than training accuracy).
Figure 5. Performance evaluation from one of the runs of the VGG19-ANN model on lower beak images: (A) confusion matrix; (B) precision-recall curve.
For the confusion matrix, the precision and recall values of the identification model were computed from the testing set. Precision and recall were computed for each cephalopod species to show how model performance differs among species. The average precision–recall curve of the model was also calculated, and the area under it was measured. The larger the area under the curve, the better the model identifies cephalopod species from beak images.
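The area under a precision-recall curve can be approximated from a set of (recall, precision) points. The sketch below uses the trapezoidal rule, which is an assumption: the source does not say how the area was computed, and other estimators such as average precision give slightly different values. The function name and the three sample points are hypothetical.

```python
def pr_auc(recalls, precisions):
    """Approximate the area under a precision-recall curve with the trapezoidal rule."""
    points = sorted(zip(recalls, precisions))  # order the points by recall
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        # Width of the recall interval times the mean precision over it.
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


# Made-up curve: precision stays at 1.0 up to recall 0.5, then falls to 0.5.
example_area = pr_auc([0.0, 0.5, 1.0], [1.0, 1.0, 0.5])
```

A perfect classifier keeps precision at 1.0 across all recall levels, giving an area of 1.0, so values closer to 1 indicate better identification performance.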
Comparison of previous and current study.
| Study | Samples | Features | Method | Performance |
|---|---|---|---|---|
| | Seven specimens of | Texton-based (mottle and pattern) of cuttlefish | SVM | Best accuracy = 94% |
| | 50 squid species | Size measurements of mantle, fin and head | ANN | Best accuracy = 98.6% |
| | Three Loliginidae squid species (256 samples) | Features extracted from statolith and beak | Geometric outline with PCA and SDA | Accuracies between 75.0% and 88.7% |
| Current study | Seven cephalopod species (174 samples) | Traditional features and deep features of upper and lower beaks | ANN, SVM, RF, kNN, DT, LR, LDA, GNB | Best accuracy = 91.14% |