
Comparison of data mining algorithms for sex determination based on mastoid process measurements using cone-beam computed tomography.

Maryam Farhadian, Fatemeh Salemi, Abbas Shokri, Yaser Safi, Shahin Rahimpanah.

Abstract

PURPOSE: The mastoid region is ideal for studying sexual dimorphism due to its anatomical position at the base of the skull. This study aimed to determine sex in the Iranian population based on measurements of the mastoid process using different data mining algorithms.
MATERIALS AND METHODS: This retrospective study was conducted on 190 3-dimensional cone-beam computed tomographic (CBCT) images of 105 women and 85 men between the ages of 18 and 70 years. On each CBCT scan, the following 9 measurements were made: the distance between the porion and the mastoidale; the mastoid length, height, and width; the distance between the mastoidale and the mastoid incision; the intermastoid distance (IMD); the distance between the lowest point of the mastoid triangle and the most prominent convex surface of the mastoid (MF); the distance between the most prominent convex points of the right and left mastoids (IMSLD); and the angle between lines drawn from the most prominent right and left mastoid points (MMCA). Several predictive models were constructed and their accuracy was compared using cross-validation.
RESULTS: The results of the t-test revealed a statistically significant difference between the sexes in all variables except MF and MMCA. The random forest model, with an accuracy of 97.0%, had the best performance in predicting sex. The IMSLD and IMD made the largest contributions to predicting sex, while the MMCA variable had the least significant role.
CONCLUSION: These results show the possibility of developing an accurate tool using data mining algorithms for sex determination in the forensic framework.
Copyright © 2020 by Korean Academy of Oral and Maxillofacial Radiology.

Keywords:  Cone-Beam Computed Tomography; Data Mining; Mastoid; Sex Determination Analysis

Year:  2020        PMID: 33409141      PMCID: PMC7758270          DOI: 10.5624/isd.2020.50.4.323

Source DB:  PubMed          Journal:  Imaging Sci Dent        ISSN: 2233-7822


Introduction

Forensic odontology is a branch of forensic medicine involving the application of dentistry to legal matters [1]. Forensic dentistry can offer effective guidance for resolving some of the open problems in forensics. This knowledge can be used to identify criminals, burned and fragmented bodies, and multiple bodies, and it can also be applied in other related areas [2,3]. It is challenging to identify the sex of victims following severe events such as natural disasters, wars, or air accidents, so sex determination is an essential step in forensic identification, and for some bodies it can be difficult [4,5]. Various parts of the body, including the face, teeth, skull, and fingerprints, are used in forensic analysis [6-10]. There are structural and morphological differences between males and females in most parts of the body.
The mastoid region, a part of the skull that is resistant to injury because of its anatomical position at the base of the skull, is ideal for the study of sexual dimorphism [11,12]. Several previous studies have shown that the mastoid process is a useful cranial region for determining sex. The tip of the mastoid process is vertical in males and faces inward in females. The mastoid region is of particular interest for the macroscopic identification of sex from bones [13-20].
Data mining is a computational process for discovering patterns in large datasets through techniques such as artificial intelligence, machine learning, and statistical analyses. The goal of this process is to extract information from a dataset for further use. Data mining involves techniques such as clustering, association, pattern recognition, and classification, and it can be used for the rapid and low-cost prediction and diagnosis of diseases [21,22]. Machine learning algorithms have been used to mine medical data and produce predictive models. These algorithms learn from observed data and make predictions for unobserved data. Machine learning helps forensic teams to solve or prevent crimes in areas such as individual identification, forensic criminology, computer forensics, and forensic cybersecurity [23,24], and it has also been applied in forensic medicine [25]. Since sex determination based on certain parameters is a classification problem in the machine learning sense, classification algorithms can assist forensic analysts in determining sex [25].
Classification techniques are among the most common methods of learning in data mining for prediction purposes. Predictive methods use the values of some predictors or inputs to predict the value of a given response or output variable. Classification refers to the process of finding a model that specifies the class of objects according to their properties [21]. Various studies have highlighted the benefits of using data mining in medicine and dentistry [26-28]. However, to our knowledge, only a limited number of studies have used data mining methods in the forensic field to determine age and sex [3,25]. For example, Awais et al. [25] compared various machine learning algorithms, such as naïve Bayes, J48, random forest, random tree, and REP tree methods, for sex identification in the population of Punjab in Pakistan based on footprint dimensions. Farhadian et al. [3] used neural networks to estimate age using the pulp-to-tooth ratio in canines.
As skull measurements differ across ethnic groups, and no studies have been conducted on mastoid measurements for sex determination in the Iranian population, the purpose of this study was to determine sex in the Iranian population based on measurements of the dimensions, convexity, and internal angles of the mastoid process using different data mining algorithms.

Materials and Methods

This retrospective study was conducted on 190 3-dimensional (3D) cone-beam computed tomography (CBCT) images of 105 women and 85 men referred to the Department of Oral and Maxillofacial Radiology of Hamadan Dental School. The study was approved by the institutional review board (IR.UMSHA.REC.1397.715). The subjects' ages ranged from 18 to 70 years. All CBCT images taken for implant placement and other therapeutic purposes were eligible for inclusion, while the exclusion criteria were images with severe artifacts, images that did not show anatomical details of the mastoid, and patients with maxillofacial anomalies. All CBCT images were taken using a NewTom 3G apparatus (NewTom, Verona, Italy) with a kilovoltage peak of 110 kVp, a tube current of 3.8 mA, an exposure time of 6.3 s, a 180-µm voxel size, and a 12-inch field of view, and were stored in NNTviewer software. Initially, the CBCT images were converted to the Digital Imaging and Communications in Medicine (DICOM) format and transferred to On-Demand 3D dental software (version: 1.0.5385, Cybermed Inc., Seoul, Korea), and 3D images were created using the 3D software option.
On each CBCT image, the following 5 anatomical landmarks were identified on both the right and left sides:
1) porion: the highest point at the upper margin of the external auditory meatus;
2) incisura mastoidea: the mastoid incision lying on the inferior-medial mastoid;
3) mastoidale: the lowest craniometric point of the mastoid process;
4) the most prominent point on the lateral surface of the convex mastoid triangle;
5) the highest point at the internal surface of the mastoid (within the digastric cavity).
Accordingly, 8 linear measurements were made using the linear measurement tool in the software and 1 angular measurement was made using the angular measurement tool in each 3D image, as follows:
1) porion-mastoidale (PM): the linear distance between the porion and the mastoidale in the lateral view (Fig. 1);
2) mastoidale-incisura mastoidea (MI): the distance between the incisura mastoidea and the mastoidale in the lateral view (Fig. 1);
3) mastoid length (ML): the distance between the porion and the posterior end of the incisura mastoidea in the lateral view (Fig. 1);
4) mastoid height (MH): the perpendicular distance from the mastoidale to the line between the porion and the incisura mastoidea in the lateral view (Fig. 2);
5) mastoid width (MW): the distance between the most prominent point on the lateral surface of the convex mastoid triangle and the highest point on the inner surface of the mastoid triangle (within the digastric cavity) in the inferior view (Fig. 3);
6) mastoid flare (MF): the distance between the lowest point of the mastoid triangle and the most prominent convex surface of the mastoid in the posterior view (Fig. 4);
7) intermastoid distance (IMD): the distance between the lowest points of the right and left mastoid triangles in the posterior view (Fig. 4);
8) intermastoid lateral surface distance (IMLSD): the distance between the most prominent convex surface points of the right and left mastoid triangles in the posterior view (Fig. 4);
9) mastoid medial convergence angle (MMCA): the angle between 2 lines drawn from the most prominent right and left mastoid points in the posterior view (Fig. 4).
Fig. 1

Mastoid length (ML) and mastoid height (MH) measurements on a 3-dimensional cone-beam computed tomographic image, created using the linear measurement tool in the OnDemand software.

Fig. 2

Mastoid length (ML), the distance between the porion and the mastoidale (PM) and the distance between the mastoidale and the mastoid incision (MI) measurements on a 3-dimensional cone-beam computed tomographic image, created using the linear measurement tool in the OnDemand software.

Fig. 3

The mastoid width (MW) is indicated on the left and right sides on a 3-dimensional cone-beam computed tomographic image, created using the linear measurement tool in the OnDemand software.

Fig. 4

Mastoid flare (MF), intermastoid distance (IMD), intermastoid lateral surface distance (IMLSD), and mastoid medial convergence angle (MMCA) measurements on a 3-dimensional cone-beam computed tomographic image, created using the linear and angular measurement tools in the OnDemand software.

Two experienced oral and maxillofacial radiologists performed all measurements within 2 weeks. Because agreement between and within the 2 observers was high according to the intraclass correlation coefficient (ICC), measurements from 1 observer were used to generate the classification models. Since no significant differences were found between the left-side and right-side measurements, the right-side measurements were used to develop the classification models. In each classification model, the 9 measurements described above were used as inputs and the individual's sex was the output variable. R version 3.4.1 (R Foundation for Statistical Computing, Vienna, Austria) was used for the analysis. Several predictive models were built and their predictive performance was compared. The classifiers used in this study were support vector machine, neural networks, naive Bayes, random forest, k-nearest neighbor, linear discriminant analysis, and logistic regression. The methods are briefly described below.

K-nearest neighbor

The k-nearest neighbor method is based on a distance function for pairs of observations, such as the Euclidean distance or 1 minus the correlation coefficient. For each element in the test set, the k closest observations are found in the training set, and the class is predicted by a majority vote (i.e., choosing the class that is most common among those k neighbors). In this study, the number of neighbors (k) for the nearest neighbor predictor was selected by cross-validation [21].
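The majority-vote rule can be illustrated with a minimal sketch. It assumes the Euclidean distance and uses invented two-feature values (labeled as IMD and PM purely for illustration); it is not the implementation used in the study, which was carried out in R.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training observation
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Labels of the k closest observations
    nearest = [label for _, label in dists[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Hypothetical (IMD, PM) values in mm, invented for illustration only
train_X = [(98.0, 28.0), (100.0, 29.0), (99.0, 27.5),
           (108.0, 33.0), (110.0, 34.0), (109.0, 32.5)]
train_y = ["F", "F", "F", "M", "M", "M"]

prediction = knn_predict(train_X, train_y, (107.0, 32.0), k=3)
```

In practice k would be chosen by cross-validation, as described above, rather than fixed at 3.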

Neural networks

Artificial neural networks (ANNs) consist of several interconnected neurons arranged in at least 3 layers (input, hidden, and output). Learning in ANNs occurs by adjusting the weights of the connections between nodes of successive layers. The multilayer perceptron is the most commonly used feed-forward neural network, in which the output of each neuron is calculated by applying a nonlinear activation function to a weighted sum of its inputs. The functional form of the multilayer perceptron can be written as follows:
$$y = F\left(\sum_{i} W_i X_i + b\right)$$
where $X_i$ denotes the i-th value in the previous layer for the input variables, $W_i$ denotes the corresponding weight, $b$ is the bias, $y$ is the output variable in the present layer, and $F$ is the activation function used in the present layer. In this study, the number of neurons in the hidden layers was chosen by cross-validation [21].
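As a rough illustration of how one layer's output is formed from the previous layer's values, the sketch below computes an activation of a weighted sum plus a bias and chains two such layers. The sigmoid activation and all weights are assumptions chosen for the example; the source does not state which activation function or weights were used.

```python
import math

def sigmoid(z):
    """Logistic activation, used here only as an example of a nonlinear F."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Output of one neuron: activation(sum_i W_i * X_i + b)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

def mlp_forward(inputs, hidden_layer, output_neuron):
    """Forward pass through one hidden layer followed by a single output neuron."""
    hidden = [neuron_output(inputs, w, b) for w, b in hidden_layer]
    w_out, b_out = output_neuron
    return neuron_output(hidden, w_out, b_out)

# Hypothetical weights and biases, invented for illustration only
hidden_layer = [([0.5, -0.5], 0.0), ([1.0, 1.0], -1.0)]
output_neuron = ([1.0, -1.0], 0.0)

y = mlp_forward([1.0, 1.0], hidden_layer, output_neuron)
```

Training would adjust the weights and biases to reduce prediction error; the sketch covers only the forward computation.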

Naive Bayes

A naive Bayes classifier is a probabilistic classifier based on applying Bayes' theorem. This classifier assigns a new observation to the most probable class. Let $x = [x_1, x_2, \ldots, x_d] \in \mathbb{R}^d$ be the input vector, whose class label is unknown. According to Bayes' theorem, $x$ is assigned to the class $y_i$ if $P(y_i \mid x) > P(y_j \mid x)$ for all $j \neq i$, where $P(y_i \mid x)$ is the a posteriori probability of class $y_i$ given the vector $x$ [21].
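The "most probable class" rule can be sketched as follows. The example assumes Gaussian class-conditional densities for each feature, which is a common choice for continuous measurements but is not stated in the source, and the data values are invented.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the prior, feature means, and feature variances for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant keeps every variance strictly positive
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def log_posteriors(model, x):
    """Unnormalised log P(class | x), treating features as independent given the class."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        scores[label] = score
    return scores

def nb_predict(model, x):
    """Assign x to the class with the largest posterior."""
    scores = log_posteriors(model, x)
    return max(scores, key=scores.get)

# Hypothetical two-feature measurements in mm, invented for illustration only
X = [(98.0, 28.0), (100.0, 29.0), (99.0, 27.5),
     (108.0, 33.0), (110.0, 34.0), (109.0, 32.5)]
y = ["F", "F", "F", "M", "M", "M"]
nb_model = fit_gaussian_nb(X, y)
```

Working with log probabilities avoids numerical underflow when many features are multiplied together.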

Random forest

Random forest is an algorithm for classification that uses an ensemble of classification trees. Each of the classification trees is built using a bootstrap sample of the data, and at each split, the candidate set of variables is a random subset of the variables. The outputs of all trees are aggregated to produce a single final classification; specifically, the object belongs to the class with the majority of predictions given by the trees in the random forest [29].
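The three ingredients named above (bootstrap samples, a random subset of candidate variables, and majority voting) can be sketched in a deliberately reduced form. As a simplifying assumption, each "tree" here is a single-split stump rather than a fully grown classification tree, so the per-split random feature subset coincides with a per-tree subset; the data are invented.

```python
import random
from collections import Counter

def fit_stump(X, y, feature_ids):
    """Best single-threshold split among the candidate features (a depth-1 tree)."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:
        # Degenerate bootstrap sample: fall back to the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]

def fit_forest(X, y, n_trees=25, n_candidates=1, seed=0):
    """Fit n_trees stumps, each on a bootstrap sample with random candidate features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample (with replacement)
        feats = rng.sample(range(p), n_candidates)   # random subset of variables
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def forest_predict(forest, x):
    """Aggregate the trees' outputs by majority vote."""
    votes = [l_lab if x[f] <= t else r_lab for f, t, l_lab, r_lab in forest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical two-feature measurements in mm, invented for illustration only
X = [(98.0, 28.0), (100.0, 29.0), (99.0, 27.5),
     (108.0, 33.0), (110.0, 34.0), (109.0, 32.5)]
y = ["F", "F", "F", "M", "M", "M"]
forest = fit_forest(X, y, n_trees=25, seed=0)
```

A production analysis would rely on an established implementation with fully grown trees; the sketch only shows how the ensemble is assembled and how votes are combined.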

Logistic regression

Logistic regression, a probabilistic statistical model, measures the relationship between a categorical output variable and one or more independent variables. The logistic function can be written as:
$$\pi = \frac{e^{\beta_0 + \beta X}}{1 + e^{\beta_0 + \beta X}}$$
where $\pi$ is the probability that an observation belongs to the first category of the output variable ($y$), $X$ is the input variable, $\beta$ is the regression coefficient estimated by the model for this variable, and $\beta_0$ is the intercept [21].
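A small sketch of how a fitted logistic model turns measurements into a class probability is given here. The intercept and coefficients are hypothetical values invented for the example, and the 0.5 decision threshold is an assumption rather than something reported in the source.

```python
import math

def logistic_probability(x, intercept, coefficients):
    """Probability of the first category: exp(z) / (1 + exp(z)), computed as 1 / (1 + exp(-z))."""
    z = intercept + sum(b * v for b, v in zip(coefficients, x))
    return 1.0 / (1.0 + math.exp(-z))

def logistic_classify(x, intercept, coefficients, threshold=0.5):
    """Return 1 when the predicted probability reaches the threshold, else 0."""
    return 1 if logistic_probability(x, intercept, coefficients) >= threshold else 0

# Hypothetical intercept and coefficients, invented for illustration only
intercept = -52.0
coefficients = [0.4, 0.3]

p = logistic_probability((108.0, 33.0), intercept, coefficients)
```

With these made-up values the linear predictor is -52.0 + 43.2 + 9.9 = 1.1, so the probability is above 0.5 and the observation is assigned to the first category.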

Linear discriminant analysis

Linear discriminant analysis is a dimensionality reduction method used to find the linear combination of input variables that best separates 2 or more classes. These discriminant functions are linear with respect to the characteristic vector, and usually have the following form:
$$g(x) = w^{T}x + b_0$$
where $w$ represents the weight vector, $x$ the characteristic vector, and $b_0$ a threshold. Discriminant weights ($w$) are estimated by ordinary least squares, so that the ratio of the variance within the k groups to the variance between the k groups is minimal [21].

Support vector machine

The support vector machine method is a supervised learning technique that attempts to find the hyperplane that produces the largest separation between the decision function values for the instances located on the borderline between the 2 classes. In a binary classification model, given a training set of instance-label pairs $(x_i, y_i)$, $i = 1, 2, \ldots, N$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, +1\}$, the support vector machine can be regarded as the solution of the following quadratic optimization problem:
$$\min_{w,\,b,\,\xi} \; \frac{1}{2} w^{T}w + C \sum_{i=1}^{N} \xi_i \quad \text{subject to} \quad y_i\left(w^{T}\varphi(x_i) + b\right) \geq 1 - \xi_i, \quad \xi_i \geq 0$$
where the training data are mapped to a higher dimensional space by the function $\varphi$, the $\xi_i$ are slack variables measuring the training error, and $C$ is a user-defined penalty parameter on the training error that controls the trade-off between classification errors and the complexity of the model. By finding the parameters $w$ and $b$ for a given training set, the decision function can be formulated as follows:
$$f(x) = \operatorname{sign}\left(w^{T}\varphi(x) + b\right)$$
The support vector machine technique can derive the optimal hyperplane for nonlinearly separable data by mapping the input data into a higher dimensional space using the kernel function $K(x_i, x_j) = \varphi(x_i)^{T}\varphi(x_j)$. There are 4 basic kernels: linear, polynomial, radial basis function, and sigmoid. The tuning parameters for support vector machines were determined by a grid search using cross-validation [22].

Evaluation criteria

In the classification algorithms, the original dataset was divided into training and test datasets. The model was constructed using the training dataset, while the test dataset was used to validate the model and calculate its accuracy. The predictive performance of the different models in terms of accuracy was tested by 10 rounds of 10-fold cross-validation. Accuracy was defined as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} = \frac{\text{correctly classified instances}}{\text{total instances}}$$
where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
Some studies have reported misclassification rates obtained by applying their classifier to a single split of the data into test and training sets. An evaluation based on a single test set may appear very impressive depending on how the data happen to be split. In the present study, to guard against overfitting, we performed 10 rounds of 10-fold cross-validation to evaluate the classifiers. In this technique, the original data are randomly subdivided into 10 subgroups, and the prediction model is formed each time based on 9 subgroups and then tested with the remaining subgroup. This process of training and testing is repeated, and the overall accuracy of the prediction model is calculated as the average over these iterations [21,22].
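The evaluation procedure can be sketched in a few lines: shuffle, split into 10 subgroups, train on 9 and test on the remaining one, and repeat the whole partition 10 times, giving 100 accuracy values to average. The nearest-centroid classifier and the single-feature data are placeholders invented for the example, and the folds are not stratified by sex, which is an assumption since the source does not say whether stratification was used.

```python
import random

def accuracy(y_true, y_pred):
    """(TP + TN) / (TP + FP + TN + FN), i.e. correctly classified / total instances."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def repeated_kfold_accuracy(X, y, fit, predict, k=10, rounds=10, seed=0):
    """Mean test-fold accuracy over `rounds` independent random k-fold partitions."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    fold_scores = []
    for _ in range(rounds):
        rng.shuffle(indices)                           # new random partition each round
        folds = [indices[i::k] for i in range(k)]      # k roughly equal subgroups
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            preds = [predict(model, X[i]) for i in test_idx]
            fold_scores.append(accuracy([y[i] for i in test_idx], preds))
    return sum(fold_scores) / len(fold_scores), len(fold_scores)

def fit_centroids(values, labels):
    """Placeholder classifier: store the mean feature value of each class."""
    sums, counts = {}, {}
    for v, label in zip(values, labels):
        sums[label] = sums.get(label, 0.0) + v
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict_centroid(model, value):
    """Assign the class whose mean is closest to the value."""
    return min(model, key=lambda label: abs(value - model[label]))

# Hypothetical, well-separated single-feature data invented for illustration only
X = [float(v) for v in range(90, 100)] + [float(v) for v in range(110, 120)]
y = ["F"] * 10 + ["M"] * 10

mean_accuracy, n_folds_evaluated = repeated_kfold_accuracy(
    X, y, fit_centroids, predict_centroid
)
```

Because the model is refitted inside every fold, no test observation ever contributes to the model that classifies it, which is what protects the estimate from an overly favorable single split.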

Results

The samples consisted of 105 (55.3%) women and 85 (44.7%) men. The mean age of women (34.5±15.3 years) did not significantly differ from that of men (34.9±13.1 years). The results of the comparison of landmarks between the sexes are presented in Table 1. According to the t-test results, there was a statistically significant difference between the sexes in all variables except MF and MMCA. Furthermore, all measures except MW had higher values in men than in women.
Table 1

Comparison of variables in both sexes (mm)

PM: the distance between the porion and the right mastoidale, ML: right mastoid length, MI: the distance between the mastoidale and the right mastoid incision, MH: right mastoid height, MW: right mastoid width, MF: the distance between the lower point of the mastoid triangle and the most prominent convex surface of the right mastoid, IMD: intermastoid distance, IMSLD: the distance between the most prominent convex surface points of the right and left mastoid triangles, MMCA: the angle between two lines drawn from the most prominent right and left mastoid points

The results of the different classification models for predicting sex based on the average accuracy of the 100 iterations are presented in Table 2. All models, except for the radial kernel neural network model, had a prediction accuracy of over 90.0% for both the test and training sets. The random forest classification model, with an accuracy of 97.0%, showed the best performance in predicting sex.
Table 2

Comparison of the performance accuracy of different classification models for sex determination

To identify the most important variables in the random forest prediction model, the mean decrease in accuracy and the Gini index were used. The results are presented in Figure 5. Based on the results of both indices, IMSLD and IMD made the largest contributions to predicting sex, while the MMCA variable had the least significant role.
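The idea behind the mean decrease in accuracy can be approximated with a permutation sketch: shuffle one variable at a time and record how much the accuracy drops. This is a simplified illustration under stated assumptions. Random forest implementations typically compute this measure on out-of-bag samples per tree, whereas the sketch below permutes columns of a single dataset and uses a hypothetical threshold rule (`toy_predict`) in place of a fitted forest.

```python
import random

def model_accuracy(predict, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column at a time is shuffled."""
    rng = random.Random(seed)
    baseline = model_accuracy(predict, X, y)
    importances = []
    for f in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[f] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only feature f replaced by its shuffled value
            permuted = [row[:f] + (v,) + row[f + 1:] for row, v in zip(X, column)]
            drops.append(baseline - model_accuracy(predict, permuted, y))
        importances.append(sum(drops) / n_repeats)
    return importances

def toy_predict(row):
    """Hypothetical rule that uses only the first feature and ignores the second."""
    return "M" if row[0] > 105.0 else "F"

# Hypothetical two-feature measurements in mm, invented for illustration only
X = [(98.0, 28.0), (100.0, 29.0), (99.0, 27.5),
     (108.0, 33.0), (110.0, 34.0), (109.0, 32.5)]
y = ["F", "F", "F", "M", "M", "M"]

importances = permutation_importance(toy_predict, X, y)
```

A variable the model does not use shows no drop in accuracy when shuffled, which mirrors how a variable with a minor role ends up at the bottom of the importance ranking.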
Fig. 5

Importance of the investigated variables for sex determination in terms of the mean decrease in accuracy and the Gini index. IMSLD: intermastoid lateral surface distance, IMD: intermastoid distance, PM: distance between the porion and the right mastoidale, MH: right mastoid height, ML: right mastoid length, MW: right mastoid width, MI: distance between the mastoidale and the right mastoid incision, MF: distance between the lower point of the mastoid triangle and the most prominent convex surface of the right mastoid, MMCA: angle between 2 lines drawn from the most prominent right and left mastoid points.

Discussion

In the current study, the performance of different data mining methods in predicting sex based on the bone dimensions of the mastoid process was investigated. The results showed that the random forest classification model, with an accuracy of 97.0%, had the best performance in predicting sex. Moreover, in this model, the IMSLD and IMD variables made the most significant contributions to predicting sex. Given the extensive use and superior performance of data mining models in various domains, this study evaluated the performance of these models in predicting sex in forensic settings. In principle, there is no biological or mathematical reason why one particular classification method should be better than others for predicting the outcome in a certain field. Generally, finding the best method for the classification of different datasets is challenging, and identifying the optimal classifier for a given dataset requires extensive investigation. Most studies on sex determination in the forensic area have used classical methods such as linear discriminant analysis and logistic regression. Therefore, it was not possible to directly compare the results of this study with those of other studies. Nonetheless, the findings of the present study can be interpreted in light of previous research on the use of the mastoid process for sex determination [14-20].
Ibrahim et al. [16] studied 388 computed tomography scans (231 men and 157 women) in 2018 to present a new equation for sex determination using the mastoid triangle in the Malaysian population. The parameters of the study included the 3 sides of the mastoid triangle, its circumference, and its area on both sides. A comparison of mean values showed no significant difference between the right and left sides in either sex. Significant differences were found between males and females for the mean values of all measured parameters. The sum of sizes of the mastoid triangle was the best parameter for sex determination. The discriminant model achieved an 84.4% classification accuracy.
Bhayya et al. [13] studied the prediction of sex using the mastoid process in 2018. In their study, 50 adult skulls were examined. The parameters studied were the length, width, anterior-posterior thickness, size, and area of the mastoid. Measurements were made using a sliding Vernier caliper. Discriminant function analysis using all variables correctly classified 82% of the samples. Mastoid length and size had a significant effect on sex determination. In the present study, by contrast, measurements were made on 3-dimensional images generated from CBCT data, using the linear and angular measurement tools in the On-Demand software, which led to greater accuracy in measurements.
Amin et al. [11] investigated the size, surface area, and internal convergence angle of the mastoid process in a sample of 192 3-dimensional skull images for sex determination in 2015. These images were obtained using DICOM images of CBCT scans. The mastoid component correctly classified sex in 90.6% of the samples, and the mastoid interval was the best parameter for determining sex. Consistent with the present study, Amin et al. found that mastoid width was greater in women than in men.
Sujaritham et al. [30] performed a study in 2011 to develop a discriminant function to identify sex in the Thai population based on mastoid height and length. In their study, 150 skulls were divided into 2 groups. The first group consisted of 100 skulls (50 from women and 50 from men). The second group, consisting of 25 men and 25 women, was used to evaluate the obtained discriminant function. In that study, the following 4 parameters were assessed: left mastoid width, right mastoid width, left mastoid height, and right mastoid height. The mean mastoid size on both the right and left sides was significantly higher in men than in women. The accuracy of the discriminant function was 82% and 78% for the training and test sets, respectively. Consistent with the present study, their study found a significant difference in mastoid length and width between men and women.
The discrepancies in the accuracy of the discriminant models obtained in different studies underscore the existence of variations in the skulls of individuals in various populations that are affected by the environment and nutrition. Furthermore, the position of skull landmarks varies slightly between populations. Similarly, discrepancies in the accuracy obtained using various discriminant models can be due to differences in study design and research methods. It should be noted that any prediction model based on the dimensions of the mastoid triangle is highly dependent on the population under study, since environmental, genetic, nutritional, and immigration-related factors affect the shape and size of the mastoid bone in various populations. For this reason, the results of a study conducted in a specific population cannot be generalized to other populations. Therefore, these limitations should be considered when using the results of this model, and if possible, comparable studies should be performed on a broader database.
In this study, the random forest model showed superior predictive performance compared to the other prediction algorithms for sex determination based on mastoid parameters. These results demonstrate the possibility of developing an accurate tool using data mining algorithms for sex determination in the forensic framework. However, finding the best model for sex determination will require additional studies with larger datasets from different geographical areas, which should yield more reliable results.