Rajasekhar Chaganti, Furqan Rustam, Isabel De La Torre Díez, Juan Luis Vidal Mazón, Carmen Lili Rodríguez, Imran Ashraf.
Abstract
Thyroid disease prediction has emerged as an important task recently. Despite existing approaches for its diagnosis, often the target is binary classification, the used datasets are small-sized and results are not validated either. Predominantly, existing approaches focus on model optimization and the feature engineering part is less investigated. To overcome these limitations, this study presents an approach that investigates feature engineering for machine learning and deep learning models. Forward feature selection, backward feature elimination, bidirectional feature elimination, and machine learning-based feature selection using extra tree classifiers are adopted. The proposed approach can predict Hashimoto's thyroiditis (primary hypothyroid), binding protein (increased binding protein), autoimmune thyroiditis (compensated hypothyroid), and non-thyroidal syndrome (NTIS) (concurrent non-thyroidal illness). Extensive experiments show that the extra tree classifier-based selected feature yields the best results with 0.99 accuracy and an F1 score when used with the random forest classifier. Results suggest that the machine learning models are a better choice for thyroid disease detection regarding the provided accuracy and the computational complexity. K-fold cross-validation and performance comparison with existing studies corroborate the superior performance of the proposed approach.Entities:
Keywords: bidirectional feature elimination; forward feature selection; machine learning; thyroid prediction
Year: 2022 PMID: 36010907 PMCID: PMC9405591 DOI: 10.3390/cancers14163914
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.575
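The best-performing pipeline named in the abstract — machine-learning-based feature selection (MLFS) with an extra tree classifier, followed by a random forest — can be sketched as below. This is a minimal illustration on synthetic data, not the authors' code; the median importance threshold, data shape, and random seeds are assumptions.

```python
# Hypothetical MLFS sketch: an extra trees classifier ranks features,
# features at or above the median importance are kept, and a random
# forest is trained on the reduced feature set.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 5-class thyroid data (1774 samples, 30 features).
X, y = make_classification(n_samples=1774, n_features=30, n_informative=10,
                           n_classes=5, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=42)

# Keep features whose extra-trees importance is at or above the median.
selector = SelectFromModel(ExtraTreesClassifier(n_estimators=100, random_state=42),
                           threshold="median").fit(X_tr, y_tr)
X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

rf = RandomForestClassifier(n_estimators=200, max_depth=20, random_state=42)
rf.fit(X_tr_sel, y_tr)
pred = rf.predict(X_te_sel)
print(X_tr_sel.shape[1], round(f1_score(y_te, pred, average="macro"), 2))
```

With a median threshold, half of the 30 features survive; any importance-based cutoff (mean, fixed k) could be substituted.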
Summary of the systematic analysis of the state-of-the-art thyroid disease studies.
| Authors | Year | Sample Size | Dataset Source | Model | Classes | Evaluation Metrics | Results |
|---|---|---|---|---|---|---|---|
| [ | 2020 | - | ToxCast | LR RF SVM XGB ANN | 2 | F1-score | (TPO) XGB-83% and (TR) RF-81% |
| [ | 2018 | 7200 samples, 21 attributes | UCI | SVM, Multiple Linear Regression(MLR), NB and DT | 2 | Accuracy | MLR 91.59% SVM 96.04% Naive Bayes 6.31% Decision Trees 99.23% |
| [ | 2020 | 7547, 30 features | UCI | multi-kernel SVM | 3 | Accuracy, Sensitivity, and Specificity | Accuracy (97.49%), Sensitivity (99.05%), and Specificity (94.5%) |
| [ | 2021 | 3771 samples, 30 attributes | UCI | DT, KNN, RF, and SVM | 4 | Accuracy | KNN 98.3% SVM 96.1% DT 99.5% RF 99.81% |
| [ | 2021 | 519 samples | diagnostic center Dhaka, Bangladesh | SVM, DT, RF, LR, and NB. Recursive Feature Elimination (RFE), Univariate Feature Selection (UFS) and PCA | 4 | Accuracy | RFE, SVM, DT, RF, LR accuracy—99.35% |
| [ | 2021 | 1250 with 17 attributes | external hospitals and laboratories | SVM, RF, DT, NB, LR, KNN, MLP, and linear discriminant analysis (LDA) | 3 | Accuracy | DT 90.13, SVM 92.53 RF 91.2 NB 90.67 LR 91.73 LDA 83.2 KNN 91.47 MLP 96.4 |
| [ | 2021 | 7200 patients, with 21 features | UCI | multiple MLP | 3 | Accuracy | multiple MLP 99% |
| [ | 2021 | 690 samples, 13 features | datasets from KEEL repo and District Headquarters teaching hospital, Pakistan | KNN without feature selection, KNN using L1-based feature selection, and KNN using chi-square-based feature selection | 3 | Accuracy | KNN 98% |
| [ | 2021 | 3772 and 30 attributes | UCI | RF, sequential minimal optimization (SMO), DT, and K-star classifier | 2 | Accuracy | K = 6, RF 99.44%, DT 98.97%, K-star 94.67%, and SMO 93.67% |
| [ | 2022 | 3163 | UCI | DT, RF, KNN, and ANN | 2 | Accuracy | Best performance Accuracy RF 94.8% |
| [ | 2022 | 215 with 5 features | UCI | KNN, XGB, LR, DT | 3 | Accuracy | KNN 81.25 XGBoost 87.5 LR 96.875 DT 98.59 |
| [ | 2022 | 3152, 23 features | UCI | DNN | 2 | Accuracy | Accuracy 99.95% |
Figure 1. Flow of the proposed methodology.
Dataset description.
| Features | Sample Count |
|---|---|
| 31 | 9172 |
Data sample attribute types.
| Attribute | Description | Data Type |
|---|---|---|
| age | age of the patient | (int) |
| sex | sex the patient identifies as | (str) |
| on_thyroxine | whether patient is on thyroxine | (bool) |
| query_on_thyroxine | whether thyroxine use was queried (patient possibly on thyroxine) | (bool) |
| on_antithyroid_meds | whether the patient is on antithyroid medication | (bool) |
| sick | whether patient is sick | (bool) |
| pregnant | whether patient is pregnant | (bool) |
| thyroid_surgery | whether patient has undergone thyroid surgery | (bool) |
| I131_treatment | whether patient is undergoing I131 treatment | (bool) |
| query_hypothyroid | whether the patient believes they have hypothyroid | (bool) |
| query_hyperthyroid | whether the patient believes they have hyperthyroid | (bool) |
| lithium | whether patient is on lithium | (bool) |
| goitre | whether patient has goitre | (bool) |
| tumor | whether patient has tumor | (bool) |
| hypopituitary | whether patient has a hypopituitary condition | (float) |
| psych | whether patient has a psychological condition | (bool) |
| TSH_measured | whether TSH was measured in the blood | (bool) |
| TSH | TSH level in blood from lab work | (float) |
| T3_measured | whether T3 was measured in the blood | (bool) |
| T3 | T3 level in blood from lab work | (float) |
| TT4_measured | whether TT4 was measured in the blood | (bool) |
| TT4 | TT4 level in blood from lab work | (float) |
| T4U_measured | whether T4U was measured in the blood | (bool) |
| T4U | T4U level in blood from lab work | (float) |
| FTI_measured | whether FTI was measured in the blood | (bool) |
| FTI | FTI level in blood from lab work | (float) |
| TBG_measured | whether TBG was measured in the blood | (bool) |
| TBG | TBG level in blood from lab work | (float) |
| referral_source | source of the patient's referral | (str) |
| target | thyroid condition diagnosis class | (str) |
| patient_id | unique id of the patient | (str) |
Description of the class-wise target.
| Condition | Diagnosis Class | Count |
|---|---|---|
| hyperthyroid | hyperthyroid (A) | 147 |
| T3 toxic (B) | 21 | |
| toxic goiter (C) | 6 | |
| secondary toxic (D) | 8 | |
| hypothyroid | hypothyroid (E) | 1 |
| primary hypothyroid (F) | 233 | |
| compensated hypothyroid (G) | 359 | |
| secondary hypothyroid (H) | 8 | |
| binding protein | increased binding protein (I) | 346 |
| decreased binding protein (J) | 30 | |
| general health | concurrent non-thyroidal illness (K) | 436 |
| replacement therapy | underreplaced (M) | 111 |
| consistent with replacement therapy (L) | 115 | |
| overreplaced (N) | 110 | |
| antithyroid treatment | antithyroid drugs (O) | 14 |
| I131 treatment (P) | 5 | |
| surgery (Q) | 14 | |
| miscellaneous | discordant assay results (R) | 196 |
| elevated TBG (S) | 85 | |
| elevated thyroid hormones (T) | 0 | |
| no condition | (-) | 6771 |
Balanced dataset for thyroid disease classification.
| Class | Preprocessed Count | Final Count |
|---|---|---|
| Normal | 6771 | 400 |
| primary hypothyroid | 233 | 233 |
| increased binding protein | 346 | 346 |
| compensated hypothyroid | 359 | 359 |
| concurrent non-thyroidal illness | 436 | 436 |
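The balancing step behind this table can be sketched with pandas: the "no condition" class is randomly undersampled to 400 records while the four disease classes are kept whole. The column name and random seed below are illustrative, not from the paper.

```python
# Hypothetical sketch of the undersampling used to balance the classes.
import pandas as pd

# Class letters and counts from the class-wise target table.
counts = {"-": 6771, "F": 233, "I": 346, "G": 359, "K": 436}
df = pd.DataFrame({"target": [c for c, n in counts.items() for _ in range(n)]})

# Downsample the majority "no condition" class to 400 records, keep the rest.
normal = df[df.target == "-"].sample(n=400, random_state=42)
balanced = pd.concat([normal, df[df.target != "-"]]).sample(frac=1, random_state=42)
print(balanced.target.value_counts().to_dict())
```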
Sample of the dataset (the wide table is shown in four column blocks; each block lists the same four patients in order).
| age | sex | on_thyroxine | query_on_thyroxine | on_antithyroid_meds | sick | pregnant | thyroid_surgery |
|---|---|---|---|---|---|---|---|
| 29 | F | f | f | f | f | f | f |
| 71 | F | t | f | f | f | f | f |
| 61 | M | f | f | f | t | f | f |
| 88 | F | f | f | f | f | f | f |

| I131_treatment | query_hypothyroid | query_hyperthyroid | lithium | goitre | tumor | hypopituitary | psych |
|---|---|---|---|---|---|---|---|
| f | t | f | f | f | f | f | f |
| f | f | f | f | f | f | f | f |
| f | f | f | f | f | f | f | f |
| f | f | f | f | f | f | f | f |

| TSH_measured | TSH | T3_measured | T3 | TT4_measured | TT4 | T4U_measured | T4U |
|---|---|---|---|---|---|---|---|
| t | 0.3 | f | | f | | f | |
| t | 0.05 | f | | t | 126 | t | 1.38 |
| t | 9.799999 | t | 1.2 | t | 114 | t | 0.84 |
| t | 0.2 | t | 0.4 | t | 98 | t | 0.73 |

| FTI_measured | FTI | TBG_measured | TBG | referral_source | target |
|---|---|---|---|---|---|
| f | | f | | other | - |
| t | 91 | f | | other | I |
| t | 136 | f | | other | G |
| t | 134 | f | | other | K |
Figure 2. Feature impact on models' performance.
Figure 3. Feature importance using MLFS.
Hyperparameter settings and tuning ranges of machine learning models.
| Model | Hyper-Parameters | Tuning Range |
|---|---|---|
| LR | solver = liblinear, C = 5.0 | solver = {liblinear, saga, sag}, C = {1.0 to 8.0} |
| SVM | kernel = ‘linear’, C = 5.0 | kernel = {‘linear’, ‘poly’, ‘sigmoid’} C = {1.0 to 8.0} |
| RF | n_estimators = 200, max_depth = 20 | n_estimators = {10 to 300}, max_depth = {2 to 50} |
| GBM | n_estimators = 200, max_depth = 20, learning_rate = 0.5 | n_estimators = {10 to 300}, max_depth = {2 to 50}, learning_rate = {0.1 to 0.9} |
| ADA | n_estimators = 200, max_depth = 20, learning_rate = 0.5 | n_estimators = {10 to 300}, max_depth = {2 to 50}, learning_rate = {0.1 to 0.9} |
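The tuning ranges above can be searched with a grid search. The sketch below covers the RF row only, on synthetic data, with a coarse subsample of the stated ranges; it is not the authors' tuning code.

```python
# Hypothetical grid search over a subset of the RF tuning ranges above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
grid = {"n_estimators": [10, 100, 200], "max_depth": [2, 20, 50]}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```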
Number of samples for the training and test subsets.
| Target Class | Training | Testing | Total |
|---|---|---|---|
| “-” (0) | 325 | 75 | 400 |
| F (1) | 190 | 43 | 233 |
| G (2) | 280 | 79 | 359 |
| I (3) | 271 | 75 | 346 |
| K (4) | 353 | 83 | 436 |
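The counts in this table are consistent with a stratified ~80/20 train/test split of the 1774 balanced samples, which can be reproduced in outline (labels only; the seed is an assumption, so per-class counts may differ slightly from the table).

```python
# Hypothetical stratified 80/20 split of the balanced class counts.
from collections import Counter

from sklearn.model_selection import train_test_split

# Class letters and counts from the balanced dataset table.
counts = {"-": 400, "F": 233, "G": 359, "I": 346, "K": 436}
y = [c for c, n in counts.items() for _ in range(n)]
y_train, y_test = train_test_split(y, test_size=0.2, stratify=y, random_state=42)
print(len(y_train), len(y_test))
```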
Results of machine learning models using original feature set.
| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| RF | 0.98 | 0.98 | 0.98 | 0.98 |
| GBM | 0.97 | 0.98 | 0.98 | 0.98 |
| ADA | 0.97 | 0.97 | 0.97 | 0.97 |
| LR | 0.85 | 0.85 | 0.85 | 0.85 |
| SVM | 0.85 | 0.85 | 0.85 | 0.85 |
Performance of machine learning models using FFS feature set.
| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| RF | 0.97 | 0.97 | 0.96 | 0.96 |
| GBM | 0.97 | 0.97 | 0.96 | 0.96 |
| ADA | 0.93 | 0.92 | 0.92 | 0.92 |
| LR | 0.83 | 0.83 | 0.82 | 0.82 |
| SVM | 0.92 | 0.92 | 0.92 | 0.92 |
Figure 4. Feature space using different feature selection methods.
Results using BFE feature set with machine learning models.
| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| RF | 0.96 | 0.96 | 0.95 | 0.95 |
| GBM | 0.92 | 0.92 | 0.91 | 0.91 |
| ADA | 0.83 | 0.84 | 0.83 | 0.83 |
| LR | 0.83 | 0.83 | 0.82 | 0.82 |
| SVM | 0.92 | 0.92 | 0.92 | 0.92 |
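FFS and BFE, as used across these tables, are wrapper-style sequential selection; a minimal sketch with scikit-learn's SequentialFeatureSelector follows (the estimator, subset size, and data are illustrative, not the paper's setup). Bidirectional elimination (BiDFE) is not in scikit-learn; mlxtend's SequentialFeatureSelector with floating=True is one common implementation.

```python
# Hypothetical forward selection (FFS) and backward elimination (BFE) sketch.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           random_state=0)
est = LogisticRegression(max_iter=1000)

# Forward: start empty, greedily add the best feature each round.
ffs = SequentialFeatureSelector(est, n_features_to_select=5,
                                direction="forward").fit(X, y)
# Backward: start full, greedily drop the least useful feature each round.
bfe = SequentialFeatureSelector(est, n_features_to_select=5,
                                direction="backward").fit(X, y)
print(ffs.get_support().sum(), bfe.get_support().sum())
```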
Figure 5. Feature space using different feature selection methods. (a) MLFS. (b) Forward feature selection (FFS). (c) Backward feature elimination (BFE). (d) Bidirectional feature elimination (BiDFE). (e) Original.
Performance of models using BiDFE feature set.
| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| RF | 0.98 | 0.98 | 0.98 | 0.98 |
| GBM | 0.96 | 0.96 | 0.96 | 0.96 |
| ADA | 0.84 | 0.87 | 0.85 | 0.84 |
| LR | 0.81 | 0.83 | 0.81 | 0.81 |
| SVM | 0.92 | 0.92 | 0.92 | 0.92 |
Performance of models using MLFS feature set.
| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| RF | 0.99 | 0.99 | 0.99 | 0.99 |
| GBM | 0.98 | 0.98 | 0.98 | 0.98 |
| ADA | 0.97 | 0.97 | 0.97 | 0.97 |
| LR | 0.87 | 0.88 | 0.87 | 0.87 |
| SVM | 0.92 | 0.92 | 0.92 | 0.92 |
Results of 10-fold cross-validation.
| Feature | Model | Accuracy | SD | Time (s) |
|---|---|---|---|---|
| Original | RF | 0.94 | +/−0.10 | 1.689 |
| GBM | 0.93 | +/−0.13 | 3.831 | |
| ADA | 0.93 | +/−0.08 | 1.758 | |
| LR | 0.84 | +/−0.13 | 0.330 | |
| SVM | 0.88 | +/−0.12 | 243.126 | |
| FFS | RF | 0.93 | +/−0.10 | 0.440 |
| GBM | 0.90 | +/−0.14 | 1.349 | |
| ADA | 0.89 | +/−0.08 | 0.743 | |
| LR | 0.78 | +/−0.13 | 0.330 | |
| SVM | 0.90 | +/−0.15 | 210.65 | |
| BFE | RF | 0.93 | +/−0.11 | 0.601 |
| GBM | 0.90 | +/−0.14 | 1.380 | |
| ADA | 0.87 | +/−0.07 | 0.635 | |
| LR | 0.78 | +/−0.13 | 0.111 | |
| SVM | 0.90 | +/−0.15 | 173.80 | |
| BiDFE | RF | 0.93 | +/−0.03 | 0.677 |
| GBM | 0.90 | +/−0.02 | 8.733 | |
| ADA | 0.89 | +/−0.06 | 0.617 | |
| LR | 0.78 | +/−0.06 | 0.111 | |
| SVM | 0.90 | +/−0.04 | 42.496 | |
| MLFS | RF | 0.94 | +/−0.01 | 1.689 |
| GBM | 0.93 | +/−0.13 | 3.831 | |
| ADA | 0.93 | +/−0.08 | 1.758 | |
| LR | 0.84 | +/−0.13 | 0.330 | |
| SVM | 0.91 | +/−0.13 | 365.51 |
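The accuracy, standard deviation, and time columns above can be produced with scikit-learn's cross_val_score plus wall-clock timing; the sketch below uses synthetic data and placeholder hyperparameters rather than the paper's setup.

```python
# Hypothetical 10-fold cross-validation with wall-clock timing.
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=5, random_state=1)
start = time.perf_counter()
scores = cross_val_score(RandomForestClassifier(n_estimators=200, max_depth=20,
                                                random_state=1), X, y, cv=10)
elapsed = time.perf_counter() - start
print(f"{scores.mean():.2f} +/- {scores.std():.2f} in {elapsed:.3f}s")
```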
Architecture of deep learning models.
| Model | Hyperparameters |
|---|---|
| LSTM | Embedding (4000, 100, input_length = …) |
| CNN | Embedding (4000, 100, input_length = …) |
| CNN-LSTM | Embedding (4000, 100, input_length = …) |
All models are compiled with loss = ‘categorical_crossentropy’ and optimizer = ‘adam’.
Figure 6. Deep learning models' per-epoch evaluation scores using original features and MLFS. (a) CNN accuracy using original features, (b) CNN loss using original features, (c) CNN-LSTM accuracy using original features, (d) CNN-LSTM loss using original features, (e) LSTM accuracy using original features, (f) LSTM loss using original features, (g) CNN accuracy using MLFS, (h) CNN loss using MLFS, (i) CNN-LSTM accuracy using MLFS, (j) CNN-LSTM loss using MLFS, (k) LSTM accuracy using MLFS, and (l) LSTM loss using MLFS.
Figure 7. Deep learning models' per-epoch evaluation scores using BiDFE and BFE. (a) CNN accuracy using BFE, (b) CNN loss using BFE, (c) CNN-LSTM accuracy using BFE, (d) CNN-LSTM loss using BFE, (e) LSTM accuracy using BFE, (f) LSTM loss using BFE, (g) CNN accuracy using BiDFE, (h) CNN loss using BiDFE, (i) CNN-LSTM accuracy using BiDFE, (j) CNN-LSTM loss using BiDFE, (k) LSTM accuracy using BiDFE and (l) LSTM loss using BiDFE.
Figure 8. Deep learning models' per-epoch evaluation scores using FFS. (a) CNN accuracy using FFS, (b) CNN loss using FFS, (c) CNN-LSTM accuracy using FFS, (d) CNN-LSTM loss using FFS, (e) LSTM accuracy using FFS and (f) LSTM loss using FFS.
Deep learning models' results with each feature selection technique.
| Feature | Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| Original | LSTM | 0.84 | 0.84 | 0.83 | 0.83 |
| CNN | 0.93 | 0.94 | 0.92 | 0.93 | |
| CNN-LSTM | 0.90 | 0.90 | 0.88 | 0.88 | |
| FFS | LSTM | 0.62 | 0.63 | 0.59 | 0.59 |
| CNN | 0.86 | 0.87 | 0.84 | 0.85 | |
| CNN-LSTM | 0.77 | 0.78 | 0.73 | 0.74 | |
| BFE | LSTM | 0.57 | 0.61 | 0.54 | 0.54 |
| CNN | 0.86 | 0.87 | 0.84 | 0.84 | |
| CNN-LSTM | 0.86 | 0.87 | 0.84 | 0.85 | |
| BiDFE | LSTM | 0.83 | 0.83 | 0.80 | 0.80 |
| CNN | 0.85 | 0.84 | 0.81 | 0.82 | |
| CNN-LSTM | 0.87 | 0.88 | 0.84 | 0.86 | |
| MLFS | LSTM | 0.57 | 0.63 | 0.54 | 0.55 |
| CNN | 0.89 | 0.89 | 0.87 | 0.88 | |
| CNN-LSTM | 0.92 | 0.91 | 0.91 | 0.91 |
Deep learning models' computational time (s).
| Model | FFS | BFE | BiDFE | MLFS | Original |
|---|---|---|---|---|---|
| LSTM | 44.975 | 87.842 | 98.067 | 66.361 | 170.28 |
| CNN | 83.088 | 37.796 | 131.48 | 30.852 | 56.436 |
| CNN-LSTM | 150.53 | 65.992 | 214.96 | 47.922 | 97.662 |
Comparison with other studies.
| Ref. | Year | Model | Accuracy | F1 Score |
|---|---|---|---|---|
| [ | 2022 | RF | 0.98 | 0.98 |
| [ | 2022 | DT | 0.98 | 0.97 |
| [ | 2022 | DNN | 0.93 | 0.93 |
| [ | 2022 | ConvSGLV | 0.96 | 0.96 |
| This study | 2022 | MLFS+RF | 0.99 | 0.99 |