Maad M Mijwil, Karan Aggarwal.
Abstract
Appendicitis is a common disease that occurs particularly often in childhood and adolescence. Accurate diagnosis of acute appendicitis is the most important safeguard against unnecessary surgery. In this paper, the authors present a machine learning (ML) technique to predict whether appendicitis is acute or subacute, especially in patients between 10 and 30 years of age, and whether it requires an operation or can be treated with medication alone. The dataset was collected from citizens attending public hospitals between 2016 and 2019. The predictive results of models built with different ML techniques (Logistic Regression, Naïve Bayes, Generalized Linear, Decision Tree, Support Vector Machine, Gradient Boosted Tree, Random Forest) are compared. The dataset comprises 625 specimens; the medical records used in this paper include 371 males (60.22%) and 254 females (40.12%). The records consist of 318 (50.88%) operated and 307 (49.12%) unoperated patients. The random forest algorithm obtains the best result, with an accuracy of 83.75%, precision of 84.11%, sensitivity of 81.08%, and specificity of 81.01%. Moreover, an estimation method based on ML techniques is improved to detect individuals with acute appendicitis.
Keywords: Acute appendicitis; Appendicitis surgery; Data mining; Machine learning; Specimens
Year: 2022 PMID: 35095329 PMCID: PMC8785023 DOI: 10.1007/s11042-022-11939-8
Source DB: PubMed Journal: Multimed Tools Appl ISSN: 1380-7501 Impact factor: 2.577
Fig. 1 (a) Normal appendix, (b) Appendix inflammation
Fig. 2 Data mining phases
Fig. 3 CRISP-DM stages
The difference between predictive and descriptive models
| Predictive Model | Descriptive Model |
|---|---|
| Update dependent on the outcomes of known data. | Update dependent on the outcomes of known data that proposes to evaluate the outcomes of obscure data. |
| The purpose is to make assumptions that will predict the future. | To determine the models in the open data that can be applied to manage outcomes. |
Dataset attributes with their summary and weighting
| Name | Description | Weighting |
|---|---|---|
| Collection | | |
| Hemoglobin (HGB) | HGB is a component of red blood cells; its principal function is to carry O2 from the respiratory organs to the tissues of the body [ | 0 |
| Neutrophil (NEU) | NEU is a type of white blood cell that defends the body against external aggression. Because of this function, the patient is at risk of infection if the level of leukocytes is low [ | 1.01 |
| Lymphocytes (LYM) | LYMs are a class of white blood cells that are responsible for the immune response of the organism [ | 0.120 |
| Mean corpuscular volume (MCV) | The MCV blood test measures the average size of red blood cells. These cells carry O2 from the lungs to all cells in the body [ | 0.075 |
| Mean platelet volume (MPV) | MPV is a measurement of the average size of platelets in the blood [ | 0.051 |
| Hematocrit (HTC) | HTC is a blood test that measures the percentage of oxygen-carrying cells, i.e., red blood cells, relative to the total blood volume [ | 0.119 |
| Deep vein thrombosis (DVT) | Thrombosis is a blood clot in the circulatory system. It attaches to the site at which it formed and remains there, hindering blood flow [ | 0.125 |
| Platelets (PLT) | Platelets are tiny pieces of cytoplasm that detach from mature megakaryocytes in the bone marrow [ | 0.226 |
| C-Reactive protein (CRP) | CRP is a protein formed by the liver. It is released into the bloodstream in response to inflammation [ | 0.432 |
| White Blood Cell (WBC) | WBCs are the cells responsible for defending the body against infection and helping to eliminate waste from the tissues [ Infections in the blood can be diagnosed by recognizing abnormalities in WBC [ | 0.895 |
Fig. 4 Appendicitis surgery (images downloaded from Google Images; free to modify, use, and share)
Fig. 5 Blood cell images (downloaded from Google Images)
Blood test result
| Collection | Normal value | Surgery needed | Surgery not needed | p-value |
|---|---|---|---|---|
| HGB | 14–16 | 15.19±0.046 | 15.23±0.048 | 0.573 |
| NEU | 2.5–7.5 | 4.69±0.14 | 5.16±0.11 | |
| LYM | 2.2–3.5 | 2.91±0.22 | 2.81±0.22 | |
| MCV | 90–95.8 | 93.14±0.10 | 93.36±0.01 | 0.096 |
| MPV | 10.4–11.5 | 10.96±0.01 | 10.94±0.01 | 0.304 |
| HTC | 0.43–0.47 | 0.45±0.001 | 0.46±0.0008 | |
| DVT | 0.8–2.7 | 10.77±0.02 | 10.79±0.02 | 0.315 |
| PLT | 160–450 | 379.79±5.73 | 402.18±4.46 | |
| CRP | 3.3–8.4 | 6.60±0.10 | 6.60±0.09 | 0.964 |
| WBC | 52,342–9510 | 78,010.17±755.85 | 86,906.08±707.13 | |
*Statistically significant (p-values <0.05). Bold indicates a significant p value ≤0.05
Fig. 6 The number of males and females who need surgery and those who do not
Confusion matrix for diagnostic testing of appendicitis
| Predicted Class | Positive | Negative |
|---|---|---|
| Positive | TP | FN |
| Negative | FP | TN |
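In terms of the four cells of this matrix, the measures reported for each technique are conventionally defined as shown below. These are the standard textbook definitions, assumed here rather than quoted from the paper. Rows are read as the actual class, which is what the placement of TP and FN in the same row implies.

```latex
\begin{align}
\text{Accuracy}    &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision}   &= \frac{TP}{TP + FP} \\
\text{Sensitivity} &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP}
\end{align}
```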
Performance of the techniques with a threshold of 0.1460
| Techniques | Accuracy | Precision | Specificity | Sensitivity |
|---|---|---|---|---|
| Random Forest | 83.75% | 84.11% | 81.01% | 81.08% |
| Logistic Regression | 74.12% | 71.00% | 71.20% | 72.28% |
| Naïve Bayes | 76.33% | 76.30% | 61.50% | 81.11% |
| Generalized Linear | 64.74% | 65.01% | 68.89% | 61.00% |
| Decision Tree | 66.92% | 61.12% | 70.00% | 64.20% |
| Support Vector Machine | 79.71% | 77.45% | 88.12% | 79.71% |
| Gradient Boosted Tree | 80.63% | 80.54% | 63.46% | 96.12% |
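A threshold such as the 0.1460 in the caption is typically a cut-off applied to a model's predicted score for the positive class. The minimal sketch below assumes that reading; the function name `apply_threshold` and the sample scores are hypothetical and are not taken from the study.

```python
from typing import List

THRESHOLD = 0.1460  # cut-off value quoted in the table caption


def apply_threshold(scores: List[float], threshold: float = THRESHOLD) -> List[int]:
    """Label a case positive (1) when its score reaches the threshold, else negative (0)."""
    return [1 if score >= threshold else 0 for score in scores]


# Hypothetical predicted scores for the positive class (illustration only).
example_scores = [0.05, 0.1460, 0.30, 0.12, 0.90]
labels = apply_threshold(example_scores)
print(labels)
```

Lowering the threshold labels more cases as positive, so sensitivity cannot decrease and specificity cannot increase. This trade-off is why the threshold is reported alongside the performance figures.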
Results of Random Forest Analysis (Confusion matrix)
| Predicted Class | Positive | Negative |
|---|---|---|
| Positive | 46 | 10 |
| Negative | 14 | 49 |
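The four counts above can be turned into the usual summary measures. The sketch below uses the standard definitions and assumes the same layout as the generic confusion matrix earlier in this section (first row TP and FN, second row FP and TN). The helper name `confusion_metrics` is hypothetical, and values computed from a single matrix are not guaranteed to match the percentages in the performance table.

```python
from typing import Dict


def confusion_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Compute standard measures from confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Counts read from the random forest matrix, under the assumed row/column layout.
metrics = confusion_metrics(tp=46, fn=10, fp=14, tn=49)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```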
Fig. 7 Accuracy percentages of applied ML techniques
Comparison between the current study and previous studies
| Works | Records | Ages | Random Forest accuracy |
|---|---|---|---|
| Ref. [ | 595 | 10–30 | 92.96% |
| Ref. [ | 430 | 0–18 | 94% |
Execution time for each technique
| Techniques | Performance | Execution time |
|---|---|---|
| Random Forest | Optimal | 0.044820 |
| Logistic Regression | Good | 0.034525 |
| Naïve Bayes | Good | 0.024925 |
| Generalized Linear | Inadequate | 0.069225 |
| Decision Tree | Worst | 0.069225 |
| Support Vector Machine | Very good | 0.054250 |
| Gradient Boosted Tree | Best | 0.016425 |
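Execution times like those in the table can be gathered by timing each technique's training or prediction call. The sketch below is a generic illustration using Python's `time.perf_counter`; the helper `time_call` and the stand-in workload are hypothetical, and this section does not state how, or in which unit, the reported timings were measured.

```python
import time
from typing import Any, Callable, Tuple


def time_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def stand_in_workload(n: int) -> int:
    # Placeholder for a model's fit or predict call (hypothetical).
    return sum(i * i for i in range(n))


result, seconds = time_call(stand_in_workload, 100_000)
print(f"elapsed: {seconds:.6f} s")
```

A single run is noisy; repeating the call several times and reporting the median is a common way to make such comparisons more stable.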