Mavra Mehmood¹, Muhammad Rizwan¹, Michal Gregus Ml², Sidra Abbas³.
Abstract
Cervical cancer is the fourth most common cause of cancer death in women worldwide. It is associated with human papillomavirus (HPV) infection. Early screening has made cervical cancer a preventable disease and reduces its global burden. In developing countries, however, many women lack access to adequate screening programs because regular examinations are costly, awareness is limited, and medical centers are hard to reach. Predicting each patient's individual risk is therefore highly important. Many risk factors are associated with cervical cancer. This paper proposes an approach named CervDetect that uses machine learning algorithms to evaluate the risk factors for cervical cancer. CervDetect pre-processes the data using the Pearson correlation among the input variables and between each input variable and the output variable. It then uses the random forest (RF) feature selection technique to select significant features. Finally, CervDetect uses a hybrid approach that combines RF and a shallow neural network to detect cervical cancer. Results show that CervDetect accurately predicts cervical cancer, outperforms state-of-the-art studies, and achieves an accuracy of 93.6%, a mean squared error (MSE) of 0.07111, a false-positive rate (FPR) of 6.4%, and a false-negative rate (FNR) of 100%.
Keywords: artificial intelligence; cervical cancer; classification; feature engineering; gynecological diseases; medical data
Year: 2021 PMID: 35004588 PMCID: PMC8733205 DOI: 10.3389/fpubh.2021.788376
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
Dataset description.
| No. | Attribute | Missing values |
|---|---|---|
| 1 | “Number of sexual partners” | 26 |
| 2 | “First sexual intercourse” | 7 |
| 3 | “Num of pregnancies” | 56 |
| 4 | “Smokes” | 13 |
| 5 | “Smokes (years)” | 13 |
| 6 | “Smokes (packs/year)” | 13 |
| 7 | “Hormonal Contraceptives” | 108 |
| 10 | “Hormonal Contraceptives (years)” | 108 |
| 11 | “IUD” | 117 |
| 12 | “IUD (years)” | 117 |
| 13 | “STDs” | 105 |
| 14 | “STDs (number)” | 105 |
| 15 | “STDs:condylomatosis” | 105 |
| 16 | “STDs:cervical condylomatosis” | 105 |
| 17 | “STDs:vaginal condylomatosis” | 105 |
| 18 | “STDs:syphilis” | 105 |
| 20 | “STDs:pelvic inflammatory disease” | 105 |
| 21 | “STDs:genital herpes” | 105 |
| 22 | “STDs:molluscum contagiosum” | 105 |
| 23 | “STDs:AIDS” | 105 |
| 24 | “STDs:HIV” | 105 |
| 25 | “STDs:Hepatitis B” | 105 |
| 26 | “STDs:HPV” | 105 |
| 27 | “STDs: Time since first diagnosis” | 787 |
| 28 | “STDs: Time since last diagnosis” | 787 |
| 29 | “Age” | 0 |
| 30 | “STDs: Number of diagnosis” | 0 |
| 31 | “Dx:Cancer” | 0 |
| 32 | “Dx:CIN” | 0 |
| 33 | “Dx:HPV” | 0 |
| 34 | “Dx” | 0 |
| 35 | “Hinselmann” | 0 |
| 36 | “Schiller” | 0 |
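The third column of the table counts missing entries per attribute. The sketch below shows one way such counts could be produced and then filled before the correlation analysis. It assumes a CSV in which missing values are written as `?` (the convention of the public UCI cervical-cancer risk-factor file) and fills gaps with the column median; the imputation strategy is an assumption, since this section does not state which one CervDetect uses. The helper names and sample rows are hypothetical.

```python
import csv
import io
from statistics import median

def count_missing(rows, columns, missing="?"):
    """Count missing entries per column; rows are dicts from csv.DictReader."""
    return {col: sum(1 for r in rows if r[col].strip() == missing) for col in columns}

def impute_median(rows, columns, missing="?"):
    """Replace missing entries in place with the median of the observed values
    (assumed strategy, not necessarily the one used by CervDetect)."""
    for col in columns:
        observed = [float(r[col]) for r in rows if r[col].strip() != missing]
        fill = median(observed)
        for r in rows:
            if r[col].strip() == missing:
                r[col] = str(fill)
    return rows

# Made-up sample using the "?"-for-missing encoding.
sample = """Age,Num of pregnancies,Smokes (years)
18,1,?
25,?,0
34,3,2.5
41,?,?
"""
rows = list(csv.DictReader(io.StringIO(sample)))
cols = ["Num of pregnancies", "Smokes (years)"]
print(count_missing(rows, cols))  # {'Num of pregnancies': 2, 'Smokes (years)': 2}
impute_median(rows, cols)
print([r["Num of pregnancies"] for r in rows])  # ['1', '2.0', '3', '2.0']
```

Median imputation is robust to the skewed count-type attributes in this table, but any other strategy (for example, dropping attributes with very many gaps) can be swapped into `impute_median`.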
Figure 1. Block diagram of the proposed workflow.
CervDetect Algorithm.
| 1: Draw random samples from the dataset. |
| 2: Generate a decision tree for each sample. |
| 3: Obtain a prediction from each decision tree. |
| 4: Count the votes for each predicted outcome. |
| 5: Select the outcome with the most votes as the final prediction. |
| 6: After configuring the neural network, initialize it: |
| 7: Initialize all weights. |
| 8: Set all bias nodes to B1 = B2 = 1.0. |
| 9: Feedforward: |
| 10: Each input $x_i$ has a weight ($w_{21}, w_{22}, \ldots$). |
| 11: $y = \sum_i w_{2i} x_i + B_1$ |
| 12: $v = \mathrm{sigmoid}(y) = 1/(1 + \exp(-y))$ |
| 13: Compute the error $e$ between the output $v$ and the target label. |
| 14: Feed backward: |
| 15: Compute the partial derivative $\partial e / \partial w$ of the error with respect to each weight. |
| 16: Update each weight: $w \leftarrow w - \eta \, \partial e / \partial w$, where $\eta$ is the learning rate. |
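The random forest steps of the CervDetect algorithm (steps 1 to 5) describe bagging with a majority vote. The sketch below is a simplified, hypothetical illustration of that voting scheme: each "tree" is a one-split decision stump rather than a fully grown tree, and the random feature subsampling of a full random forest is omitted. The function names and toy data are assumptions for illustration only.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Fit a one-split decision tree (a 'stump') by minimizing misclassifications."""
    default = Counter(y).most_common(1)[0][0]  # fallback label for an empty side
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            lab_l = Counter(left).most_common(1)[0][0] if left else default
            lab_r = Counter(right).most_common(1)[0][0] if right else default
            errors = sum(yi != lab_l for yi in left) + sum(yi != lab_r for yi in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, lab_l, lab_r)
    _, j, t, lab_l, lab_r = best
    return lambda row: lab_l if row[j] <= t else lab_r

def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Step 1: bootstrap sample (draw len(X) rows with replacement).
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        # Step 2: build one tree per sample.
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return trees

def predict_forest(trees, row):
    # Steps 3-5: every tree predicts, votes are counted, the majority wins.
    votes = Counter(tree(row) for tree in trees)
    return votes.most_common(1)[0][0]

# Toy one-feature data: small values -> class 0, large values -> class 1.
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [0, 0, 0, 1, 1, 1]
trees = fit_forest(X, y, n_trees=25)
print(predict_forest(trees, [0.5]), predict_forest(trees, [20.0]))
```

An odd number of trees avoids tied votes for a binary label such as biopsy versus normal.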
Figure 2. Correlation between input variables after handling missing values.
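The abstract states that CervDetect pre-processes the data with the Pearson correlation among the inputs and between each input and the output. A minimal sketch of one such filter follows, assuming that of any pair of inputs with $|r|$ above a threshold (0.9 here, an assumed value) the member less correlated with the target is dropped. The function names, threshold, and data are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient r of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # r is undefined for a constant column; treat it as uncorrelated
    return cov / sqrt(va * vb)

def drop_redundant(features, target, threshold=0.9):
    """features: dict name -> column. For every pair of inputs with |r| > threshold,
    drop the one whose |r| with the target is smaller."""
    target_r = {name: abs(pearson(col, target)) for name, col in features.items()}
    dropped = set()
    for f1, f2 in combinations(features, 2):
        if f1 in dropped or f2 in dropped:
            continue
        if abs(pearson(features[f1], features[f2])) > threshold:
            dropped.add(f1 if target_r[f1] < target_r[f2] else f2)
    return [name for name in features if name not in dropped]

features = {
    "Smokes (years)": [0, 1, 2, 3, 4, 5],
    "Smokes (packs/year)": [0, 0.5, 1, 1.5, 2, 2.6],  # nearly collinear with years
    "Num of pregnancies": [1, 3, 2, 4, 1, 3],
}
target = [0, 0, 0, 1, 1, 1]
kept = drop_redundant(features, target)
print(kept)  # ['Smokes (years)', 'Num of pregnancies']
```

Using the target correlation as the tie-breaker keeps, from each redundant pair, the input that carries more information about the output variable.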
Figure 3. Feature importance obtained with the random forest (RF) algorithm.
Figure 4. Architecture of the shallow neural network.
Confusion matrix of CervDetect.
| Actual class | Predicted: Biopsy | Predicted: Normal |
|---|---|---|
| Biopsy | TP | FN |
| Normal | FP | TN |
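The rates reported for CervDetect follow directly from the four cells of this confusion matrix. The sketch below computes them, treating Biopsy as the positive class; the function name and the counts in the usage example are hypothetical and are not the paper's results.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard rates from a binary confusion matrix (positive class = Biopsy)."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "TPR": tp / (tp + fn),  # sensitivity (recall)
        "FNR": fn / (tp + fn),  # always equals 1 - TPR
        "FPR": fp / (fp + tn),  # equals 1 - specificity
        "TNR": tn / (fp + tn),  # specificity
    }

# Hypothetical counts, for illustration only.
m = confusion_metrics(tp=10, fn=0, fp=5, tn=85)
print(m["accuracy"], m["TPR"], m["FNR"])  # 0.95 1.0 0.0
```

Because TPR and FNR share the denominator TP + FN, they always sum to 1, and likewise FPR and TNR sum to 1.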
Selected optimal features.
| No. | Feature |
|---|---|
| 1 | “Age” |
| 2 | “Hormonal Contraceptives (years)” |
| 3 | “First sexual intercourse” |
| 4 | “Num of pregnancies” |
| 5 | “IUD (years)” |
| 6 | “Smokes (years)” |
| 7 | “Smokes (packs/year)” |
| 8 | “Hormonal Contraceptives” |
| 9 | “STDs: Time since first diagnosis” |
| 10 | “STDs: Time since last diagnosis” |
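The ten features above were ranked with RF feature importance. A random forest's impurity-based importance sums the Gini impurity decrease of every split on a feature across all trees. The sketch below uses a simpler, swapped-in proxy: the best single-split Gini decrease per feature, followed by top-k selection. The function names and data are hypothetical, and labels are assumed to be 0/1.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # fraction of positive samples
    return 1.0 - p * p - (1 - p) * (1 - p)

def best_gini_decrease(column, labels):
    """Largest weighted impurity decrease from one threshold split on a feature."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    for t in sorted(set(column)):
        left = [y for x, y in zip(column, labels) if x <= t]
        right = [y for x, y in zip(column, labels) if x > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best

def top_k_features(features, labels, k):
    """Rank features (dict name -> column) by the proxy score and keep the top k."""
    scores = {name: best_gini_decrease(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

features = {
    "Age": [18, 22, 25, 40, 45, 52],
    "IUD (years)": [0, 3, 0, 2, 0, 1],
    "Smokes (years)": [0, 0, 1, 0, 1, 0],
}
labels = [0, 0, 0, 1, 1, 1]
print(top_k_features(features, labels, 2))  # ['Age', 'IUD (years)']
```

A full forest would average such decreases over many bootstrap trees and many splits, which makes the ranking less sensitive to a single lucky threshold.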
Figure 5. Shallow neural network.
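The feedforward and feed-backward steps of the CervDetect algorithm describe a shallow network trained by backpropagation. The sketch below is a minimal one-hidden-layer version in pure Python. Several details are assumptions, not the paper's settings: squared error $e = \tfrac{1}{2}(t - v)^2$ as the loss, the hidden-layer size, the learning rate, the initialization range, and treating each bias node (B1 = B2 = 1.0) as a constant input with its own trainable weight. The toy AND data is made up.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class ShallowNet:
    """One hidden layer of sigmoid units; squared-error loss (assumed)."""

    def __init__(self, n_in, n_hidden, seed=0, lr=0.5):
        rng = random.Random(seed)
        # Initialize all weights with small random values.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        # Bias nodes output the constant 1.0; their weights are trainable.
        self.B1 = self.B2 = 1.0
        self.b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = rng.uniform(-0.5, 0.5)
        self.lr = lr

    def forward(self, x):
        # Feedforward: y = weighted sum of inputs plus bias, v = sigmoid(y).
        self.h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b * self.B1)
                  for ws, b in zip(self.w1, self.b1)]
        self.v = sigmoid(sum(w * h for w, h in zip(self.w2, self.h)) + self.b2 * self.B2)
        return self.v

    def train_step(self, x, t):
        v = self.forward(x)
        e = 0.5 * (t - v) ** 2
        # Feed backward: de/dz at the output unit.
        d_out = (v - t) * v * (1 - v)
        # Hidden deltas use the output weights before they are updated.
        d_hid = [d_out * w * h * (1 - h) for w, h in zip(self.w2, self.h)]
        # Gradient-descent update: w <- w - lr * de/dw.
        self.w2 = [w - self.lr * d_out * h for w, h in zip(self.w2, self.h)]
        self.b2 -= self.lr * d_out * self.B2
        for j, d in enumerate(d_hid):
            self.w1[j] = [w - self.lr * d * xi for w, xi in zip(self.w1[j], x)]
            self.b1[j] -= self.lr * d * self.B1
        return e

def total_error(net, data):
    return sum(0.5 * (t - net.forward(x)) ** 2 for x, t in data)

data = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 0), ([1.0, 1.0], 1)]
net = ShallowNet(n_in=2, n_hidden=3)
before = total_error(net, data)
for _ in range(2000):
    for x, t in data:
        net.train_step(x, t)
after = total_error(net, data)
print(before, after)
```

In CervDetect the inputs would be the RF-selected features rather than this two-input toy example.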
Accuracy table.
| No. | Metric | Value |
|---|---|---|
| 1 | Accuracy | 93.6% |
| 2 | TPR | 100% |
| 3 | FPR | 100% |
Figure 6. Confusion matrices for training, testing, and validation.
Figure 7. ROC curves for training, testing, and validation.
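An ROC curve plots TPR against FPR as the decision threshold on the network's output score is swept. The sketch below computes the curve points and the area under the curve (AUC) with the trapezoidal rule; the function names, scores, and labels are hypothetical, and both classes are assumed to be present.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs from sweeping the threshold from high to low scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
print(round(auc(points), 4))  # 0.8889
```

Here 8 of the 9 positive/negative pairs are ranked correctly, which is why the AUC equals 8/9.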
Figure 8. Cross-entropy.
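Figure 8 tracks cross-entropy during training, and the abstract reports the mean squared error (MSE). The sketch below computes both quantities for predicted probabilities against 0/1 targets, using their standard definitions averaged over samples; whether CervDetect averages them exactly this way is an assumption. The function names and numbers are illustrative.

```python
import math

def binary_cross_entropy(targets, outputs, eps=1e-12):
    """Mean binary cross-entropy; outputs are probabilities in (0, 1)."""
    total = 0.0
    for t, p in zip(targets, outputs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(targets)

def mean_squared_error(targets, outputs):
    """Mean of squared differences between targets and outputs."""
    return sum((t - p) ** 2 for t, p in zip(targets, outputs)) / len(targets)

# An uninformative classifier that always outputs 0.5:
bce = binary_cross_entropy([1, 0], [0.5, 0.5])
mse = mean_squared_error([1, 0], [0.5, 0.5])
print(round(bce, 4), mse)  # 0.6931 0.25
```

Cross-entropy penalizes confident wrong predictions much more strongly than MSE, which is why it is commonly plotted as the training objective for classifiers.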
Figure 9. Error histogram.
Figure 10. Comparative analysis with existing works.
Figure 11. Training state of the neural network for cervical cancer diagnosis.