Anurag Jain, Ahmed Nadeem, Huda Majdi Altoukhi, Sajjad Shaukat Jamal, Henry Kwame Atiglah, Haitham Elwahsh.
Abstract
Data analytics is a massively parallel processing approach that can be used to forecast a wide range of illnesses. Many scientific research methodologies require a significant amount of time and processing effort, which degrades the overall performance of the system. Virtual screening (VS) is a drug discovery approach that makes use of big data techniques. It is used to develop novel drugs, and it is time-consuming because it involves docking ligands from several databases in order to build the protein receptor. The proposed work is divided into two modules: (1) cancer segmentation and analysis of extracted features using big data analytics, and (2) cancer segmentation and analysis of extracted features using image processing. This statistical approach is critical in the development of new drugs for the treatment of liver cancer. Machine learning methods, including the MapReduce and Mahout algorithms, were used to prefilter the set of ligand filaments before they were used in the prediction of liver cancer. This work proposes the SMRF algorithm, an improved scalable random forest algorithm built on MapReduce. The method classifies massive datasets on a computer cluster or in a cloud computing environment. With SMRF, small amounts of data are processed and optimised over a large number of computers, allowing for the highest possible throughput. Compared with the standard random forest method, the testing results show that the SMRF algorithm exhibits the same level of accuracy degradation but superior overall performance. The performance metrics analysis gives an accuracy in the range of 80 percent, which is taken into account in the actual formulation of the medicine used for liver cancer prediction in this study.
Year: 2022 PMID: 35387251 PMCID: PMC8979737 DOI: 10.1155/2022/8154523
Source DB: PubMed Journal: Comput Intell Neurosci
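The abstract describes SMRF as a random forest built on MapReduce: the data are split across machines, trees are trained on each portion, and the results are combined. The following is a minimal single-process sketch of that general idea, not the authors' implementation. It assumes one-split decision stumps in place of full trees, plain Python lists in place of cluster nodes, and a synthetic two-cluster dataset; all function names (`map_partition`, `reduce_forests`, `predict`) are hypothetical.

```python
import random
from collections import Counter
from functools import reduce


def majority(labels, default=None):
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def train_stump(rows, rng):
    """Fit a one-split tree on a bootstrap sample, using one random feature."""
    sample = [rng.choice(rows) for _ in rows]          # bootstrap resample
    feature = rng.randrange(len(sample[0][0]))         # random feature choice
    overall = majority([y for _, y in sample])
    best, best_err = None, None
    for thr in sorted({x[feature] for x, _ in sample}):
        left = [y for x, y in sample if x[feature] <= thr]
        right = [y for x, y in sample if x[feature] > thr]
        l_lab, r_lab = majority(left, overall), majority(right, overall)
        err = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if best_err is None or err < best_err:
            best, best_err = (feature, thr, l_lab, r_lab), err
    return best


def map_partition(partition, n_trees, seed):
    """Map step: train a small sub-forest on one data partition."""
    rng = random.Random(seed)
    return [train_stump(partition, rng) for _ in range(n_trees)]


def reduce_forests(a, b):
    """Reduce step: merge two sub-forests into one."""
    return a + b


def predict(forest, x):
    """Classify x by majority vote over all stumps in the merged forest."""
    votes = [l if x[f] <= thr else r for f, thr, l, r in forest]
    return majority(votes)


# Synthetic data (assumption): class 0 clusters near 0.2, class 1 near 0.8.
data_rng = random.Random(0)
rows = []
for i in range(100):
    label = i % 2
    centre = 0.2 + 0.6 * label
    x = (centre + data_rng.uniform(-0.15, 0.15),
         centre + data_rng.uniform(-0.15, 0.15))
    rows.append((x, label))

# Four contiguous partitions stand in for four worker nodes.
partitions = [rows[k:k + 25] for k in range(0, 100, 25)]
sub_forests = [map_partition(p, n_trees=5, seed=k)
               for k, p in enumerate(partitions)]
forest = reduce(reduce_forests, sub_forests)

print(len(forest), predict(forest, (0.0, 0.0)), predict(forest, (1.0, 1.0)))
```

Because each map call depends only on its own partition and seed, the calls could in principle run on separate machines; only the lists of trained stumps need to be sent to the reduce step.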
Figure 1. Number of deaths due to liver cancer in developed and developing countries.
Figure 2. Variation in the frequency against the binding affinity interval in kcal/mol.
Figure 3. Flowchart of the proposed work.
Figure 4. Scalable random forest algorithm based on MapReduce.
Classification-based data analytics.
| Datasets | SMRF (%) | Traditional RF (%) |
|---|---|---|
| Liver | 95.23 | 96.55 |
| Cancer | 94.35 | 92.83 |
| DNA | 99.16 | 99.53 |
| Chess | 97.66 | 81.25 |
| Corral | 93.16 | 88.03 |
| Ionosphere | 92.00 | 92.00 |
| Iris | 95.00 | 87.70 |
| Letter | 90.80 | 85.35 |
| Satimage | 95.84 | 93.50 |
| Segment | 99.98 | 99.95 |
| Shuttle | 95.67 | 93.32 |
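As a quick way to read the table above, the short sketch below tallies on how many datasets each method scores higher and computes the unweighted mean accuracy per method. The figures are copied from the table; the variable names are illustrative, and an unweighted mean across such different datasets is only a rough summary.

```python
# Accuracy (%) per dataset, copied from the table: (SMRF, traditional RF).
results = {
    "Liver": (95.23, 96.55),
    "Cancer": (94.35, 92.83),
    "DNA": (99.16, 99.53),
    "Chess": (97.66, 81.25),
    "Corral": (93.16, 88.03),
    "Ionosphere": (92.00, 92.00),
    "Iris": (95.00, 87.70),
    "Letter": (90.80, 85.35),
    "Satimage": (95.84, 93.50),
    "Segment": (99.98, 99.95),
    "Shuttle": (95.67, 93.32),
}

smrf_wins = sum(s > r for s, r in results.values())
rf_wins = sum(r > s for s, r in results.values())
ties = sum(s == r for s, r in results.values())

mean_smrf = sum(s for s, _ in results.values()) / len(results)
mean_rf = sum(r for _, r in results.values()) / len(results)

# Dataset with the largest SMRF advantage over traditional RF.
largest_gap = max(results, key=lambda name: results[name][0] - results[name][1])

print(f"SMRF higher: {smrf_wins}, RF higher: {rf_wins}, ties: {ties}")
print(f"mean SMRF = {mean_smrf:.2f}, mean RF = {mean_rf:.2f}")
print(f"largest SMRF advantage: {largest_gap}")
```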
Figure 5. Comparison of the proposed SMRF with various applications.
Figure 6. Image outputs in preprocessing.
Figure 7. CT scan.
Figure 8. Segmented CT scan image.
Figure 9. Classification of CT scan.
Performance metrics.
| CT image | ACCU | SENS | SPECIFI | FPR | PPV | NPV |
|---|---|---|---|---|---|---|
| CT1 | 99.28919 | 100 | 99.27504 | 26.69522 | 73.30478 | 100 |
| CT2 | 99.42793 | 100 | 99.41669 | 22.8866 | 77.1134 | 100 |
| CT3 | 99.17108 | 100 | 99.15489 | 30.21232 | 69.78768 | 100 |
| CT4 | 99.13311 | 100 | 99.1153 | 30.09321 | 69.90679 | 100 |
| CT5 | 99.07229 | 100 | 99.05211 | 30.35376 | 69.64624 | 100 |
| CT7 | 99.26252 | 100 | 99.24729 | 26.70654 | 73.29346 | 100 |
| CT8 | 93.36848 | 100 | 99.35712 | 26.33125 | 73.66875 | 100 |
| CT9 | 91.3963 | 100 | 99.38447 | 23.89629 | 76.10371 | 100 |
| CT10 | 90.35506 | 100 | 99.34252 | 25.27013 | 74.72987 | 100 |
| CT11 | 93.37441 | 100 | 99.36253 | 25.13465 | 74.86535 | 100 |
| CT12 | 95.33381 | 100 | 99.32114 | 26.30273 | 73.69727 | 100 |
| CT13 | 93.34468 | 100 | 99.33146 | 24.88874 | 75.11126 | 100 |
| CT14 | 94.28268 | 100 | 99.269 | 27.70199 | 72.29801 | 100 |
| CT15 | 90.29426 | 100 | 99.28084 | 27.43989 | 72.56011 | 100 |
∗ ACCU: accuracy; SENS: sensitivity; SPECIFI: specificity; FPR: false-positive rate; PPV: positive predictive value; NPV: negative predictive value; ROC: receiver operating characteristic.
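For reference, the metrics named in the footnote can all be derived from the four confusion-matrix counts. The sketch below uses the standard textbook definitions, with hypothetical counts that do not come from the study. It also computes the false discovery rate (100 − PPV). In the table above, the FPR and PPV columns sum to 100 in every row, which matches that quantity rather than 100 − specificity, so both are shown for comparison.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return standard binary-classification metrics as percentages."""
    total = tp + fp + tn + fn
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),   # true-positive rate
        "specificity": 100.0 * tn / (tn + fp),   # true-negative rate
        "fpr": 100.0 * fp / (fp + tn),           # equals 100 - specificity
        "ppv": 100.0 * tp / (tp + fp),           # positive predictive value
        "npv": 100.0 * tn / (tn + fn),           # negative predictive value
        "fdr": 100.0 * fp / (tp + fp),           # equals 100 - PPV
    }


# Hypothetical pixel counts for one segmented image (not from the paper).
m = classification_metrics(tp=30, fp=10, tn=960, fn=0)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these definitions, accuracy is a weighted average of sensitivity and specificity, so it always lies between the two.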
Comparison of the proposed work.
| Classifiers | Accuracy (%) | Precision (%) | F1 score (%) | ROC curve (%) |
|---|---|---|---|---|
| SVM | 98.11 | 99 | 98.3 | 99.62 |
| Naive Bayes | 98.11 | 98.1 | 99.3 | 97.24 |
| CNN | 98.11 | 97.9 | 99.5 | 97.07 |
| Proposed | 98.8 | 99 | 99.3 | 98.44 |