Md Maniruzzaman (1), Md Jahanur Rahman (2), Benojir Ahammed (3), Md Menhazul Abedin (3), Harman S Suri (4), Mainak Biswas (5), Ayman El-Baz (6), Petros Bangeas (7), Georgios Tsoulfas (8), Jasjit S Suri (9).
1. Statistics Discipline, Khulna University, Khulna, Bangladesh; Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh.
2. Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh.
3. Statistics Discipline, Khulna University, Khulna, Bangladesh.
4. Brown University, Providence, RI, USA.
5. Advanced Knowledge Engineering Centre, Global Biomedical Technologies, Inc., Roseville, CA, USA.
6. Department of Bioengineering, University of Louisville, Louisville, Kentucky, USA.
7. Department of Surgery, Papageorgiou Hospital, Aristotle University Thessaloniki, Greece.
8. Department of Surgery, Aristotle University of Thessaloniki, Thessaloniki, Greece.
9. Advanced Knowledge Engineering Centre, Global Biomedical Technologies, Inc., Roseville, CA, USA; AtheroPoint, Roseville, CA, USA. Electronic address: jasjit.suri@atheropoint.com.
Abstract
OBJECTIVE: Colon microarray data contain thousands of gene expression values, with different expression levels for each cancer cell. It is necessary to detect which genes are responsible for cancer growth. This study presents an exhaustive comparison of machine learning (ML) systems that serves two major purposes: (a) identification of high-risk differential genes using statistical tests and (b) development of an ML strategy for predicting cancer genes.

METHODS: Four statistical tests, namely Wilcoxon sign rank sum (WCSRS), t-test, Kruskal-Wallis (KW), and F-test, were adapted to identify cancerous genes using their p-values. The extracted gene set was used to classify cancer patients using ten classifiers: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), naïve Bayes (NB), Gaussian process classification (GPC), support vector machine (SVM), artificial neural network (ANN), logistic regression (LR), decision tree (DT), AdaBoost (AB), and random forest (RF). Performance was then evaluated using cross-validation protocols and standardized metrics, namely accuracy (ACC) and area under the curve (AUC).

RESULTS: The colon cancer dataset consists of 2000 genes from 62 patients (40 cancer vs. 22 control). The overall mean ACC of our ML system across all four statistical tests and all ten classifiers was 90.50%. The ML system showed an ACC of 99.81% using the combination of the WCSRS test and the RF-based classifier. This is an improvement of 8% over previously published values in the literature.

CONCLUSIONS: The RF-based model combined with statistical tests for detection of high-risk genes showed the best performance for accurate cancer classification in multi-center clinical trials.
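The gene-identification step described in METHODS (score each gene with a statistical test, keep those with small p-values) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a two-sided rank-sum test between the cancer and control groups, computed with the normal approximation and no tie correction; a significance threshold of 0.05; and synthetic expression values with the same group sizes as the abstract (40 cancer, 22 control). The function names, the threshold, and the data are all hypothetical, and the classification step is omitted.

```python
import math
import random


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_pvalue(x, y):
    """Two-sided rank-sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2.0))


def select_genes(expr, labels, alpha=0.05):
    """Return indices of genes with p < alpha, most significant first.

    expr   : list of per-patient expression lists (patients x genes)
    labels : 1 for cancer, 0 for control (assumed encoding)
    """
    scored = []
    for g in range(len(expr[0])):
        cancer = [row[g] for row, lab in zip(expr, labels) if lab == 1]
        control = [row[g] for row, lab in zip(expr, labels) if lab == 0]
        p = rank_sum_pvalue(cancer, control)
        if p < alpha:
            scored.append((p, g))
    return [g for _, g in sorted(scored)]


# Synthetic data: 62 patients, 50 genes, of which genes 0-2 are
# shifted upward in the cancer group (purely illustrative).
rng = random.Random(0)
labels = [1] * 40 + [0] * 22
shifted = {0, 1, 2}
expr = [
    [rng.gauss(3.0 if (g in shifted and lab == 1) else 0.0, 1.0) for g in range(50)]
    for lab in labels
]
selected = select_genes(expr, labels)
print(selected[:3])
```

In a full pipeline the selected gene indices would define the reduced feature set passed to a classifier such as a random forest and evaluated under cross-validation. To avoid optimistic accuracy estimates, the selection would normally be repeated inside each training fold rather than run once on all patients.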