Saurabh Kumar Srivastava (1), Sandeep Kumar Singh (2), Jasjit S Suri (3)
1. Dept. of CSE, ABES Engineering College, Ghaziabad, India. Electronic address: saurabh.srivastava@abes.ac.in.
2. Dept. of CSE, Jaypee Institute of Information Technology, Noida, India. Electronic address: sandeepk.singh@jiit.ac.in.
3. Advanced Knowledge Engineering Center, Global Biomedical Technologies, Inc., Roseville, CA, USA. Electronic address: jsuri@comcast.net.
Abstract
BACKGROUND AND OBJECTIVE: Healthcare tweets are particularly challenging due to their sparse layout and limited character size. Compared to previous methods based on the "bag of words" (BOW) model, this study identifies an enrichment protocol and examines how semantically different aspects of feature selection, such as BOW (feature F0), term frequency-inverse document frequency (TF-IDF, feature F1), and latent semantic indexing (LSI, feature F2), improve overall performance when applied sequentially with a classifier.
METHODS: To study this enrichment concept, our ML model was tested on two diverse data sets: (i) D1: disease data with tweets related to conjunctivitis, diarrhea, stomach ache, cough, and nausea, and (ii) D2: the WebKB4 dataset, while adopting three kinds of classifiers: (a) C1: support vector machine with radial basis function (SVMR), (b) C2: multi-layer perceptron (MLP), and (c) C3: random forest (RF). A partition protocol (K10) was adopted with different performance metrics to evaluate the machine learning (ML) system.
RESULTS: Using the combination of F1, C1, D1, and K10, ML accuracy was 94%, while with F2, C1, D1, and K10, ML accuracy was 97%. With incremental feature enrichment from F0 to F2 under the K10 protocol, F1 improved over F0 by 4.98% on the Disease dataset, while F2 improved over F0 by 11.78% on the WebKB4 dataset. We demonstrated generalization over memorization in our ML design. The system was tested for stability and reliability.
CONCLUSIONS: We conclude that semantically different aspects of feature selection, when adopted sequentially, lead to improvement in ML accuracy for healthcare data sets. We validated the system on non-healthcare data sets.
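The first two enrichment stages and the partition protocol described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, whitespace tokenization, the smoothed IDF formula, and the example tweets are all hypothetical, and K10 is assumed to denote a 10-fold partition. The LSI stage (F2) is omitted because it needs a truncated singular value decomposition of the TF-IDF matrix, which would typically come from a numerical library.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """F0: raw term counts per document over a shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]  # assumption: whitespace tokens
    vocab = sorted({tok for toks in tokenized for tok in toks})
    counts = [Counter(toks) for toks in tokenized]
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def tf_idf(matrix):
    """F1: reweight F0 counts by a smoothed inverse document frequency.

    Assumed formula: idf = ln((1 + n_docs) / (1 + df)) + 1.
    """
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in matrix]


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for a contiguous K-fold partition."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


if __name__ == "__main__":
    # Hypothetical tweets, for illustration only.
    tweets = [
        "bad cough and nausea today",
        "cough cough all night",
        "stomach ache after lunch",
    ]
    vocab, f0 = bag_of_words(tweets)
    f1 = tf_idf(f0)
    for term, weight in zip(vocab, f1[1]):
        if weight > 0:
            print(f"{term}: {weight:.3f}")
```

In a full pipeline, each fold's training rows would be used to fit the classifier (for example an RBF-kernel SVM) and the held-out rows to score it, with the vocabulary and IDF weights estimated from the training rows only so that no information leaks from the test fold.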