Shallu Kotwal, Priya Rani, Tasleem Arif, Jatinder Manhas, Sparsh Sharma.
Abstract
Bacteria are important in a variety of practical domains, including industry, agriculture, and medicine. Very few species of bacteria are beneficial to humans, whereas the majority are extremely dangerous and cause a variety of life-threatening illnesses in different living organisms. Traditionally, this class of microbes has been detected and classified using approaches such as Gram staining, biochemical testing, and motility testing. However, with the availability of large amounts of data and technical advances in medical and computer science, machine learning methods have been widely used and have shown strong performance in the automatic detection of bacteria. The adoption of recent technology employing different artificial intelligence techniques is greatly assisting microbiologists in solving extremely complex problems in this domain. This paper presents a review of the literature on various machine learning approaches that have been used to classify bacteria, covering the period 1998-2020. The resources include research papers and book chapters from publishers of national and international repute such as Elsevier, Springer, IEEE, and PLOS. The study presents a detailed and critical analysis of how different machine learning methodologies have penetrated the field of bacterial classification, along with their limitations and future scope. In addition, the opportunities and challenges in implementing these techniques in this field are presented to provide deeper insight to researchers working in this area. © CIMNE, Barcelona, Spain 2021.
Year: 2021 PMID: 34658617 PMCID: PMC8505783 DOI: 10.1007/s11831-021-09660-0
Source DB: PubMed Journal: Arch Comput Methods Eng ISSN: 1134-3060 Impact factor: 7.302
Fig. 2 Flow chart with basic architecture of automatic bacterial image classification system
Fig. 1 Article overview
Keyword-based journal selection
| Resources | Keywords used | Journals referred |
|---|---|---|
| SpringerLink | Bacteria, Mycobacterium tuberculosis, Foodborne pathogens, Machine learning OR Deep learning, Classification OR Classify OR Classifier | Signal, Image and Video Processing; Journal of Food Measurement and Characterization; Image Analysis and Processing-ICIAP; Applied Microbiology and Biotechnology; Intelligence in Big Data Technologies-Beyond the Hype; Information System Frontiers |
| Elsevier (ScienceDirect) | Bacteria, Mycobacterium tuberculosis, Foodborne pathogens, Machine learning OR Deep learning, Classification OR Classify OR Classifier | Ecological Informatics; Biocybernetics and Biomedical Engineering; Real-Time Imaging; Infrared Physics and Technology; Sensors and Actuators B: Chemical |
| IEEE | Bacteria, Mycobacterium tuberculosis, Foodborne pathogens, Machine learning OR Deep learning, Classification OR Classify OR Classifier | IEEE Access |
| PLOS | Bacteria, Mycobacterium tuberculosis, Foodborne pathogens, Machine learning OR Deep learning, Classification OR Classify OR Classifier | PLOS Computational Biology; PLOS ONE |
List of IEEE Conferences
| International Conference on Signal and Image Processing (ICSIP) |
| International Conference on Electrical, Computer and Communication Engineering (ECCE) |
| International Seminar on Research of Information Technology and Intelligent System (ISRITI) |
| 10th International Conference on Electrical and Computer Engineering |
| IEEE International Conference On Engineering and Technology (ICETECH) |
| IEEE Journal of Biomedical and health informatics |
| International Conference on Electrical Engineering and Informatics |
| IEEE Transactions on Information Technology in Biomedicine |
| International Conference on Computer Science and Software Engineering |
| 9th Cairo International Biomedical Engineering Conference (CIBEC) |
| IEEE International Conference on Imaging System and Techniques (IST) |
| 10th International Conference on Intelligent System Design and Applications |
Fig. 3 Yearly distribution of selected articles
A brief summary of different ML approaches used in classification of bacteria (1998 to 2010)
| Author/Year | Methodology/Technique | Types of bacteria | Feature selection | Dataset | Results | Limitations | Future scope |
|---|---|---|---|---|---|---|---|
| Holmberg et al. [ | ANN | Urinary bacteria | Shape features | C = 5, T = 100 | Acc = 76% | Details of dataset are incomplete | In future, the model will take limited parameters and a large amount of data |
| Veropoulos et al. [ | NN, Back propagation | Tuberculosis bacilli | Shape features | T = 1147, Tr = 1000, Te = 147 | Acc = 97.9%, Se = 94.1%, Sp = 99.1% | Evaluation details are sparse | This model may be combined with other diagnostic techniques that use an automatic microscope to reduce cost |
| Liu et al. [ | KNN | Different bacteria species | Shape features | C = 11, T = 4270 | Acc = 97% | Overlapping species images | A new version of CMEIAS may be developed with additional features |
| Forero et al. [ | K-means clustering | Tuberculosis bacteria | Shape features | C = 8, T = 100 | Sp = 93.54%, Se = 100% | Dataset is small | Bayesian decision theory along with a Gaussian mixture model may be used, and the dataset will also be extended |
| Xiaojuan et al. [ | Back propagation, NN | Different bacteria species | Shape and texture features | C = 4, Tr = 60, Te = 20 | Acc = 86.3% | Evaluation details are not described | A hybrid model may be proposed for bacteria classification |
| Men et al. [ | SVM | Heterotrophic bacteria colonies | Shape features | C = 2, T = 300 | Acc = 98.7% | Types of species in images are insufficiently explained | A hybrid model may be proposed to classify bacteria colonies |
| Chen et al. [ | SVM and radial basis function | Oral cavity bacteria | Color features | C = 2, T = 100 | Acc = 96%, Pre = 0.97 ± 0.03, Re = 0.96 ± 0.04 | Clustered colonies are not distinguished | Model may be improved to detect and distinguish different species of bacteria and clustered colonies |
| Xiaojuan et al. [ | NN, Back propagation | Wastewater bacteria | Shape features | C = 4, T = 16, Tr = 10, Te = 6 | Acc = 85.5% | Evaluation details are not described | Model may be upgraded to obtain better results and to classify other bacteria as well |
| Osman et al. [ | NN | Tuberculosis bacteria | Texture features | T = 680, Tr = 400, Te = 280 | Acc = 89.64% | Details of the dataset are incomplete | More features may be added to improve the performance and analysis |
| Khultang et al. [ | KNN, SVM, PNN | Mycobacterium tuberculosis | Fisher's mapped features | T = 11,259, Tr = 6901, Te = 4358 | Acc = 95% | Evaluation details are not described | Feature set may be extended to include other bacilli classes as well |
| Hiremanth et al. [ | K-NN, NN | Cocci bacterial cells | Geometric features | T = 500 | Acc = 96% (KNN), Acc = 98% (ANN) | Details of the dataset are incomplete | The method may be modified by using better pre-processing techniques and feature sets |
| Akova et al. [ | SVDD | Bacteria serovars | Texture features | C = 5, T = 2054 | Acc = 82% | Evaluation details are not described | Bayesian approaches may be used to make the modeling more robust and improve prediction |
A brief summary of different ML approaches used in classification of bacteria (2011 to 2020)
| Author/year | Methodology/technique | Types of bacteria | Feature selection | Dataset | Results | Limitations | Future scope |
|---|---|---|---|---|---|---|---|
| Rulaningtyas et al. (2011) [ | NN | Tuberculosis bacteria | Shape features | T = 100, Tr = 75, Te = 25 | Mean square error = 0.000368 | Model was not tested on real images | An automated tool to count and analyze tuberculosis bacteria may be developed |
| Hiremanth et al. (2011) [ | K-NN, NN | Cocci bacteria cells | Geometric and statistical features | T = 350 | Acc = 99%, K-NN = 100%, ANN = 98% | Dataset size is small | Better preprocessing methods and feature sets may be used to get improved results |
| Ahmed et al. [ | SVM | Food bacteria | Shape and texture features | T = 1000 | Acc = 90% | Details of dataset are not elaborated | Dataset may be enhanced to improve the speed of classification |
| Chayadevi et al. [ | K-means, SOM | Bacteria clusters from microscopic images | Shape features | T = 320 | – | High computation cost | Hybrid approach may be used to reduce computational time and cost |
| Ferrari et al. [ | SVM with radial basis function | Bacterial colonies | Shape features | C = 6, T = 22 | Acc = 93% | Details of dataset are not elaborated | A larger dataset may be used to improve results |
| Ayas et al. [ | RF | Mycobacterium tuberculosis bacteria | Shape features | T = 116, Tr = 40, Te = 76 | Sp = 62.89%, Se = 89.34% | Dataset size is small | This model may be extended to identify the bacteria on the basis of other features |
| Govindan et al. [ | SVM | Tuberculosis bacteria | Shape features | T = 50 | Se = 72.89% | Details of dataset and results are incomplete | An automated tool for identifying other types of bacteria such as cocci and vibrio may be developed |
| Nie et al. [ | CNN | Bacterial colonies | CNN features | T = 862 | Acc = 62.10%, Pre = 83.76%, Re = 82.16% | Less efficient in classifying bacterial colony images | Methods for training models that label patches spanning multiple heterogeneous colonies may be explored |
| Seo et al. [ | SVM and PLS-DA | Staphylococcus species | Spectral features | C = 5 | Acc = 97.8% | Details of dataset are incomplete | The dataset may be increased to validate the results on a larger testing dataset |
| Priya et al. [ | SVM, BPNN | Tuberculosis bacteria | Shape features | T = 100 | Acc(BPNN) = 82.5%, Acc(SVNN) = 92.5% | Details of dataset are incomplete and fewer features were used | More features may be used to improve the classification results |
| Lopez et al. [ | CNN | Mycobacterium tuberculosis bacteria | CNN features | T = 492 | Acc = 96% | Evaluation details are not described | Model for classifying other bacteria may be proposed |
| Turra et al. [ | CNN, SVM, RF | Hyperspectral bacterial images | Shape features | C = 8, T = 16,642 colonies | Acc(CNN) = 99.7%, Acc(SVM) = 99.5%, Acc(RF) = 93.8% | Very complex dataset and cost of hyperspectral imaging is high | A higher number of UTI-relevant pathogens and clinical laboratory validations may be used |
| Zielinski et al. [ | CNN, SVM, RF | Bacteria colonies | CNN features | C = 33, T = 660 | Acc = 97.24% | When database size is increased, the accuracy decreases | The DIBaS dataset may be extended along with an improved investigation method |
| Mohamed et al. [ | SVM | Bacteria species | SURF description | C = 10, T = 200 | Acc = 97% | Details of dataset are incomplete | Experiments may be carried out on a larger number of bacteria species |
| Panicker et al. [ | CNN | Tuberculosis bacteria | CNN features | T = 120 | Re = 97.13%, Pre = 78.4%, F-score = 86.76% | Dataset is small | Dataset may be enhanced for getting better results |
| Wahid et al. [ | CNN | Pathogenic bacteria species | CNN features | C = 5, T = 500, Tr = 400, Te = 100 | Acc = 95% | Number of species is small | Number of species may be increased for classification |
| Traore et al. [ | CNN | Vibrio cholera and plasmodium falciparum | CNN features | C = 2, T = 480, Tr = 400, Te = 80 | Acc = 94% | Dataset is small | The model may be optimized to get better results |
| Rahmayuna et al. [ | SVM | Pathogenic bacteria | Texture features | C = 4, T = 600, Tr = 540, Te = 60 | Acc = 90.33% | Features were selected manually | Automatic feature selection may be implemented |
| Hay et al. [ | CNN | Larval zebra fish intestinal bacteria | Texture features | T = 1190 | Acc = 89.3% | Dataset is small | Dataset may be enhanced for better performance |
| Mithra et al. [ | Deep belief neural network | Tuberculosis bacteria | Pixel intensity based features | T = 500, Tr = 275, Te = 225 | Acc = 97.55%, Se = 97.86%, Sp = 98.23% | Details of dataset are incomplete | Model for classifying other bacteria may be proposed |
| Treebupachatsakul et al. [ | CNN | Staphylococcus aureus and lactobacillus delbrueckii | CNN features | C = 2, T > 400 | Acc = 75% | Evaluation details are not described | The LeNet architecture may be modified to train it for a larger number of epochs |
| Ahmed et al. [ | SVM | Pathogenic bacteria | CNN features | C = 7, T = 800 | Acc = 96% | Classifying images with multiple bacterial species is much less efficient | RNN model may be combined with CNN model to create a more powerful network architecture |
| Alhalem et al. [ | CNN with random projection | Phyla bacteria (16S) | CNN features | C = 3, T = 2000 | Acc = 97% | Details of dataset are incomplete | CNN model may be proposed for classification of other bacteria |
| Bonah et al. [ | SVM, LDA, genetic algorithm | Food-borne pathogens | Pixel based features | C = 8, T = 320, Tr = 192, Te = 128 | Acc = 99.47% | Very complex dataset and cost of hyperspectral imaging is high | An enhanced model may be proposed to classify other bacteria |
| Kang et al. [ | Deep learning with LSTM, ResNet, 1D-CNN | Food-borne pathogens | Shape features | C = 5, T = 5000, Tr = 3600, Val = 900, Te = 500 | Acc(LSTM) = 92.2%, Acc(ResNet) = 93.8%, Acc(1D-CNN) = 96.2% | Very complex dataset and cost of hyperspectral imaging is high | A method may be proposed to practically implement testing of food matrices in the food industry and clinical laboratories |
| Kang et al. [ | Deep learning with KNN, SVM, 1D-CNN | Food-borne pathogens | Shape features | C = 5, T = 1500, Tr = 1000, Te = 500 | Acc = 90% (1D-CNN), Acc = 81% (KNN), Acc = 81% (SVM) | Very complex dataset and cost of hyperspectral imaging is high | A method may be proposed to classify other types of bacteria |
| Mhathesh et al. [ | CNN | Larval zebra fish intestine bacteria | Texture features | C = 1 | Acc = 95% | Details of dataset are incomplete | A CNN-based classifier may be proposed to classify different biological images with more dimensions |
| Sajedi et al. (2020) [ | XGBoost | Myxobacteria | Texture features | – | Acc = 90.28% | Details of dataset are incomplete | More techniques can be applied for better results |
Abbreviations used in Tables 3 and 4: Accuracy (Acc), Classes (C), Total Images (T), Training (Tr), Testing (Te), Sensitivity (Se), Specificity (Sp), Precision (Pre), Recall (Re), Validation (Val)
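The evaluation metrics abbreviated above are not defined in this record. The following minimal sketch shows their standard binary-classification definitions in terms of confusion-matrix counts. The `ConfusionCounts` container and the function names are hypothetical and introduced only for illustration; the reviewed papers may have computed their metrics differently (for example, averaged over several classes).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container for this sketch)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def accuracy(c: ConfusionCounts) -> float:
    """Acc: fraction of all samples that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def sensitivity(c: ConfusionCounts) -> float:
    """Se (identical to recall, Re): fraction of actual positives detected."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Sp: fraction of actual negatives correctly rejected."""
    return c.tn / (c.tn + c.fp)


def precision(c: ConfusionCounts) -> float:
    """Pre: fraction of predicted positives that are truly positive."""
    return c.tp / (c.tp + c.fp)


def f_score(pre: float, re: float) -> float:
    """F-score: harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * pre * re / (pre + re)


if __name__ == "__main__":
    # Made-up counts, used only to exercise the functions above.
    example = ConfusionCounts(tp=90, fp=20, tn=80, fn=10)
    print(f"Acc = {accuracy(example):.3f}")
    print(f"Se  = {sensitivity(example):.3f}")
    print(f"Sp  = {specificity(example):.3f}")
    print(f"Pre = {precision(example):.3f}")
    # Precision and recall values taken from the Panicker et al. row (in percent).
    print(f"F-score = {f_score(78.4, 97.13):.2f}")
```

Applied to the precision (78.4%) and recall (97.13%) reported for Panicker et al., `f_score` gives roughly 86.8%, in line with the 86.76% listed in the table. The functions do not guard against empty denominators, so a class with no positive or no negative samples would need separate handling.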
Fig. 4 Number of research papers employing similar techniques
Performance evaluation of ML techniques with highest accuracy
| Machine learning techniques | Performance metrics (%) | Types of feature set |
|---|---|---|
| Artificial Neural Network | Accuracy = 98 | Geometric features |
| K-Nearest Neighbor | Accuracy = 100 | Geometric and statistical features |
| Support Vector Machine | Accuracy = 99.5 | Shape features |
| Random forest | Accuracy = 93.8 | Shape features |
| Deep learning | Accuracy = 99.7 | Deep features |
| K-means | Accuracy = 93.54 | Shape features |