Mauro César Cafundó Morais1,2,3, Diogo Silva3, Matheus Marques Milagre4, Maykon Tavares de Oliveira5, Thaís Pereira6, João Santana Silva5, Luciano da F Costa7, Paola Minoprio2, Roberto Marcondes Cesar Junior8, Ricardo Gazzinelli6, Marta de Lana4,9, Helder I Nakaya1,2,3,10.
Abstract
Chagas disease is a life-threatening illness caused by the parasite Trypanosoma cruzi. The diagnosis of the acute form of the disease is performed by trained microscopists who detect parasites in blood smear samples. Since this method requires a dedicated high-resolution camera system attached to the microscope, the diagnostic method is more expensive and often prohibitive for low-income settings. Here, we present a machine learning approach based on a random forest (RF) algorithm for the detection and counting of T. cruzi trypomastigotes in mobile phone images. We analyzed micrographs of blood smear samples that were acquired using a mobile device camera capable of capturing images at a resolution of 12 megapixels. We extracted a set of features that describe morphometric parameters (geometry and curvature), as well as color and texture measurements, of 1,314 parasites. The features were divided into train and test sets (4:1) and classified using the RF algorithm. The values of precision, sensitivity, and area under the receiver operating characteristic (ROC) curve of the proposed method were 87.6%, 90.5%, and 0.942, respectively. Automating the analysis of images acquired with a mobile device is a viable alternative for reducing costs and gaining efficiency in the use of the optical microscope. ©2022 Morais et al.
Keywords: Blood trypomastigote; Machine learning; Parasitemia; SVM; Trypanosoma cruzi
Year: 2022 PMID: 35651746 PMCID: PMC9150695 DOI: 10.7717/peerj.13470
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 3.061
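The workflow summarized in the abstract (feature extraction, a 4:1 train/test split, a random forest classifier, and evaluation by precision, sensitivity, and ROC AUC) can be sketched with scikit-learn. The feature matrix below is synthetic and merely stands in for the paper's morphometric, color, and texture features; the class sizes and hyperparameters are illustrative, not the authors'.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, roc_auc_score

# Synthetic features standing in for the morphometric, color, and
# texture measurements; label 1 = parasite, 0 = non-parasite region.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (1314, 20)),
               rng.normal(1.0, 1.0, (1314, 20))])
y = np.array([0] * 1314 + [1] * 1314)

# 4:1 train/test split as in the paper -> test_size=0.2.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]
print(f"precision   {precision_score(y_test, y_pred):.3f}")
print(f"sensitivity {recall_score(y_test, y_pred):.3f}")
print(f"ROC AUC     {roc_auc_score(y_test, y_prob):.3f}")
```

On real data, `X` would be replaced by the per-object feature vectors extracted from the segmented images.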
Figure 1. T. cruzi detection image analysis pipeline.
(A) Blood smear samples were prepared from mice experimentally infected with T. cruzi parasites at the acute infection stage. (B) Images of thin blood smear slides were acquired with a mobile phone camera attached to a microscope ocular lens. (C) Parasites (trypomastigote forms of T. cruzi) were segmented by a graph-based algorithm. (D) Images were converted to the CIE L*a*b* color space, and parasite features were extracted and selected (PCA). (E) Object feature data were split into training and test sets. Four machine learning models were trained and assessed. (F) Parasites were detected in mobile phone camera images.
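The feature-selection step in panel (D) can be sketched with scikit-learn's PCA. The object and feature counts below are placeholders, not the paper's actual dimensions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic object-by-feature matrix; the real pipeline extracts
# morphometric, color, and texture features per segmented object.
rng = np.random.default_rng(1)
features = rng.normal(size=(2628, 88))  # 88 is a placeholder feature count

# Keep the principal components that together explain 95% of the variance.
pca = PCA(n_components=0.95)
reduced = pca.fit_transform(features)
print(features.shape, "->", reduced.shape)
```

Passing a float to `n_components` makes PCA retain the smallest number of components whose cumulative explained variance reaches that fraction.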
Figure 2. Mobile phone attached to the optical microscope ocular lens and image acquisition.
(A) The mobile phone was attached to the microscope ocular lens (eyepiece) with a plastic support device (left). The camera was configured with the macro focus for acquisition. Other configurations were set to automatic. (B) Image of field-of-view from blood smear slide of mice infected with Trypanosoma cruzi. The red squares indicate regions with the presence of a parasite. (C) Crop of a field of view with the T. cruzi parasite at the center. K, kinetoplast; N, nucleus; Scale bar, 10 µm.
Figure 3. Object segmentation.
(A) Original image acquired with a mobile phone attached to the microscope. (B) Segmented image with regions highlighted in different colors. Yellow squares indicate the locations of the parasites. (C) Segmented parasites in 100 × 100 pixel crops. Top row: T. cruzi trypomastigotes from the original image. Middle row: segmented regions containing parasites. Bottom row: segmented parasites highlighted within the segmented region of interest. Only the regions segmented with the parasites were selected for feature extraction.
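The graph-based segmentation idea can be illustrated in miniature: treat pixels as graph nodes, connect 4-neighbors whose intensities differ by less than a threshold, and take the connected components of that graph as regions. This is a simplified sketch in the spirit of such algorithms, not the paper's actual method; the image and threshold are toy values.

```python
import numpy as np

def graph_segment(img, thresh):
    """Label connected components of the pixel-similarity graph."""
    h, w = img.shape
    parent = np.arange(h * w)  # union-find forest over pixel nodes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Join 4-neighbors whose intensities are similar.
    for yy in range(h):
        for xx in range(w):
            i = yy * w + xx
            if xx + 1 < w and abs(img[yy, xx] - img[yy, xx + 1]) <= thresh:
                union(i, i + 1)
            if yy + 1 < h and abs(img[yy, xx] - img[yy + 1, xx]) <= thresh:
                union(i, i + w)

    roots = np.array([find(i) for i in range(h * w)])
    _, labels = np.unique(roots, return_inverse=True)
    return labels.reshape(h, w)

# Toy image: a bright "parasite" blob on a dark background.
img = np.zeros((8, 8), dtype=float)
img[2:5, 2:6] = 1.0
labels = graph_segment(img, thresh=0.5)
print(labels)  # two regions: background and blob
```

Production graph-based segmenters (e.g., Felzenszwalb-Huttenlocher) refine this by merging components under an adaptive, rather than fixed, edge-weight threshold.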
Figure 4. Parasite segmentation and region labeling.
(A) Example of segmented regions that contain the parasite. The segmented regions containing the T. cruzi trypomastigote form were labeled as “parasite”. (B) The segmented regions that do not contain a parasite or that are over-segmented were labeled as “unknown”.
The number of objects by class used in the training and test sets.

| Class | Training set | Test set |
|---|---|---|
| Parasites | 1103 | 211 |
| Unknown | 1078 | 236 |
| Total | 2181 | 447 |
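The totals in the table can be checked directly; the overall train-to-test ratio comes out to about 4.9:1, close to the 4:1 split described in the abstract.

```python
# Object counts copied from the table above: (training, test) per class.
counts = {"Parasites": (1103, 211), "Unknown": (1078, 236)}

train_total = sum(tr for tr, te in counts.values())
test_total = sum(te for tr, te in counts.values())
print(train_total, test_total)             # 2181 447
print(round(train_total / test_total, 2))  # 4.88, i.e. roughly 4:1
```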
Object feature metrics.

| Category | Feature | Description |
|---|---|---|
| Shape | Perimeter (P) | Parametric representation of the contour and its points identified by the coordinates |
| Shape | Area (A) | Integral of the contour |
| Shape | Area and perimeter ratio | Ratio between the object's area and its perimeter |
| Shape | Circularity | Degree to which the contour approaches a circle, computed from the area and the perimeter |
| Shape | Thickness ratio | Measure of the object's thinness, relating the area to the squared perimeter |
| Shape | Centroid | Center of mass M of the contour, computed from its complex-signal representation |
| Shape | Centroid to contour maximum distance | Distance between the centroid and the furthest point on the contour |
| Shape | Centroid to contour minimum distance | Distance between the centroid and the nearest point on the contour |
| Shape | Centroid to contour average distance | Average of the distances between the centroid and all points on the contour |
| Shape | Major axis | Pair of most distant points belonging to the object |
| Shape | Minor axis | Pair of closest points belonging to the object |
| Shape | Aspect ratio | Ratio between the major and minor axis lengths |
| Shape | Perimeter and major axis ratio | Ratio between the perimeter and the major axis length |
| Shape | Bilateral symmetry | Proportion of the number of pixels in the intersection of the object and its shape reflected about the major axis, with respect to the number of pixels in the union of the two |
| Shape | Moment invariants | Hu's moment invariants, computed from the normalized central moments µ and η (see Notes) |
| Color | Mean | Mean of the pixel values |
| Color | Median | Median of the pixel values |
| Color | Mode | Pixel value that occurs with greatest frequency |
| Color | Amplitude | max(p) − min(p) |
| Color | Variance | Variance of the pixel values |
| Curvature | Bending energy | Mean squared curvature along the contour |
| Curvature | Variance | Variance of the curvature signal k(t) |
| Curvature | Entropy | Entropy of the curvature signal k(t) |
| Texture | Entropy (E) | Randomness of the entries of the co-occurrence matrix |
| Texture | Angular second moment (ASM) | Uniformity (energy) of the co-occurrence matrix |
| Texture | Contrast (CON) | Local gray-level variation between neighboring pixels |
| Texture | Inverse difference moment (IDM) | Local homogeneity of the co-occurrence matrix |
| Texture | Correlation (COR) | Linear dependency between the gray levels of neighboring pixels |
Notes.
Refer to Huang & Leng (2010) for the µ and η equations.
The curvature k(t) of a parametric curve c(t) = (x(t), y(t)) was defined as k(t) = (x′(t)y′′(t) − y′(t)x′′(t)) / (x′(t)² + y′(t)²)^(3/2), where x′(t), y′(t) and x′′(t), y′′(t) are the first and second derivatives of the contour signals x(t) and y(t), respectively.
Texture features were extracted based on the color co-occurrence matrix (CCM).
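The curvature definition in the Notes can be checked numerically. The sketch below samples a circle of radius 2, whose true curvature is 1/R = 0.5 everywhere, and also computes a bending energy, here taken as the mean squared curvature (an assumed normalization; the table does not give the exact formula).

```python
import numpy as np

# Sample a circle of radius 2; its true curvature is 1/R = 0.5.
t = np.linspace(0.0, 2.0 * np.pi, 400)
x, y = 2.0 * np.cos(t), 2.0 * np.sin(t)

# First and second derivatives of the contour signals x(t), y(t).
dx, dy = np.gradient(x, t), np.gradient(y, t)
ddx, ddy = np.gradient(dx, t), np.gradient(dy, t)

# k(t) = (x'(t) y''(t) - y'(t) x''(t)) / (x'(t)^2 + y'(t)^2)^(3/2)
k = (dx * ddy - dy * ddx) / (dx ** 2 + dy ** 2) ** 1.5

# np.gradient uses one-sided differences at the endpoints, so drop
# a couple of samples at each end before summarizing.
interior = k[2:-2]
print(interior.mean())         # ~0.5
print((interior ** 2).mean())  # bending energy ~0.25 under the assumed form
```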
Prediction performance of models on the training and testing sets.

| Set | Feature selection | Model | Sensitivity (%) | Specificity (%) | Precision (%) | Accuracy (%) | F1-score (%) | AUC |
|---|---|---|---|---|---|---|---|---|
| Train | None | SVM | 67.4 | 80.2 | 77.7 | 73.8 | 72.2 | 0.797 |
| Train | None | KNN | 75.6 | 84.2 | 83.1 | 79.9 | 79.2 | 0.878 |
| Train | None | RF | 99.8 | 99.5 | 99.5 | 99.7 | 99.7 | 1.0 |
| Train | None | Ensemble | 88.6 | 92.0 | 91.9 | 90.3 | 90.2 | 0.978 |
| Test | None | SVM | 69.7 | 75.4 | 71.7 | 72.7 | 70.7 | 0.78 |
| Test | None | KNN | 69.7 | 75.4 | 71.7 | 72.7 | 70.7 | 0.759 |
| Test | None | RF | 90.5 | 88.6 | 87.6 | 89.5 | 89.0 | 0.942 |
| Test | None | Ensemble | 76.8 | 81.8 | 79.0 | 79.4 | 77.9 | 0.884 |
Notes.
AUC, area under the curve; SVM, support vector machine; KNN, k-nearest neighbors; RF, random forest.
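The four models compared in the table (SVM, KNN, RF, and their ensemble) can be reproduced in outline with scikit-learn's VotingClassifier. The data here is synthetic and the hyperparameters are illustrative, not the paper's.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic two-class data standing in for the parasite/unknown features.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (500, 10)),
               rng.normal(1.0, 1.0, (500, 10))])
y = np.array([0] * 500 + [1] * 500)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

base = [
    ("SVM", SVC(probability=True, random_state=42)),
    ("KNN", KNeighborsClassifier(n_neighbors=5)),
    ("RF", RandomForestClassifier(n_estimators=100, random_state=42)),
]
# Soft voting averages the base models' predicted class probabilities.
models = base + [("Ensemble", VotingClassifier(base, voting="soft"))]

for name, model in models:
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    print(f"{name:8s} accuracy {acc:.3f}")
```

Soft voting requires that every base estimator expose `predict_proba`, which is why the SVM is built with `probability=True`.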
Figure 5. Model classification performance.
(A) Receiver operating characteristic (ROC) curve in the training set. (B) ROC curve in the testing set. AUC, area under the curve.
Confusion matrix of the voting classifier (ensemble's prediction) in the test set.

| Predicted label | True label: Parasite | True label: Unknown |
|---|---|---|
| Parasite | 162 | 43 |
| Unknown | 49 | 193 |
Confusion matrix of the random forest classification model in the test set.

| Predicted label | True label: Parasite | True label: Unknown |
|---|---|---|
| Parasite | 191 | 27 |
| Unknown | 20 | 207 |
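The abstract's headline numbers follow directly from the random forest confusion matrix: sensitivity 191/(191+20) and precision 191/(191+27) reproduce the reported 90.5% and 87.6%.

```python
# RF test-set confusion matrix from the table above:
# rows are predicted labels, columns are true labels.
tp, fn = 191, 20   # true parasites: detected / missed
fp, tn = 27, 207   # true unknowns: wrongly flagged / correctly rejected

sensitivity = tp / (tp + fn)  # recall over true parasites
precision = tp / (tp + fp)    # fraction of flagged regions that are parasites
f1 = 2 * tp / (2 * tp + fp + fn)
print(f"sensitivity {sensitivity:.1%}")  # 90.5%
print(f"precision   {precision:.1%}")    # 87.6%
print(f"F1-score    {f1:.1%}")           # 89.0%
```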
Figure 6. Sample images of false-positive and false-negative detections by the Chagas parasite detection algorithm.
(A) Regions of the images containing leukocytes (left) or a high density of red blood cells (right) showed overstained areas that the algorithm had difficulty classifying correctly. (B) Parasites in image regions with low contrast (left) or low sharpness (right) were not recognized by the algorithm.
Comparison of the results of our algorithm with other published studies.

| Study | Image source | Model | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|
| The present work | Mobile phone camera | RF | 90.5 | 88.6 |
| — | Dedicated camera | Gaussian discriminant | 98.3 | 84.4 |
| — | Dedicated camera | AdaBoost + SVM | 100 | 93.2 |
| — | Dedicated camera | Convolutional neural network | 97.6 | 95.2 |
| — | Dedicated camera | KNN | 98 | 85 |
| — | Dedicated camera | SVM | 96.3 | 99.1 |
| — | Mobile phone camera | Convolutional neural network | 92.6 | 94.3 |
| — | Mobile phone camera | Convolutional neural network | 94.5 | 96.9 |
| — | Mobile phone camera | SVM | 80.5 | 93.8 |
| — | Mobile phone camera | AdaBoost | 59 | 95 |
Notes.
RF, random forest; SVM, support vector machine; KNN, k-nearest neighbours.