Bidur Khanal1,2, Pravin Pokhrel1, Bishesh Khanal2, Basant Giri1. 1. Center for Analytical Sciences, Kathmandu Institute of Applied Sciences, Kathmandu 44600, Nepal. 2. Nepal Applied Mathematics and Informatics Institute for Research, Kathmandu 44600, Nepal.
Abstract
Paper-based analytical devices (PADs) employing colorimetric detection and smartphone images have gained wider acceptance in a variety of measurement applications. PADs are primarily meant to be used in field settings where assay and imaging conditions vary greatly, resulting in less accurate results. Recently, machine-learning (ML)-assisted models have been used in image analysis. We evaluated combinations of four ML models (logistic regression, support vector machine (SVM), random forest, and artificial neural network (ANN)) and three image color spaces (RGB, HSV, and LAB) for their ability to accurately predict analyte concentrations. We used images of PADs taken under varying lighting conditions, with different cameras and users, for food color and enzyme inhibition assays to create training and test datasets. The prediction accuracy was higher for the food color assay than for the enzyme inhibition assay in most of the ML model and color space combinations. All models predicted coarse-level classifications better than fine-grained concentration classes. ML models using the sample color along with a reference color showed an increased ability to predict the result; the reference color may have partially factored out the variation in ambient assay and imaging conditions. The best concentration class prediction accuracy obtained for the food color assay was 0.966 when using the ANN model and LAB color space. The accuracy for the enzyme inhibition assay was 0.908 when using the SVM model and LAB color space. Appropriate model and color space combinations can be useful to analyze large numbers of samples on PADs as a powerful, low-cost, quick field-testing tool.
Paper-based analytical
devices (PADs) have gained wider acceptance
in clinical diagnosis, environmental pollution monitoring, food quality monitoring,
and pharmaceutical quality screening among many other applications.
Assays involving PADs are less costly, easy to use, and are considered point-of-need assays.[1−5] Electrochemical and optical detection methods are primarily used
to record the assay signal on PADs.[6] Because
of the proliferation of digital cameras, particularly smartphone cameras,
the digital image-based colorimetric detection method is one of the
widely used methods where color information encoded in the digital
image is used for the analysis of assay results.[3] Smartphone image-based colorimetric detection has been
considered a cost-effective and attractive field-based alternative to conventional techniques such as spectrophotometry, colorimetry, and fluorometry.[7,8]

Digital cameras use multiple
charge-coupled devices (CCD) or complementary
metal-oxide semiconductor (CMOS) sensors to capture the light intensity
signal separately from red (R), green (G), and blue (B) using a mosaic-patterned
filter array. The signals are then combined using demosaicing, resulting
in three color values R, G, and B at each pixel of the digital image.[9] The image formation process in digital cameras
is nonlinear. The raw signal or the intensity value at each pixel
of the imaged area depends on the lighting condition, sensor sensitivity,
distance between the object and camera, and reflectance property of
the object being imaged.[10,11] Some of these variations
such as the lighting condition and object-camera distance can be minimized
using a controlled environment that is only possible in laboratory
settings.[12−15] However, PADs are ultimately meant to be used in field settings
by a minimally trained user, during which the criteria of controlled
assay and imaging conditions may not be achieved, resulting in less
accurate results.

Constant illumination can be maintained while imaging the assay signal by attaching an extra device to the smartphone, at added cost. Such a device may not be applicable to all types of smartphones, which have diverse shapes and sizes.[16] Another approach uses a blank or reference assay along with the
sample assay simultaneously to factor out the impact of illumination
and camera quality changes.[10,17] Because the raw signals captured by the camera sensors are processed nonlinearly during image acquisition before being saved to memory, the abovementioned approaches only partially address the problem. Furthermore, image autocorrection options such
as automatic exposure correction, color correction based on ambient
light selection, white balancing, and contrast enhancement can highly
influence the overall color calibration.[16,18] Thus, the estimation of the analyte concentration from the color
intensity in PADs is an inverse problem where any estimation models
will have explicit or implicit assumptions on the image formation
process. The nonlinear factors need to be accounted for to make a
robust and reliable low-cost model applicable in diverse point-of-need
settings.

Traditional computer vision and ML models have been
used for image
processing to enhance the robustness of the PAD assay. Traditional
computer vision-based algorithms try to be invariant to illumination,
scale, and camera.[10] Most of these algorithms
use color spaces including the red, green, and blue (RGB); hue, saturation,
and value (HSV); hue, saturation, and lightness (HSL); and luminance,
a (green to red), and b (blue to yellow) (CIE L*a*b* or LAB).[10,16,18] Each particular application usually
requires careful selection of the color space model based on preliminary
data and experiments. Similarly, other corrections such as white balance
correction, contrast transfer, and gamma correction have been used.
These corrections are specific for individual cameras. The camera-specific
corrections are not practical when the goal is to enable low-cost
colorimetry to the large variety of consumer cameras.
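As a concrete illustration of working in these color spaces, the following is a minimal sketch of converting a captured PAD image into the RGB, HSV, and LAB representations using OpenCV; the file name is illustrative, and this is not necessarily the pipeline used in any of the cited works.

```python
import cv2

# Load a PAD image (file name is illustrative); OpenCV loads images as BGR.
img_bgr = cv2.imread("pad_image.jpg")

# Convert to the three color spaces compared in this work.
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
img_hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
img_lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)
```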
In recent years, data-driven ML algorithms have become increasingly
common for colorimetric detection.[19,20] ML has the
potential to be robust against unwanted variations. It does not need
an explicitly designed algorithm to extract information. The model
learns from data to work in diverse environmental settings.[21] Bao et al. trained support vector machines (SVM)
on RGB color channels in multiple indoor settings using a single camera.[22] Similarly, Solmaz et al. used least-squares
SVM (LS-SVM) and a multiclass random forest (RF) classifier to predict
the peroxide content on colorimetric test strips and reported over
90% accuracy for six classes with interphone repeatability under variable
illumination.[23] Kim et al. applied linear
discriminant analysis (LDA), SVM, and artificial neural network (ANN)
for the colorimetric analysis of saliva–alcohol concentrations,
with average cross-validation accuracy rates of 100 and 80% for five
standard and nine increased concentrations.[19] Even though a few studies have reported the use of ML in image-based
colorimetric detection, they lack insights into the relative efficacy
of ML algorithms and their generalization capabilities. Generalization
is an important issue in ML, where a model’s high performance
in training and validation data degrades in test data with different
distributions. As reported by Morbioli et al.,[24] source codes and data used in most of the proposed ML models
for colorimetric detection with PADs are not publicly available, making
it difficult to reproduce results and perform benchmark comparisons.

In this work, we designed a set of comprehensive experiments to
analyze the performance and utility of ML for colorimetric image analysis
of PADs using two different data sets, food color and enzyme inhibition
assays, for pesticide residue determination. We assessed four different
ML models—logistic regression (LR), SVM, RF, and ANN—and
three image color spaces—RGB, HSV, and LAB—to predict
the target analyte concentration using images of PADs taken at varying
lighting conditions, with different cameras and users. The colorimetric
assays involved in our approach included both samples and reference
assay zones on the paper device in contrast to most of the previous
studies that captured only the target sample assay zone. In this setting,
we obtained the concentration class prediction accuracy of 0.966 and
0.908 for food color and pesticide residue analysis datasets, respectively,
in independently created test datasets. We also highlight the limitations
of all ML models when there is a domain shift in test data and their
inability to predict with the same accuracy at all concentration levels.
We have made our source code and data publicly available to help address the reproducibility issues in this rapidly progressing field.
Results and Discussion
We established a baseline result using the
mean color intensity
as the input feature to ML models. Cross-validation results for various
color spaces and ML models using mean pixel values of the ROIs as
input features to classify into 10 concentration classes are shown
in Figure 1. At first, we tested the models using the test zones of samples only. In general, the RF model yielded higher accuracy in all three color spaces in the case of the food color experiments. Here, the HSV color space gave a higher accuracy (0.691) than LAB (0.669) and RGB (0.588). The LR and SVM models exhibited lower accuracy than the RF model, with similar values across the three color spaces. We also examined the ANN model's ability to predict correct assay values. It gave values comparable to the LR and SVM models, but with some variation across the three color spaces. In the case of the pesticide assay, the accuracy was lower than in the food color experiments for all combinations of models and color spaces, as shown in Figure 1. However, the trend of the accuracy results was similar to that of the food color assay.
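For readers implementing this baseline, a minimal sketch of fivefold cross-validation of an RF classifier on per-channel mean features is given below; X and y are random stand-ins for the real ROI features and concentration classes, so only the workflow, not the numbers, is meaningful.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in data: in practice, each row holds the per-channel means of one
# sample ROI (3 values) and y holds the 10 concentration classes.
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = rng.integers(1, 11, size=200)

# Fivefold cross-validation of the RF baseline (1000 trees, as in Table 2).
model = RandomForestClassifier(n_estimators=1000, criterion="gini")
scores = cross_val_score(model, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```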
Figure 1
Cross-validation accuracy
results obtained for both food color
prediction (left panel) and pesticide assay (right panel). Each box
describes the full range of variation (whisker’s height), the
likely range of variation (box height), and the median (horizontal
line within the box) in the classification accuracy score of five
cross-validation folds. All results in this figure were obtained using
the mean of each color channel value of sample assays without (top)
and with (bottom) reference assay test zones.
Combining the Sample and Reference Assay Color
To improve the prediction accuracy, we took both the sample and reference assay colors into consideration. In these experiments, we investigated whether imaging the reference (or control) assay and the sample assay at the same time could improve the prediction accuracy of the ML models. This approach improved the prediction accuracy of all the ML models when the mean color intensity from the reference region was included as a feature (Figure 1, bottom). Figure 1 suggests that the ML model was able to use information
from the reference sample region to partially factor out the variations
due to ambient lighting conditions and camera parameters and the image
acquisition setup, improving the concentration prediction results
when using the reference and sample compared to using only the target
sample. Using a printed reference color instead of a reference assay
performed on the same paper device may not correct the variation resulting
from the assay procedure. The relationship between the reference and
sample color could be a simple difference or an n-degree polynomial.
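A minimal sketch of this combined feature construction is given below; the function name and exact implementation are assumptions, consistent with the six values (two zones, three channels) described in the Experimental Section.

```python
import numpy as np

def paired_feature(sample_pixels, reference_pixels):
    """Six-dimensional feature: per-channel means of the sample ROI
    concatenated with those of the reference ROI (2 zones x 3 channels).

    Each argument is an (n_pixels, 3) array of masked ROI pixels in the
    chosen color space (RGB, HSV, or LAB).
    """
    return np.concatenate([sample_pixels.mean(axis=0),
                           reference_pixels.mean(axis=0)])
```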
Figure 2
Confusion matrix (of the test accuracy of LR) for the
food color
and pesticide assay when RGB, LAB, and HSV color models are used.
The confusion matrix of other models also shows similar patterns.
The cross-validation accuracy using the RGB color
space for the
food color assay was found to be higher in LR, SVM, and ANN models,
while it was lower in the RF model when compared to the LAB and HSV
color spaces. The HSV color space in RF and the RGB color space in
ANN models showed higher accuracy. However, we did not find any specific
trends among color spaces and models used. Similar results were observed
for the pesticide assay dataset. It is interesting to note that the
HSV color space using RF showed the highest average accuracy in comparison
to other color models (i.e., 0.804 in the food color assay and 0.596
in the pesticide assay). In LR, the highest average accuracy was observed
in the RGB color model (0.684 in the food color assay and 0.406 in
the pesticide assay). Similarly, in SVM, the highest average accuracy
was in LAB with 0.68 for the food color assay and 0.42 for the pesticide
assay. For ANN, the highest average cross validation (CV) accuracy
was in RGB, that is, 0.721 for the food color assay and 0.401 for
the pesticide assay.

Along with cross-validation experiments,
we also evaluated all
the models and color spaces in a separate test dataset, the details
of which are provided in Table S1. As shown
in Table S1, the highest test accuracy
was 0.67 with the HSV color space in SVM for the food color assay and 0.34 with the HSV color space in ANN for the pesticide assay. The results, as expected, showed that test results
did not necessarily agree with the cross-validation accuracy. A large
drop in test accuracy was observed across all the models compared
to the cross-validation results (Tables S1 and S2). Here, the HSV color model showed good results in both
sets of assays. Our results highlight that reporting only cross-validation
scores or scores in test data that are very similar to the training
set can overestimate ML model’s performance. ML models’
performance can severely degrade when the statistical distribution
of the test data is different from that of the training set. This
is closer to the real-world scenario we wanted to emulate to further
assess the efficacy and applicability of ML systems when test images
are captured in different field settings. When comparing the results
across food color and pesticide assays, we observed that the accuracy
in the pesticide assay is relatively lower than that of the food color
assay for all the models. We attribute this observation to the chemistry
of the pesticide assay. Unlike the food color assay, the final color
in the pesticide assay is obtained by an enzyme inhibition reaction.
The enzyme reaction varies with ambient environmental conditions.
Such variation in assay temperature and moisture can result in inconsistent
color development on the surface of paper devices.

The confusion matrix in Figure 2 shows that the food color assay has its
values clustered near the diagonal line, except for classes 8, 9,
and 10. These classes correspond to low concentration values with
faint colors (best viewed in the color image). For the pesticide assay,
the values in the initial classes (1, 2, 3, and 4) and final classes
(8, 9, and 10) are misclassified in large numbers. The matrix fields
in the middle, although not accurate, show a diagonal pattern. This
result seems to follow a typical S-shaped enzyme assay curve.[25]

In such cases, the ML models can utilize
information from the reference
region to partially factor out the variations of ambient lighting
conditions and the image acquisition setup and learn to estimate the
concentration level by looking at the differences in the signal color
of the two regions.
Input Feature Vectors from the Downsampled Image
Figure 3 shows the CV accuracy
using all the pixels of downsampled ROIs as a feature vector for ML
models. Because images containing both sample and reference colors
provided better accuracy than using the sample color only, we used
the former approach in this experiment. CV accuracy is generally higher
when using all pixels from a downsampled image as input features compared
to when using color channel means. The improvement in the accuracy
was observed in most of the models and color spaces. However, the
extent of improvement varied with the models and color spaces tested.
A similar trend was observed both in the food color assay and pesticide
assay. LR and SVM models with RGB and LAB color spaces resulted in
the highest accuracies. For LR, CV accuracies were 0.975 and 0.977
when using RGB and LAB, respectively. Likewise, for SVM, the CV accuracy
was 0.971 when using RGB or LAB. When the test dataset was evaluated,
the test accuracy was lower in both the food color and pesticide assays. However, the gap between cross-validation and test accuracy was larger for the pesticide assay in all the color channels and models (Table S1).
Figure 3
Cross-validation accuracy
using all the pixel values from 16 ×
16 downsampled images of reference and sample as the input features.
Each box describes the full range of variation (whisker’s height),
the likely range of variation (box height), and the median (horizontal
line within the box) in the accuracy score of five cross-validation
folds. LR, SVM, RF, and ANN are implemented in three color models:
RGB, LAB, and HSV.
Classification into High, Medium, and Low
The results
in the previous section show that accurately and robustly estimating
fine-grained concentration classes is difficult even with powerful
ML models using 10 concentration classes. Therefore, we merged 10
concentration classes into three distinct classes: high, medium, and
low for semiquantitative prediction of both the food dye and pesticide
samples. Classes 1, 2, and 3 shown in Figure 4C were merged into the high category, 4,
5, 6, and 7 were merged into the medium category, and 8, 9, and 10
were merged into the low category in the case of the food color assay.
In the case of the pesticide assay, classes 1, 2, 3, and 4 in Figure 4C were merged into
high, 5 and 6 into medium, and 7, 8, 9, and 10 into low categories.
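A small sketch of this relabeling is shown below; the dictionaries mirror the merge rules just described, and the names are illustrative.

```python
# Mapping of the 10 fine-grained classes to the three semiquantitative
# categories, following the groupings described above.
FOOD_COLOR_MERGE = {1: "high", 2: "high", 3: "high",
                    4: "medium", 5: "medium", 6: "medium", 7: "medium",
                    8: "low", 9: "low", 10: "low"}
PESTICIDE_MERGE = {1: "high", 2: "high", 3: "high", 4: "high",
                   5: "medium", 6: "medium",
                   7: "low", 8: "low", 9: "low", 10: "low"}

def merge_labels(labels, mapping):
    """Relabel a sequence of 10-class labels into high/medium/low."""
    return [mapping[c] for c in labels]
```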
Figure 4
(A) Fabrication
of the paper device. Solid wax was printed on Whatman
filter paper, which penetrated through the paper after heating (see
inset illustrations). (B) General procedure for paper-based pesticide
assay. (C) Representative images of paper device test zones after
assays were performed. Ten different dilutions of the food color and
10 different concentrations of the pesticide were used. Because the
pesticide assay followed an enzyme inhibition reaction, the concentrations
and color intensity are inversely correlated. (D) Automatic color
pixel extraction procedure from the PADs: left to right represent
binary thresholding, mask generation, and masked region of interest
(ROI).
Figure 5 shows the
overall CV accuracy in food dye and pesticide datasets when they were
reduced into the three classes of high, medium, and low. For the food color dataset, using the individual means of the three color channels as input features gave similar accuracy values across all four models and all three color spaces. CV accuracy values for the pesticide assay with mean input features followed the same pattern as for the food color assay but were slightly lower. Most of the models and color spaces in both the food color and pesticide assays produced better accuracies when using all the pixels from the downsampled image as input features.
Figure 5
Cross-validation accuracy for three reduced classes of
high, medium,
low. Results of both the mean color and 16 × 16 downscaled RGB
pixels are presented for food color (left panel) and pesticide assays
(right panel).
To emulate a realistic setting
and test generalization capability,
we evaluated all the models using the test dataset. We reported the
test accuracies for the case of three classes, as shown in Table 1. As with the CV accuracies, the test accuracies for 10 classes are poor compared to those for three classes (see Table S1). Table 1 shows the
average test accuracies, where we see that, in general, the food color
dye dataset exhibited higher classification accuracy compared to the
pesticide dataset. For the food color assay, we observe the highest
accuracy of 0.966 in ANN with the LAB color space and input feature
from all color pixels of the downsampled image. However, the models
for pesticide concentration prediction did not benefit much by using
all the pixels as input features. For the pesticide assay, the highest
accuracy of 0.908 was obtained in SVM with the LAB color space and
the input feature from the individual means of each color channel.
The obtained results show that accurate semiquantitative prediction
of concentration from PAD images is possible even in an uncontrolled
setup with enough data and a suitable ML model.
Table 1
Average Test Accuracy Using Three Concentration Classes (High, Medium, and Low)

(A) All color pixels from the three channels of the downsampled 16 × 16 image as the input feature:

model | food dye RGB | food dye HSV | food dye LAB | pesticide RGB | pesticide HSV | pesticide LAB
LR    | 0.940 | 0.921 | 0.946 | 0.753 | 0.637 | 0.743
SVM   | 0.936 | 0.926 | 0.938 | 0.738 | 0.674 | 0.774
RF    | 0.901 | 0.933 | 0.940 | 0.865 | 0.875 | 0.870
ANN   | 0.960 | 0.946 | 0.966 | 0.851 | 0.8336 | 0.778

(B) Individual mean of each color channel (three means) as the input feature:

model | food dye RGB | food dye HSV | food dye LAB | pesticide RGB | pesticide HSV | pesticide LAB
LR    | 0.933 | 0.936 | 0.928 | 0.900 | 0.873 | 0.895
SVM   | 0.936 | 0.941 | 0.936 | 0.903 | 0.883 | 0.908
RF    | 0.898 | 0.915 | 0.928 | 0.837 | 0.855 | 0.852
ANN   | 0.893 | 0.868 | 0.926 | 0.866 | 0.868 | 0.901

All images included both the sample and reference assays.
Conclusions
We
evaluated four ML models for their ability to accurately predict
the concentration of target analytes on the paper device platform.
We found that ML models that used the sample color along with a reference color had an increased ability to predict the result. The reference assay color may provide a one-point calibration
to estimate or predict the concentration of analytes in the given
sample. In general, we found that the accuracy of the food color assay
was higher than the accuracy of the pesticide assay in most of the
combinations. Our results show that ML models may provide only limited accuracy for fine-grained estimation of concentration classes but high accuracy for coarse-level classification such as low, medium, and high. The ability of ML models to accurately
classify the pesticide concentration to such three classes even in
difficult real-life test images shows the potential of using ML-powered
PADs as a low-cost quick field-testing method.

Smartphone cameras
allow for postprocessing even before we save
or see images. Because it is very hard to understand and identify
these individual preprocessing steps, letting the ML models learn
from the data instead of trying to build inverse models to revert
the camera post-processing is a more promising approach. Convolutional
neural networks (CNNs) have seen tremendous success in the last few
years in the computer vision field. Because the colorimetric assays on paper devices do not provide variations in texture and shape, such networks have limited to no benefit over other ML models such as RF. We might be able to leverage the power of CNNs and build
more accurate analyte concentration estimation methods if we can develop
novel PADs that express shape and texture variation depending on the target analyte concentration.

Finally, robust ML models
can be useful in analyzing large numbers
of samples in applications such as environmental monitoring and clinical
diagnosis during emergencies for assays involving colorimetric paper
devices. Appropriate ML models integrated into smartphones could read assay results performed on the PAD platform from captured images, analyze the signal to accurately predict assay results, and report or store the results locally or in the cloud, making them powerful tools in several measurement applications. Our future work
will investigate the images from real samples that have a complex
matrix and a mixture of different colors instead of a single color to understand the role of interferents.
Experimental Section
Fabrication of PADs
We designed a layout of circular patterns on a computer and printed it on Whatman No. 1 grade filter paper using a Xerox ColorQube 8580 solid wax printer.[26] The wax-printed paper was heated from the backside by pressing
with a dry clothing iron on its surface. The backside of the PADs
was laminated to prevent the leakage of reagents to the other side
of the paper. Finally, the paper sheet was cut in such a way that
each PAD contained two circular assay regions as reference and sample
zones (Figure 4A).
Classification Datasets
We prepared datasets for two
different assays: the first one using the food color assay and the
second one using the pesticide assay. Each dataset used four different
smartphones (Huawei SCC-U21, iPhone 6, Honor 8C, and Samsung Galaxy
J7 Max) for image acquisition at different lighting conditions, camera
to PAD distances, and capture angles. The lighting conditions included
outdoor sunlight, indoor daylight, fluorescent light, incandescent
light, and a combination of them. A general procedure of the assay
on a paper device is given in Figure 4B.

The food color datasets were obtained by
loading 10 different dilutions of yellow food color (Foster Clark
Product Ltd.) onto the PADs. We chose yellow dye to closely match
the color produced in pesticide experiments so that our analysis remains
consistent. We captured 2400 images in total under various conditions.
Images that were unclear or blurred were removed, and 2353 images
were used for training ML models. The images were labeled from class 1 to 10, with class 1 for the highest concentration and class 10 for the lowest concentration of the food color. The dataset contained an approximately equal number
of images per class. A new set of 600 images of the same food color
concentrations was obtained and was used as the test data set. These
images were taken on a different day—varying the illumination,
randomly changing the camera, and camera–PAD distance—so that the test dataset differed from the training dataset. Representative images are shown in Figure 4C.

The second dataset was prepared for a more realistic
application,
which included enzyme inhibition assay for the pesticide residue measurement
based on the Ellman method.[27] This assay
is based on the inhibition of acetylcholinesterase enzyme activity
in the presence of pesticides. In this assay, the acetylcholinesterase
enzyme (Sigma-Aldrich) breaks down the acetylthiocholine chloride
(AtCh) substrate (Sigma-Aldrich) into thiocholine and acetic acid.
The thiocholine molecules react with Ellman’s reagent (dithiobisnitrobenzoic
acid; DTNB) (Sigma-Aldrich) to give a yellow product of thionitrobenzoic
acid.[27] To run the assay, we first loaded
the test and reference zones on PADs with the acetylcholinesterase
enzyme. Then, the sample containing the pesticide was added to the
test zone, while a blank solution with no pesticides was added in
the reference zone. After 5 min, an enzyme substrate acetylthiocholine
chloride was added to both zones. In the reference zone, the enzyme
remains active. In the test zone, its activity is compromised due
to the presence of organophosphate and the carbamate group of pesticides,
which inhibit the enzyme activity by binding to it. Based on the extent
of inhibition, the amount of pesticide on the sample is estimated.
Finally, Ellman’s reagent (DTNB) was added, which reacted with
the thiocholine molecules produced by the enzyme reaction to give
a characteristic yellow product of thionitrobenzoic acid. In the reference
zone, a very vibrant yellow color develops because of the retention
of full enzyme activity, whereas the color intensity is inversely
proportional to the concentration of the pesticide present in the
sample in the test zone.[1,28] The images of PADs
were captured after 10 min of the enzyme reaction using a smartphone for further analysis (see Figure 4B for a general outline). Considering the short stability
of the enzyme and enzyme substrate, we used a freshly prepared enzyme
and enzyme substrate during the assay.

We collected 1872 images
for training data sets by repeating the
same experiment on multiple days under different lighting conditions
using four different smartphones. Each image was categorized into
a class from 1 to 10, as with the food color assay images. Class 1 represents a pesticide (paraoxon, Sigma-Aldrich) concentration of 100 ppm. The same pesticide solution was serially diluted twofold for the remaining classes. New experiments at different lighting conditions were performed
to obtain 601 new images as the test dataset. See Figure 4C for representative images of the pesticide assay.
Extraction of Pixel Values from Assay Images
The preprocessing
of a PAD image is outlined in Figure 4D. The leftmost image in panel 4D is a typical image
of the PAD captured using a camera. The two regions encircled by black
rings are the ROIs that contain the color information of the reference
and target samples. We developed an automatic threshold-based segmentation
algorithm to extract all the pixels lying in these two regions. RGB
images were converted to grayscale images, which were then converted to binary images by applying a threshold T = 0.8 · Im + 30, where Im is the mean intensity of the grayscale image, and 0.8 and 30 are empirically chosen values after visual inspection
across multiple images. The binary images provide us with the masks
that were used to extract the pixel values lying in the two ROIs of
the corresponding original images, as shown in the rightmost image
of Figure 4D.
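A minimal sketch of this segmentation step follows, assuming the assay zones are darker than the mostly white paper background; the threshold polarity is an assumption, as the text does not state which side of T the assay pixels fall on.

```python
import cv2

def extract_roi_pixels(image_bgr):
    """Segment the assay zones with the threshold T = 0.8 * Im + 30 and
    return the masked pixel values.

    Assumption: assay zones are darker than the paper background, so an
    inverted binary threshold selects them.
    """
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    T = 0.8 * gray.mean() + 30          # Im = mean grayscale intensity
    _, mask = cv2.threshold(gray, T, 255, cv2.THRESH_BINARY_INV)
    return image_bgr[mask > 0]          # (n_pixels, 3) color values
```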
Evaluation of Multiclass Classification Models
We used
average classification accuracy (ACA), which is a commonly used metric
for multiclass classification problems. ACA is the ratio of the number of correct predictions to the total number of predictions, ACA = N_correct/N_total. In addition, we visualized the results
using confusion matrices that provide information on how many samples
of a particular class are misclassified to another class. We used
a fivefold cross-validation where the training dataset was randomly
split into five subsets (folds), and the model was trained five times
such that each time a unique fold was selected for validation and
the remaining four for training. The mean ACA and its standard deviation
were reported for the cross-validation experiments. To evaluate the
robustness of ML models and their generalization ability, we also
evaluated the models with a separate test set under different conditions
trying to emulate actual real-life testing scenarios.
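The evaluation protocol can be sketched as follows; the features and labels here are random stand-ins for the real PAD data, and LR (with the maximum iterations from Table 2) is shown as one representative model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score

# Stand-in features/labels; in practice these come from the PAD ROIs.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((200, 6)), rng.integers(1, 11, size=200)
X_test, y_test = rng.random((60, 6)), rng.integers(1, 11, size=60)

model = LogisticRegression(max_iter=10000)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)  # fivefold CV
print(f"mean CV ACA: {cv_scores.mean():.3f} (std {cv_scores.std():.3f})")

# ACA and confusion matrix on the separately collected test set.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("test ACA:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```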
We used various approaches to extract features and feed them as input to the
ML models. We compared these approaches using the following experimental setups: (a) color spaces: RGB vs HSV vs LAB color representation; (b) the mean pixel value of the ROIs for each channel vs all the pixel values of the downsampled ROIs; and (c) using only the target sample vs using both the reference and target samples, which assessed the impact of the reference test region.

After extracting the mean color intensity of the sample and reference
assay zones, we obtained 2 × 3 = 6 unique values for each assay (two circular zones and three color channels). These can be fed as
a six-dimensional feature vector to ML classification models. The
mean values from the ROI do not capture the variation of pixel values
within the ROI. However, using all the pixels of the ROIs as the input
features to train ML models dramatically increases the feature dimension,
which computationally affects the training of some ML algorithms such
as SVM. As most of the PAD colorimetry images only have color information
without texture and shape information, as a compromise, we downsampled
the cropped image into a size of 16 × 16 and converted it to
a one-dimensional (1D) vector of dimension 16 × 16 × 3 =
768 (for three color channels) for each of the reference and target ROIs, resulting in two 1D vectors.
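A minimal sketch of this feature construction is given below; how the two resulting vectors are combined is an assumption noted in the comments, as the text only states that two 1D vectors result.

```python
import cv2
import numpy as np

def downsampled_feature(roi_bgr):
    """Resize a cropped ROI to 16 x 16 and flatten it into a 1D vector of
    length 16 * 16 * 3 = 768, as described above."""
    small = cv2.resize(roi_bgr, (16, 16), interpolation=cv2.INTER_AREA)
    return small.reshape(-1).astype(np.float32)

# Each PAD image yields one such vector for the reference ROI and one for
# the sample ROI; combining them (e.g., concatenation into a 1536-dim
# input) is an assumption, since the exact scheme is not stated.
```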
We compared the following four most widely used supervised ML models
for colorimetry with PADs: LR,[29] SVM,[30] RF,[31] and ANN.[32] We used the Python-based ML library Scikit-learn[33] to build the LR, SVM, and RF models. For the ANN, we
adopted a popular Python-based framework, Keras.[34] The details of the implementation configuration are given
in Table 2.
Table 2
Implementation Details of ML Models

model type | library or framework | implementation details
logistic regression | Scikit-learn | multinomial + L2 penalty, L-BFGS solver, maximum iterations = 10,000
support vector machine | Scikit-learn | squared L2 penalty, linear kernel
random forest | Scikit-learn | no. of trees = 1000, split criterion = gini
artificial neural network (fully connected) | Keras | 3 dense layers, sigmoid + softmax activation, Adam optimizer, categorical cross-entropy loss
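Based on Table 2, the four models could be instantiated roughly as follows. Hyperparameters not listed in the table (layer widths, input size) are placeholder assumptions, and the modern tensorflow.keras import is used here in place of standalone Keras.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from tensorflow import keras

# The four classifiers with the settings listed in Table 2.
lr = LogisticRegression(penalty="l2", solver="lbfgs", max_iter=10000,
                        multi_class="multinomial")
svm = SVC(kernel="linear")  # L2-regularized SVM with a linear kernel
rf = RandomForestClassifier(n_estimators=1000, criterion="gini")

n_features, n_classes = 768, 10  # e.g., 16 x 16 x 3 input, 10 classes
ann = keras.Sequential([
    keras.layers.Dense(128, activation="sigmoid", input_shape=(n_features,)),
    keras.layers.Dense(64, activation="sigmoid"),
    keras.layers.Dense(n_classes, activation="softmax"),
])
ann.compile(optimizer="adam", loss="categorical_crossentropy",
            metrics=["accuracy"])
```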