Identification of Alfalfa Leaf Diseases Using Image Recognition Technology.

Feng Qin1, Dongxia Liu2, Bingda Sun3, Liu Ruan1, Zhanhong Ma1, Haiguang Wang1.   

Abstract

Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnoses of these diseases are critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four types of alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image with one or multiple typical lesions was obtained by artificial cutting from each acquired digital disease image. Then the sub-images were segmented using twelve lesion segmentation methods integrated with clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, classification and regression tree, and linear discriminant analysis). After a comprehensive comparison, the segmentation method integrating the K_median clustering algorithm and linear discriminant analysis was chosen to obtain lesion images. After the lesion segmentation using this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods, including the random forest, support vector machine (SVM) and K-nearest neighbor methods. A comparison of the recognition results of the models was conducted. The results showed that when the ReliefF method was used for feature selection, the SVM model built with the most important 45 features (selected from a total of 129 features) was the optimal model. 
For this SVM model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were built based on the 45 effective features that were used for building the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies of the training set and the testing set were both approximately 80%. The results indicated that image recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf disease.

Year:  2016        PMID: 27977767      PMCID: PMC5158033          DOI: 10.1371/journal.pone.0168274

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Alfalfa (Medicago sativa) is an important forage grass containing various nutrients. The occurrence of disease in alfalfa plants has an important influence on the yield and quality of alfalfa hay, affecting the healthy development of the alfalfa industry [1]. There are more than ten types of alfalfa leaf diseases [2, 3]. Some of these diseases have similar symptoms, resulting in difficulties in achieving an accurate diagnosis and identifying the disease via naked-eye observations of symptoms or microscopic observations of causal agents. The diagnosis and identification of alfalfa diseases mainly rely on the experience of farmers, agricultural experts or agricultural technicians. The complexity of the disease symptoms and the limitations of personnel experience may lead to errors in judgment. The rapid, accurate identification and diagnosis of diseases will help to reduce the yield losses and quality decline of alfalfa hay resulting from the diseases. With the rapid development of computer technology and information technology, it is possible to utilize image-processing technology to diagnose and identify alfalfa leaf diseases quickly, accurately and automatically. Image-processing technology has been applied to the recognition of many plant diseases [4-19]. The image-based recognition accuracy for plant diseases depends largely on the segmentation of the lesion images. Threshold-based image segmentation methods have been widely used in the segmentation of lesion images of diseased plants [20, 21]. However, there is usually great variance in color both between lesions of different diseases and between lesions from a disease at different stages. Therefore, it is very difficult to determine the appropriate threshold when threshold-based image segmentation methods are used to solve segmentation problems for plant disease images with complex colors. 
Image segmentation methods based on a fuzzy C-means clustering algorithm [22] or a K_means clustering algorithm [11, 15, 23] have been used to carry out lesion segmentation of plant disease images. Such segmentation methods must specify the number of clusters in advance. An inappropriate number of clusters may lead to over-segmentation or under-segmentation of lesion images. However, a great computational cost is required to determine the appropriate number of clusters, especially for segmentation operations on high-pixel images. Supervised classification is a technique that deduces a functional equation for classification from typical samples. Lesion segmentation of plant disease images can be effectively realized using the supervised classification method [24, 25]. However, the features of typical lesion regions and the features of typical healthy regions in a disease image cannot be obtained automatically and in a targeted fashion by only using a supervised classification method. Automatic segmentation of plant disease images can be effectively achieved by integrating a clustering algorithm and a supervised classification algorithm [26, 27]. There are color, texture and shape differences between lesion images from different plant diseases. Image recognition of plant diseases can be implemented using an appropriate pattern recognition algorithm based on color, texture and shape features of the lesion images [10, 11, 13, 17, 28]. Moreover, to reduce the complexity of the disease identification model and improve the model’s generalization ability, it is necessary to carry out feature selection according to the importance of features. To the best of our knowledge, systematic studies on image recognition of alfalfa diseases have not yet been reported. 
In this study, automatic recognition of four common alfalfa leaf diseases, including alfalfa common leaf spot (caused by Pseudopeziza medicaginis), alfalfa rust (caused by Uromyces striatus), alfalfa Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and alfalfa Cercospora leaf spot (caused by Cercospora medicaginis), was investigated based on acquired digital disease images. Of twelve segmentation methods integrating clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, classification and regression tree (CART) and linear discriminant analysis), the best image segmentation method was selected for further image processing and image recognition. After extraction of texture, color and shape features from the lesion images, feature selection was conducted using three different methods, i.e., the ReliefF method [29], the 1-rule (1R) method [30] and the correlation-based feature selection (CFS) method [31]. Based on the selected features, disease recognition models were built using three supervised learning methods including random forest, support vector machine (SVM) and K-nearest neighbor (KNN). Moreover, after the features used for building the optimal supervised model were transformed using principal component analysis (PCA), disease recognition semi-supervised models were built using a self-training algorithm based on Naive Bayes classifiers [32, 33]. After comparing the recognition results of each model, the optimal model was determined for disease image recognition. The aim of this study was to provide a solution for rapid and accurate identification of four alfalfa leaf diseases and to provide support for the development of an automatic alfalfa leaf disease diagnosis system.

Materials and Methods

Image Acquisition

Infected alfalfa leaves with typical symptoms used in this study were sampled from the Langfang Forage Experimental Base, Institute of Animal Science, Chinese Academy of Agricultural Sciences and alfalfa fields in Xuanhua District, Zhangjiakou, Hebei Province, China. The study was conducted with the permission for the Langfang Forage Experimental Base given by Qinghua Yuan from Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China. The study was also conducted with the permission for the alfalfa fields in Xuanhua District given by Dongxia Liu from College of Agriculture and Forestry Science and Technology, Hebei North University, Zhangjiakou, Hebei Province, China. All the diseased alfalfa leaves in the fields resulted from natural infections. The infected alfalfa leaves in the early stage of diseases were not sampled. Samples were taken to the laboratory and disease types of the leaves were determined mainly by using conventional diagnostic methods including naked-eye observations of disease symptoms and microscopic observation of morphological characteristics of causal agents. Images were captured with the lesion side of each diseased leaf facing up on a white background. When taking images, the leaves were flattened as much as possible, and the camera lens was parallel with the plane of the leaves. A total of 899 images with typical disease symptoms were acquired, including 76 images of alfalfa common leaf spot, 136 images of alfalfa rust, 231 images of alfalfa Leptosphaerulina leaf spot and 456 images of alfalfa Cercospora leaf spot. The image size was 4,256×2,832 pixels (JPEG format). To reduce the workload of image analysis and focus on the regions of interest, a sub-image with one typical lesion or multiple typical lesions was obtained from each original disease image by manual cropping. The size of a sub-image depended on the number of typical lesions and the size of each typical lesion. 
Using the sub-images, the image dataset of alfalfa common leaf spot comprising 76 sub-images, the image dataset of alfalfa rust comprising 136 sub-images, the image dataset of alfalfa Leptosphaerulina leaf spot comprising 231 sub-images, the image dataset of alfalfa Cercospora leaf spot comprising 456 sub-images and the aggregated image dataset comprising 899 sub-images, were constructed. These image datasets were used for segmentation of lesion images and evaluation of segmentation methods.

Lesion Image Segmentation

In this study, twelve lesion segmentation methods integrated with clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, CART and linear discriminant analysis) were used to segment the sub-images, and then their segmentation effects were evaluated. The main steps for lesion image segmentation are shown in Fig 1.
Fig 1. Work flow diagram of main steps for lesion image segmentation.

Each obtained sub-image was converted from RGB color space into HSV color space and L*a*b* color space. For each pixel in the sub-image, the a* component value and the b* component value were regarded as the color features of the pixel. All pixels in the image were clustered into ten classes using K_means clustering, fuzzy C-means clustering and K_median clustering. The three clustering algorithms were carried out using the software MATLAB R2013b (MathWorks, Natick, MA, USA). For K_means clustering, the number of repetitions was set to three, and default values were used for the other parameters. For fuzzy C-means clustering, default values were used for all the parameters. K_median clustering was implemented using Euclidean distance while the initial clustering seed was obtained using a random selection method. The maximum number of iterations was set to 1,000, the number of repetitions was set to three, and minimizing the sum of the intraclass distances was regarded as the clustering criterion. After all pixels in a sub-image were clustered into ten classes using a clustering algorithm, the mean of the H components of all pixels in each class was calculated. Compared to healthy alfalfa leaves, the H components of the lesions in the sub-images of the four alfalfa leaf diseases were smaller. Consequently, the pixels in the class with the minimum mean were treated as typical lesion pixels, and the pixels in the seven classes with the largest means were treated as typical healthy pixels. There is a transition region between the lesion region with typical symptoms and the typical healthy region, and its H components usually lie between those of the two regions. The pixels in the two remaining classes were therefore not used in building the pixel classification models. 
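The selection of typical lesion and healthy classes from the per-class mean H value can be sketched as follows. This is a minimal illustration in plain Python, not the authors' MATLAB code; the function name `select_typical_classes` and the toy hue values are assumptions made for the example, and the cluster labels are taken as given rather than computed.

```python
from statistics import mean

def select_typical_classes(hue, labels, n_healthy=7):
    """Pick typical lesion/healthy clusters from per-cluster mean hue.

    hue    : list of H values, one per pixel
    labels : list of cluster ids of the same length (e.g. ten clusters)
    Returns (lesion_cluster, healthy_clusters, ignored_clusters).
    """
    by_cluster = {}
    for h, c in zip(hue, labels):
        by_cluster.setdefault(c, []).append(h)
    # Rank cluster ids by mean hue, smallest first.
    ranked = sorted(by_cluster, key=lambda c: mean(by_cluster[c]))
    lesion = ranked[0]                    # minimum mean H: typical lesion
    healthy = set(ranked[-n_healthy:])    # largest means: typical healthy
    ignored = set(ranked[1:-n_healthy])   # transition clusters, left unlabeled
    return lesion, healthy, ignored

# Toy data (hypothetical): ten clusters whose mean hue grows with the id.
hue = [0.05 * (c + 1) for c in range(10) for _ in range(3)]
labels = [c for c in range(10) for _ in range(3)]
lesion, healthy, ignored = select_typical_classes(hue, labels)
print(lesion, sorted(healthy), sorted(ignored))
```

With ten clusters, one is labeled lesion, seven are labeled healthy and the two in between are left out of the training set, matching the counts given in the text.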
The typical lesion pixels and typical healthy pixels were labeled positive samples and negative samples, respectively, and these pixels constituted the training set for building pixel classification models. With a* component value and b* component value of each pixel in the training set as feature variables, pixel classification models to classify all the pixels in the sub-image were built using logistic regression analysis, Naive Bayes algorithm, CART and linear discriminant analysis, respectively. For each classification model, each pixel classified as lesion was assigned a value of 1 and each remaining pixel in the sub-image was assigned a value of 0. Thus, an initial binary lesion segmentation image was obtained. To avoid the influence of the white background, the pixels with B component values higher than 200 in the original sub-image were identified. Each pixel with B component value higher than 200 was assigned a value of 0, and each remaining pixel was assigned a value of 1 in the initial binary lesion segmentation image to achieve a binary image. A new binary image was obtained by multiplying this binary image with the initial binary lesion segmentation image. The hole filling operation was performed on this new binary image and the areas of all connected regions in the image were calculated. The connected regions with areas less than one-sixteenth of the maximum area were removed, and the final lesion segmentation image was obtained. If there were no pixels with B component values higher than 200, a hole filling operation was carried out on the initial binary lesion segmentation image. Subsequently, the areas of all connected regions in the image were calculated and the connected regions with the areas less than one-sixteenth of the maximum area were removed to give the final lesion segmentation image. In the process of lesion segmentation, all pixels in a sub-image were classified as either lesion pixels or healthy pixels. 
Therefore, lesion segmentation is similar to a binary classification problem in the field of pattern recognition, and the evaluation of segmentation effects can be carried out using methods for evaluating a binary classification model. Manual segmentation of a sub-image using the Adobe Photoshop CC software was conducted to determine the true class of each pixel. In comparison with the manual segmentation method, Recall and Precision, two commonly used indices for evaluating classification models in the field of pattern recognition [34], were used to evaluate the twelve segmentation methods integrated with clustering algorithms and supervised classification algorithms. In this study, the two indices were calculated according to the following formulas: Recall = N1/N2 and Precision = N1/N3, where N1 was the total number of lesion pixels in a sub-image correctly classified by using a segmentation method integrated with a clustering algorithm and a supervised classification algorithm, N2 was the total number of lesion pixels in the sub-image classified using the manual segmentation method, and N3 was the total number of pixels in the sub-image classified as lesion pixels by the integrated segmentation method. Both Recall and Precision range from 0 to 1. Larger values of Recall and Precision indicate a better integrated segmentation method. The index “Score”, combining Recall and Precision, is proposed in this study to evaluate the twelve segmentation methods and is calculated according to the following formula: Score = (Recall+Precision)/2. The Score also ranges from 0 to 1. Larger Score values demonstrate that the corresponding integrated segmentation method is better. Based on the image datasets described above, the three indices were used to evaluate the twelve integrated segmentation methods and to identify the best method to segment sub-images for further image recognition in this study. 
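The three indices can be computed from a predicted mask and a manually segmented reference mask as in the short sketch below. It assumes that the denominator of Precision counts the pixels the method labels as lesion (the standard definition of precision); the function name and the toy masks are hypothetical.

```python
def segmentation_scores(predicted, reference):
    """Recall, Precision and Score for two flat binary masks (1 = lesion)."""
    # N1: lesion pixels that the method classified correctly.
    n1 = sum(1 for p, r in zip(predicted, reference) if p == 1 and r == 1)
    n2 = sum(reference)   # N2: lesion pixels in the manual segmentation
    n3 = sum(predicted)   # N3: pixels the method labeled as lesion
    recall = n1 / n2 if n2 else 0.0
    precision = n1 / n3 if n3 else 0.0
    return recall, precision, (recall + precision) / 2

# Toy masks (hypothetical), flattened row by row.
predicted = [1, 1, 1, 0, 0, 0, 1, 0]
reference = [1, 1, 0, 0, 1, 1, 1, 0]
recall, precision, score = segmentation_scores(predicted, reference)
print(recall, precision, score)
```

Here three of the five reference lesion pixels are recovered and three of the four predicted lesion pixels are correct, so Recall is 0.6, Precision is 0.75 and Score is their mean.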
After segmentation, in each final binary lesion segmentation image, each independent white region (i.e., connected component) was labeled a lesion, and the black background region was labeled the healthy region. The location of the smallest rectangle containing each lesion, namely, the independent white region, was determined. After multiplying each of color channels (R, G and B) of the original sub-image with the corresponding final binary lesion segmentation image, the obtained images were integrated into a new RGB image using the MATLAB system function “cat” to remove the background of the original sub-image and retain only the lesion regions. Based on the location information of each smallest rectangle containing a lesion, each rectangle was cut down from the new RGB image using the MATLAB system function “imcrop” to achieve multiple lesion images. For example, if there were two lesions in an original sub-image, two lesion images were achieved through the above operations. After segmentation using the best segmentation method based on the 899 sub-images of the four types of alfalfa leaf diseases, a total of 1,651 typical lesion images, each of which contained only one lesion, were obtained for further feature extraction, feature selection and disease image recognition. For building disease recognition models, the typical lesion images were divided into a training set and a testing set in a ratio of 2:1. The training set consisted of 1,100 lesion images including 111 lesion images of alfalfa common leaf spot, 267 lesion images of alfalfa rust, 371 lesion images of alfalfa Leptosphaerulina leaf spot and 351 lesion images of alfalfa Cercospora leaf spot. The testing set consisted of 551 lesion images including 56 lesion images of alfalfa common leaf spot, 133 lesion images of alfalfa rust, 185 lesion images of alfalfa Leptosphaerulina leaf spot and 177 lesion images of alfalfa Cercospora leaf spot.
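The removal of small connected regions and the location of the smallest rectangle around each lesion can be sketched in plain Python as below. This is an illustrative stand-in for the MATLAB operations, not the authors' code: hole filling is omitted, 8-connectivity is an assumption (the text does not state the connectivity), and all function names and the toy mask are hypothetical.

```python
from collections import deque

def connected_regions(mask):
    """Return the connected regions of 1-pixels as lists of (row, col).
    Uses 8-connectivity, which is an assumption for this sketch."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            queue, region = deque([(r, c)]), []
            seen[r][c] = True
            while queue:                      # breadth-first flood fill
                y, x = queue.popleft()
                region.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            regions.append(region)
    return regions

def keep_large_regions(mask, fraction=1 / 16):
    """Drop regions whose area is below `fraction` of the largest area."""
    regions = connected_regions(mask)
    if not regions:
        return []
    largest = max(len(reg) for reg in regions)
    return [reg for reg in regions if len(reg) >= fraction * largest]

def bounding_box(region):
    """Smallest rectangle (top, left, bottom, right) containing a region."""
    ys = [y for y, _ in region]
    xs = [x for _, x in region]
    return min(ys), min(xs), max(ys), max(xs)

# Toy mask: a 20-pixel lesion, a 2-pixel lesion and a 1-pixel speck.
mask = [
    [1, 1, 1, 1, 0, 0, 0, 1],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
]
kept = keep_large_regions(mask)
boxes = [bounding_box(reg) for reg in kept]
print(boxes)
```

The single-pixel speck has an area below one-sixteenth of the largest region (20 pixels) and is discarded, while the two remaining regions each yield one rectangle that could be used to crop a lesion image.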

Feature Extraction and Normalization

A total of 129 texture, color and shape features were extracted from the 1,651 typical lesion images of the four alfalfa leaf diseases. The 90 extracted texture features included the seven Hu invariant moments (63 features), contrast (nine features), energy (nine features) and homogeneity (nine features) of the gray images of the nine components in RGB color space, HSV color space and L*a*b* color space. There were 30 color features including the first moments (nine features), the second moments (nine features) and the third moments (nine features) of the gray images of the nine components in RGB, HSV and L*a*b* color spaces, and three color ratios (r, g and b) of R, G and B components. Of the nine extracted shape features, circularity of disease lesion, complexity of disease lesion and the seven Hu invariant moments of the binary lesion image were included. Hu invariant moments used to depict the texture features of an image are invariant to translation, rotation and scaling. Contrast is applied to measure the gray level of a pixel in comparison with the neighbor pixels in an image, energy is a measure of the consistency of an image, and homogeneity is used to measure the spatial closeness of elements with the diagonal distribution in a co-occurrence matrix [35]. Circularity denotes the degree that a lesion region is circular, and a bigger value indicates that the lesion region is more circular [11]. Complexity refers to the complexity and discrete degree of a lesion region, and a bigger value indicates the lesion region with higher complexity and greater discrete degree [11]. The seven Hu invariant moments were calculated using the calculation formulas as described in [36]. The other extracted features were calculated according to the formulas shown in Table 1.
Table 1. Extracted image features (excluding Hu invariant moments) and calculation formulas.

| Feature parameter | Calculation formula | Reference |
|---|---|---|
| Contrast | $\sum_{i=1}^{M}\sum_{j=1}^{M}(i-j)^{2}\,p_{ij}$ | [35] |
| Energy | $\sum_{i=1}^{M}\sum_{j=1}^{M}p_{ij}^{2}$ | [35] |
| Homogeneity | $\sum_{i=1}^{M}\sum_{j=1}^{M}\dfrac{p_{ij}}{1+\lvert i-j\rvert}$ | [35] |
| First moment | $\mu_{1}=\dfrac{1}{L}\sum_{i=0}^{L-1}f_{i}\,p(f_{i})$ | [37] |
| Second moment | $\mu_{2}=\left[\dfrac{1}{L}\sum_{i=0}^{L-1}(f_{i}-\mu_{1})^{2}\,p(f_{i})\right]^{1/2}$ | [37] |
| Third moment | $\mu_{3}=\left[\dfrac{1}{L}\sum_{i=0}^{L-1}(f_{i}-\mu_{1})^{3}\,p(f_{i})\right]^{1/3}$ | [37] |
| Color ratio r | $\dfrac{R}{R+G+B}$ | [11] |
| Color ratio g | $\dfrac{G}{R+G+B}$ | [11] |
| Color ratio b | $\dfrac{B}{R+G+B}$ | [11] |
| Circularity | $\dfrac{4\pi S}{L^{2}}$ | [11] |
| Complexity | $\dfrac{L^{2}}{S}$ | [11] |

For contrast, energy and homogeneity, M×M denotes the size of a co-occurrence matrix, M = 1, 2, …, and $p_{ij}$ denotes the quotient of the element (i, j) of a co-occurrence matrix divided by the sum of the elements of the co-occurrence matrix. For the moments, $\mu_{1}$, $\mu_{2}$ and $\mu_{3}$ refer to the first moment, second moment and third moment, respectively, $f_{i}$ represents a random variable of gray level, $p(f_{i})$ represents the gray level histogram of an image region, i = 0, 1, 2, …, L-1, and L is the number of different gray levels. For circularity and complexity, S and L represent the area and perimeter of a lesion region, respectively.

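The co-occurrence-based texture features and the circularity in Table 1 can be illustrated with a small plain-Python sketch. The single horizontal pixel offset, the number of gray levels and the toy image are assumptions made for the example; the source does not state which offsets or quantization were used.

```python
import math

def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for one pixel offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                counts[image[y][x]][image[ny][nx]] += 1
    total = sum(map(sum, counts))
    # p_ij: each element divided by the sum of all elements.
    return [[v / total for v in row] for row in counts]

def texture_features(p):
    """Contrast, energy and homogeneity of a normalized co-occurrence matrix."""
    m = len(p)
    cells = [(i, j) for i in range(m) for j in range(m)]
    contrast = sum((i - j) ** 2 * p[i][j] for i, j in cells)
    energy = sum(p[i][j] ** 2 for i, j in cells)
    homogeneity = sum(p[i][j] / (1 + abs(i - j)) for i, j in cells)
    return contrast, energy, homogeneity

def circularity(area, perimeter):
    """4*pi*S / L^2; equals 1 for a perfect circle."""
    return 4 * math.pi * area / perimeter ** 2

# Toy 3x3 image with two gray levels (hypothetical data).
image = [[0, 0, 1],
         [0, 1, 1],
         [1, 1, 0]]
contrast, energy, homogeneity = texture_features(glcm(image, levels=2))
print(contrast, energy, homogeneity)
print(circularity(area=math.pi, perimeter=2 * math.pi))  # unit circle
```

Only the difference i − j enters the three texture formulas, so indexing the matrix from 0 rather than from 1 does not change the results.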
Because of the great differences between the ranges of extracted features, which may impact the accuracies of disease recognition models, the values of each extracted feature were normalized to the range of 0–1 using the following formula: $X^{*} = \dfrac{X - X_{\min}}{X_{\max} - X_{\min}}$, where $X^{*}$ was the value of the ith feature after normalization and $X$, $X_{\min}$ and $X_{\max}$ were the value of the ith feature, the minimum value and the maximum value of the feature before normalization, respectively.
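Scaling one feature to the 0–1 range from its minimum and maximum, as described above, can be sketched as follows. The function name and the sample values are hypothetical, and mapping a constant feature to 0 is an assumption added to avoid division by zero.

```python
def min_max_normalize(values):
    """Scale one feature column to [0, 1] using its own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: no spread to scale (assumed convention).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

column = [2.0, 4.0, 10.0]   # hypothetical values of one feature
normalized = min_max_normalize(column)
print(normalized)           # [0.0, 0.25, 1.0]
```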

Feature Selection

To reduce the complexity of image recognition resulting from excessive features and improve the accuracy and applicability of image recognition methods, the extracted features were screened after feature normalization. Based on the training set including 1,100 lesion images described above, feature selection was conducted using the ReliefF method, the 1R method and the CFS method. For the ReliefF method, a high weight was assigned to a feature that has a high correlation with categories, and a feature with a higher weight indicates that this feature is more important. For the 1R method, the classification accuracy is calculated with each feature as the input of the 1R classifier successively and is used to evaluate the importance of the feature. Higher classification accuracy indicates that the corresponding feature is more important. The CFS method is unlike the ReliefF method and the 1R method, and is aimed to obtain the optimal feature subset. The correlation between the optimal feature subset and dependent variable should be as high as possible. Meanwhile, the correlations among the features in the optimal feature subset should be as small as possible. In this study, the three methods for feature selection, including the ReliefF method, the 1R method and the CFS method, were all implemented using the open source software Weka (Waikato Environment for Knowledge Analysis) 3.7, developed by The University of Waikato in Hamilton, New Zealand. The default values were used for all the parameters involved in the methods. The importance ranking of each feature for classification and recognition could be obtained using the ReliefF method and the 1R method, respectively. A higher ranking for a feature indicates that it is more likely to yield better recognition results if used to build the recognition model. 
To find the best combination of features, according to the importance ranking of each feature for classification and recognition, the top N (N = 1, 2, 3, …, 129) features were treated as inputs for the disease recognition models based on random forest, SVM and KNN. According to the recognition accuracies of the training set and the testing set, the best top N features were selected as the best feature combination to build the disease recognition models. For the CFS method, the best feature combination, namely, the optimal feature subset, was obtained directly for modeling.
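The search over the top N ranked features can be sketched as a simple loop. The callback `evaluate` is a hypothetical stand-in for training a random forest, SVM or KNN model on a feature subset, and the rule used here to pick the best N (highest testing accuracy, ties broken by training accuracy, then by smaller N) is an assumption, since the text does not spell out the exact criterion.

```python
def best_top_n(ranked_features, evaluate):
    """Try the top N ranked features for N = 1..len(ranked_features).

    evaluate(subset) must return (train_accuracy, test_accuracy); it is a
    hypothetical callback standing in for fitting and scoring a classifier.
    """
    best = None
    for n in range(1, len(ranked_features) + 1):
        subset = ranked_features[:n]
        train_acc, test_acc = evaluate(subset)
        key = (test_acc, train_acc)
        # Strict comparison keeps the smallest N among equal scores.
        if best is None or key > best[0]:
            best = (key, n, subset)
    return best[1], best[2]

def toy_evaluate(subset):
    """Made-up accuracies that peak when three features are used."""
    n = len(subset)
    return 0.90 + 0.01 * n, 0.95 - 0.02 * abs(n - 3)

ranking = ["f1", "f2", "f3", "f4", "f5"]   # hypothetical importance ranking
n_best, chosen = best_top_n(ranking, toy_evaluate)
print(n_best, chosen)
```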

Building of Disease Recognition Models

After the segmentation, feature extraction, feature normalization and feature selection described above, disease recognition models were built based on the 1,651 typical lesion images of the four alfalfa leaf diseases using three supervised learning methods including random forest, SVM and KNN. All models were built using the MATLAB R2013b software. The recognition accuracies of both the training set and the testing set were calculated and used to evaluate the disease recognition models. Random forest is a combination model composed of a number of fully grown decision trees [38]. Each decision tree produces a predictive value, and the final prediction result of the model can be determined by voting. To a certain extent, the classification effects of a random forest depend on the number of decision trees that constitute the model. Consequently, it is necessary to determine the optimal number of decision trees by testing a variety of values based on the classification results of the random forests. To build a disease recognition model based on the random forest method, the number of decision trees was assigned as 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100, and the optimal number of decision trees was determined according to the recognition results of the model. The number of features randomly selected by each decision tree was set as the arithmetic square root of the total number of features. If the arithmetic square root was a decimal, the value obtained by rounding up the decimal was treated as the number of features randomly selected by each decision tree. SVM can be well applied to high-dimensional data [39, 40]. It has been widely used in image recognition of plant disease [11, 20, 28, 41]. In this study, SVM models for disease image recognition were built with a radial basis function as the kernel function using C-SVM in the LIBSVM package developed by Chih-Jen Lin Group from Taiwan, China [42]. 
For each SVM model, both the optimal penalty parameter C and the optimal kernel function parameter g were searched using the grid search algorithm in the range of $2^{-10}$ to $2^{10}$ with a searching step of 0.4. The recognition accuracies were calculated at all points within the grid by running three complete cross validations based on the training set. The values of C and g at which the recognition accuracy was the highest were selected as the optimal parameters, and were recorded as $C_{best}$ and $g_{best}$, respectively. The KNN algorithm treats each sample as a point in a multidimensional space, and a point in the testing set is assigned to the class to which most of the K nearest points in the training set belong [43, 44]. The distance of that point to each of the K points is commonly measured by Euclidean distance. An appropriate value of K is the key to high classification accuracy using the KNN algorithm. In this study, to build the KNN models for image recognition using Euclidean distance, the K values were set as 5, 9 and 13, respectively, and the optimal value of K was determined according to the recognition results of the models. For the supervised learning methods, the true class that each sample in the training set belongs to is known. In other words, all samples in the training set are labeled samples. In some cases, the cost of obtaining training samples is low, but the cost of determining the true class of the training samples is very high, which requires a large amount of manpower and material resources. When only a small number of samples in the training set are labeled, a recognition model can be built using a semi-supervised learning method. In practice, when many disease images are acquired at low cost, the experts in the corresponding field only need to manually recognize and classify a small number of disease images. 
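The parameter grid and the KNN voting rule described above can be sketched in plain Python. The sketch assumes that the step of 0.4 applies to the exponent of 2, which is a common convention but is not stated explicitly in the text; `cv_accuracy` is a hypothetical callback standing in for cross-validated SVM training, and the toy data are made up.

```python
import math
from collections import Counter

def power_of_two_grid():
    """Values 2**e for e = -10, -9.6, ..., 10 (51 points).
    Exponents are built from integers to avoid floating-point drift."""
    return [2.0 ** ((2 * k - 50) / 5) for k in range(51)]

def grid_search(cv_accuracy, grid):
    """Return the (C, g) pair with the highest cross-validation accuracy."""
    best_c, best_g, best_acc = None, None, -1.0
    for c in grid:
        for g in grid:
            acc = cv_accuracy(c, g)
            if acc > best_acc:
                best_c, best_g, best_acc = c, g, acc
    return best_c, best_g, best_acc

def knn_predict(train_x, train_y, point, k):
    """Majority vote among the k training points nearest in Euclidean distance."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], point))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

grid = power_of_two_grid()

def toy_cv_accuracy(c, g):
    """Made-up accuracy surface that peaks at C = 2**2 and g = 2**-2."""
    return 1.0 / (1.0 + (math.log2(c) - 2) ** 2 + (math.log2(g) + 2) ** 2)

c_best, g_best, acc_best = grid_search(toy_cv_accuracy, grid)

train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
label = knn_predict(train_x, train_y, (0.5, 0.5), k=5)
print(len(grid), c_best, g_best, label)
```

Under this reading the grid has 51 values per parameter, so a full search evaluates 51 × 51 = 2,601 combinations of C and g.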
Disease recognition models can be built using semi-supervised learning methods, which will greatly reduce the costs of building a plant disease automatic recognition system. In this study, the features used to build the optimal supervised model were transformed using PCA and the disease recognition semi-supervised models were built using a self-training algorithm based on Naive Bayes classifiers [32, 33]. In this method, an initial classifier is built based on the given labeled samples and used to predict the unlabeled samples in the training set. The samples whose predicted labels have high confidence, together with those labels, are added to a dataset comprising the labeled samples from the training set. Subsequently, based on the new dataset comprising the labeled samples, a new classifier is built. The above process continues until a certain criterion is reached. The criterion may be the number of iterations reaching the maximum number of iterations or the number of labeled samples reaching the set ratio, etc. In this study, based on the same training set and testing set as used for building the supervised models described above, disease recognition semi-supervised models were built with ratios of the labeled and unlabeled samples in the training set equal to 2:1, 1:1 and 1:2. The first n principal components were successively used to build the disease recognition semi-supervised models, and the corresponding recognition accuracies of the training set and the testing set were obtained. According to the accuracies, the recognition effects of the models were evaluated, and the optimal number of principal components was determined. The above disease recognition semi-supervised models were built using the R 3.1.2 software and the function “SelfTrain” in the package “DMwR”, with default values used for the model parameters.
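The self-training loop described above can be sketched with a small Gaussian Naive Bayes classifier written in plain Python. This is an illustration of the general technique, not a reimplementation of the “SelfTrain” function: the confidence threshold, the iteration limit, the variance smoothing term and the toy one-dimensional data are all assumptions chosen for the example.

```python
import math

def fit_gaussian_nb(xs, ys):
    """Fit per-class priors, feature means and variances."""
    model = {}
    for label in set(ys):
        rows = [x for x, y in zip(xs, ys) if y == label]
        dims = list(zip(*rows))
        means = [sum(d) / len(d) for d in dims]
        # Small constant added to each variance for stability (assumption).
        variances = [sum((v - m) ** 2 for v in d) / len(d) + 1e-3
                     for d, m in zip(dims, means)]
        model[label] = (len(rows) / len(xs), means, variances)
    return model

def predict_proba(model, x):
    """Posterior class probabilities under the naive independence assumption."""
    log_post = {}
    for label, (prior, means, variances) in model.items():
        lp = math.log(prior)
        for v, m, var in zip(x, means, variances):
            lp += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        log_post[label] = lp
    top = max(log_post.values())
    weights = {k: math.exp(v - top) for k, v in log_post.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def self_train(labeled_x, labeled_y, unlabeled_x, threshold=0.9, max_iter=10):
    """Move confidently predicted unlabeled samples into the labeled set,
    refit, and repeat until nothing is added or max_iter is reached."""
    xs, ys, pool = list(labeled_x), list(labeled_y), list(unlabeled_x)
    model = fit_gaussian_nb(xs, ys)
    for _ in range(max_iter):
        remaining, added = [], 0
        for x in pool:
            proba = predict_proba(model, x)
            label = max(proba, key=proba.get)
            if proba[label] >= threshold:
                xs.append(x)
                ys.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:
            break
        model = fit_gaussian_nb(xs, ys)
    return model, pool

# Toy data (hypothetical): two classes on a single feature.
labeled_x = [(0.0,), (0.2,), (1.0,), (1.2,)]
labeled_y = ["a", "a", "b", "b"]
unlabeled_x = [(0.1,), (1.1,), (0.6,)]
model, pool = self_train(labeled_x, labeled_y, unlabeled_x)
proba = predict_proba(model, (0.15,))
print(pool, max(proba, key=proba.get))
```

In the toy run the two unlabeled points close to a class mean are absorbed in the first iteration, while the ambiguous point midway between the classes never reaches the confidence threshold and stays unlabeled.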

Results

Image Segmentation Results

Based on the image datasets described above, the comparison results of the twelve segmentation methods integrated with the clustering algorithms and the supervised classification algorithms are shown in Table 2.
Table 2

Performance evaluations of the twelve segmentation methods based on the sub-images of four alfalfa leaf diseases.

| Image dataset | Clustering method | Supervised classification method | Recall (mean) | Recall (median) | Precision (mean) | Precision (median) | Score (mean) | Score (median) |
|---|---|---|---|---|---|---|---|---|
| Image dataset of alfalfa common leaf spot | K_means clustering algorithm | Logistic regression analysis | 0.6443 | 0.6727 | 0.8940 | 0.9127 | 0.7691 | 0.7925 |
| | | Naive Bayes algorithm | 0.6318 | 0.6379 | 0.8922 | 0.9047 | 0.7620 | 0.7644 |
| | | CART | 0.5547 | 0.5683 | 0.8716 | 0.8829 | 0.7132 | 0.7300 |
| | | Linear discriminant analysis | 0.7694 | 0.7981 | 0.9235 | 0.9392 | 0.8465 | 0.8743 |
| | Fuzzy C-means clustering algorithm | Logistic regression analysis | 0.6117 | 0.6455 | 0.8859 | 0.9119 | 0.7488 | 0.7783 |
| | | Naive Bayes algorithm | 0.5725 | 0.5894 | 0.8785 | 0.8876 | 0.7255 | 0.7484 |
| | | CART | 0.4856 | 0.4684 | 0.8570 | 0.8686 | 0.6713 | 0.6639 |
| | | Linear discriminant analysis | 0.7389 | 0.8036 | 0.9155 | 0.9357 | 0.8272 | 0.8736 |
| | K_median clustering algorithm | Logistic regression analysis | 0.6989 | 0.7261 | 0.9010 | 0.9272 | 0.7999 | 0.8291 |
| | | Naive Bayes algorithm | 0.6850 | 0.6785 | 0.8982 | 0.9234 | 0.7916 | 0.7949 |
| | | CART | 0.6153 | 0.5926 | 0.8806 | 0.9001 | 0.7479 | 0.7435 |
| | | Linear discriminant analysis | 0.7905 | 0.8199 | 0.9220 | 0.9366 | 0.8562 | 0.8810 |
| Image dataset of alfalfa rust | K_means clustering algorithm | Logistic regression analysis | 0.7508 | 0.7714 | 0.9459 | 0.9568 | 0.8484 | 0.8626 |
| | | Naive Bayes algorithm | 0.7200 | 0.7354 | 0.9396 | 0.9517 | 0.8298 | 0.8444 |
| | | CART | 0.7021 | 0.7370 | 0.9372 | 0.9518 | 0.8197 | 0.8413 |
| | | Linear discriminant analysis | 0.8303 | 0.8376 | 0.9583 | 0.9639 | 0.8943 | 0.9013 |
| | Fuzzy C-means clustering algorithm | Logistic regression analysis | 0.6741 | 0.6772 | 0.9338 | 0.9423 | 0.8039 | 0.8091 |
| | | Naive Bayes algorithm | 0.6366 | 0.6424 | 0.9266 | 0.9386 | 0.7816 | 0.7872 |
| | | CART | 0.5998 | 0.6156 | 0.9197 | 0.9314 | 0.7598 | 0.7721 |
| | | Linear discriminant analysis | 0.8051 | 0.8116 | 0.9549 | 0.9609 | 0.8800 | 0.8870 |
| | K_median clustering algorithm | Logistic regression analysis | 0.8166 | 0.8384 | 0.9542 | 0.9644 | 0.8854 | 0.9025 |
| | | Naive Bayes algorithm | 0.8019 | 0.8288 | 0.9458 | 0.9595 | 0.8738 | 0.8968 |
| | | CART | 0.7915 | 0.8341 | 0.9475 | 0.9640 | 0.8695 | 0.9019 |
| | | Linear discriminant analysis | 0.8516 | 0.8596 | 0.9606 | 0.9671 | 0.9061 | 0.9137 |
| Image dataset of alfalfa Leptosphaerulina leaf spot | K_means clustering algorithm | Logistic regression analysis | 0.8329 | 0.8736 | 0.9634 | 0.9722 | 0.8982 | 0.9248 |
| | | Naive Bayes algorithm | 0.8561 | 0.8908 | 0.9635 | 0.9713 | 0.9098 | 0.9335 |
| | | CART | 0.7665 | 0.7933 | 0.9571 | 0.9697 | 0.8618 | 0.8803 |
| | | Linear discriminant analysis | 0.9002 | 0.9285 | 0.9657 | 0.9716 | 0.9330 | 0.9510 |
| | Fuzzy C-means clustering algorithm | Logistic regression analysis | 0.8102 | 0.8334 | 0.9622 | 0.9733 | 0.8862 | 0.9040 |
| | | Naive Bayes algorithm | 0.8181 | 0.8407 | 0.9623 | 0.9718 | 0.8902 | 0.9078 |
| | | CART | 0.7188 | 0.7458 | 0.9536 | 0.9674 | 0.8362 | 0.8569 |
| | | Linear discriminant analysis | 0.8900 | 0.9170 | 0.9652 | 0.9723 | 0.9276 | 0.9441 |
| | K_median clustering algorithm | Logistic regression analysis | 0.8919 | 0.9255 | 0.9613 | 0.9691 | 0.9266 | 0.9481 |
| | | Naive Bayes algorithm | 0.9091 | 0.9480 | 0.9583 | 0.9626 | 0.9337 | 0.9565 |
| | | CART | 0.8629 | 0.9156 | 0.9556 | 0.9656 | 0.9093 | 0.9409 |
| | | Linear discriminant analysis | 0.9287 | 0.9495 | 0.9636 | 0.9690 | 0.9462 | 0.9583 |
| Image dataset of alfalfa Cercospora leaf spot | K_means clustering algorithm | Logistic regression analysis | 0.5851 | 0.6044 | 0.8250 | 0.8471 | 0.7051 | 0.7172 |
| | | Naive Bayes algorithm | 0.5296 | 0.5394 | 0.8044 | 0.8173 | 0.6670 | 0.6823 |
| | | CART | 0.4240 | 0.4184 | 0.7627 | 0.7672 | 0.5933 | 0.5908 |
| | | Linear discriminant analysis | 0.7656 | 0.7824 | 0.8932 | 0.9041 | 0.8294 | 0.8431 |
| | Fuzzy C-means clustering algorithm | Logistic regression analysis | 0.5808 | 0.5954 | 0.8231 | 0.8401 | 0.7019 | 0.7184 |
| | | Naive Bayes algorithm | 0.5089 | 0.5184 | 0.7968 | 0.8122 | 0.6529 | 0.6680 |
| | | CART | 0.4184 | 0.4120 | 0.7620 | 0.7667 | 0.5902 | 0.5878 |
| | | Linear discriminant analysis | 0.7508 | 0.7695 | 0.8877 | 0.9027 | 0.8192 | 0.8330 |
| | K_median clustering algorithm | Logistic regression analysis | 0.6237 | 0.6330 | 0.8362 | 0.8612 | 0.7300 | 0.7488 |
| | | Naive Bayes algorithm | 0.5705 | 0.5885 | 0.8171 | 0.8373 | 0.6938 | 0.7149 |
| | | CART | 0.4876 | 0.4702 | 0.7824 | 0.7985 | 0.6350 | 0.6292 |
| | | Linear discriminant analysis | 0.7786 | 0.7938 | 0.8951 | 0.9109 | 0.8369 | 0.8496 |
| Aggregated image dataset | K_means clustering algorithm | Logistic regression analysis | 0.6789 | 0.7040 | 0.8846 | 0.9116 | 0.7818 | 0.8057 |
| | | Naive Bayes algorithm | 0.6510 | 0.6474 | 0.8730 | 0.8960 | 0.7620 | 0.7690 |
| | | CART | 0.5650 | 0.5728 | 0.8481 | 0.8791 | 0.7065 | 0.7138 |
| | | Linear discriminant analysis | 0.8105 | 0.8317 | 0.9242 | 0.9419 | 0.8674 | 0.8870 |
| | Fuzzy C-means clustering algorithm | Logistic regression analysis | 0.6567 | 0.6633 | 0.8808 | 0.9044 | 0.7687 | 0.7814 |
| | | Naive Bayes algorithm | 0.6132 | 0.6041 | 0.8658 | 0.8830 | 0.7395 | 0.7412 |
| | | CART | 0.5288 | 0.5180 | 0.8430 | 0.8679 | 0.6859 | 0.6922 |
| | | Linear discriminant analysis | 0.7940 | 0.8184 | 0.9201 | 0.9388 | 0.8571 | 0.8777 |
| | K_median clustering algorithm | Logistic regression analysis | 0.7282 | 0.7648 | 0.8916 | 0.9270 | 0.8099 | 0.8468 |
| | | Naive Bayes algorithm | 0.7022 | 0.7042 | 0.8795 | 0.9119 | 0.7908 | 0.8028 |
| | | CART | 0.6407 | 0.6714 | 0.8600 | 0.9035 | 0.7504 | 0.7868 |
| | | Linear discriminant analysis | 0.8294 | 0.8514 | 0.9249 | 0.9424 | 0.8771 | 0.8997 |

Note: The aggregated image dataset was obtained after aggregation of four image datasets of alfalfa common leaf spot, alfalfa rust, alfalfa Leptosphaerulina leaf spot and alfalfa Cercospora leaf spot.

For the image dataset of alfalfa common leaf spot, the segmentation method integrating the K_median clustering algorithm and linear discriminant analysis gave the highest Scores (mean 0.8562, median 0.8810) and the highest Recalls (mean 0.7905, median 0.8199), whereas the method integrating the K_means clustering algorithm and linear discriminant analysis gave the highest Precisions (mean 0.9235, median 0.9392). For the image dataset of alfalfa rust, the method integrating the K_median clustering algorithm and linear discriminant analysis gave the highest Scores (mean 0.9061, median 0.9137), Recalls (mean 0.8516, median 0.8596) and Precisions (mean 0.9606, median 0.9671). For the image dataset of alfalfa Leptosphaerulina leaf spot, the same method gave the highest Scores (mean 0.9462, median 0.9583) and Recalls (mean 0.9287, median 0.9495). For this dataset, the highest mean of Precisions (0.9657) was obtained with the method integrating the K_means clustering algorithm and linear discriminant analysis, and the highest median of Precisions (0.9733) was obtained with the method integrating the fuzzy C-means clustering algorithm and logistic regression analysis.
For the image dataset of alfalfa Cercospora leaf spot, the method integrating the K_median clustering algorithm and linear discriminant analysis gave the highest Scores (mean 0.8369, median 0.8496), Recalls (mean 0.7786, median 0.7938) and Precisions (mean 0.8951, median 0.9109). For the aggregated image dataset comprising 899 sub-images of the four alfalfa leaf diseases, the same method again gave the highest Scores (mean 0.8771, median 0.8997), Recalls (mean 0.8294, median 0.8514) and Precisions (mean 0.9249, median 0.9424). In summary, the segmentation method integrating the K_median clustering algorithm and linear discriminant analysis gave the best segmentation results for the sub-images of the four alfalfa leaf diseases. The segmentation results obtained with this method are shown in Fig 2. Using this method, all lesions in the original sub-images were effectively segmented, indicating that it could effectively implement automatic segmentation of sub-images of the four alfalfa leaf diseases.
Therefore, lesion segmentation was implemented using the segmentation method integrated with K_median clustering algorithm and linear discriminant analysis for further feature extraction, feature normalization, feature selection and building of disease recognition models in this study.
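The evaluation indices in Table 2 can be illustrated with a short sketch that compares a segmented binary mask against a reference mask pixel by pixel. This is a hedged illustration rather than the authors' code: the function name `segmentation_metrics` and the mask representation (nested lists with 1 for lesion pixels) are assumptions, and the Score is taken here as the arithmetic mean of Recall and Precision, which is consistent with the mean values in Table 2 (for example, (0.8294 + 0.9249) / 2 ≈ 0.8771 for the aggregated dataset).

```python
def segmentation_metrics(predicted, reference):
    """Pixel-level Recall, Precision and Score for two binary masks (1 = lesion)."""
    tp = fp = fn = 0
    for pred_row, ref_row in zip(predicted, reference):
        for p, r in zip(pred_row, ref_row):
            if p and r:
                tp += 1          # lesion pixel correctly segmented
            elif p and not r:
                fp += 1          # healthy pixel wrongly marked as lesion
            elif r and not p:
                fn += 1          # lesion pixel missed
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Assumption: Score is the arithmetic mean of Recall and Precision.
    score = (recall + precision) / 2
    return recall, precision, score
```

Applying the function to every sub-image of a dataset and taking the mean and median of each index would give summary values of the kind reported in Table 2.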
Fig 2

Results of automatic segmentation of sub-images of four alfalfa leaf diseases using the segmentation method integrated with K_median clustering algorithm and linear discriminant analysis.

A: Sub-image of alfalfa common leaf spot. B: Image after segmentation of alfalfa common leaf spot. C: Sub-image of alfalfa rust. D: Image after segmentation of alfalfa rust. E: Sub-image of alfalfa Leptosphaerulina leaf spot. F: Image after segmentation of alfalfa Leptosphaerulina leaf spot. G: Sub-image of alfalfa Cercospora leaf spot. H: Image after segmentation of alfalfa Cercospora leaf spot.

Feature Selection Results Using the Methods of ReliefF, 1R and CFS

For convenience, each extracted feature was given a name, and the names of the 129 extracted image features are listed in Table 3. φLab_L1 denotes the first Hu invariant moment of the gray image of the L* component in L*a*b* color space; φshape1 denotes the first Hu invariant moment of the binary lesion image; First moment RGB_R denotes the first moment of the gray image of the R component in RGB color space; Color ratio RGB_R denotes the color ratio r of the R component in RGB color space; and Contrast RGB_R, Energy RGB_R and Homogeneity RGB_R denote the contrast, energy and homogeneity, respectively, of the gray image of the R component in RGB color space. The remaining feature names can be deduced by analogy.
Table 3

Names of image features extracted and results of feature selection using the ReliefF method, the 1R method and the CFS method.

| Feature name | ReliefF ranking | 1R ranking | Feature name | ReliefF ranking | 1R ranking | Feature name | ReliefF ranking | 1R ranking |
|---|---|---|---|---|---|---|---|---|
| φLab_L1 | 61 | 81 | φHSV_H2 | 71 | 30 | Second moment RGB_B | 4 | 127 |
| φLab_L2 | 77 | 103 | φHSV_H3 | 101 | 40 | Second moment HSV_H* | 32 | 55 |
| φLab_L3 | 102 | 106 | φHSV_H4 | 85 | 43 | Second moment HSV_S | 38 | 67 |
| φLab_L4 | 87 | 128 | φHSV_H5 | 106 | 24 | Second moment HSV_V | 50 | 114 |
| φLab_L5 | 95 | 113 | φHSV_H6 | 88 | 28 | Second moment Lab_L | 28 | 18 |
| φLab_L6 | 94 | 112 | φHSV_H7 | 129 | 49 | Second moment Lab_b | 14 | 9 |
| φLab_L7 | 126 | 126 | φHSV_S1* | 75 | 38 | Second moment Lab_a | 41 | 83 |
| φLab_a1* | 45 | 1 | φHSV_S2 | 100 | 41 | Third moment RGB_R | 49 | 48 |
| φLab_a2 | 70 | 31 | φHSV_S3 | 110 | 72 | Third moment RGB_G | 54 | 53 |
| φLab_a3 | 84 | 77 | φHSV_S4 | 111 | 75 | Third moment RGB_B | 42 | 118 |
| φLab_a4 | 79 | 58 | φHSV_S5 | 113 | 65 | Third moment HSV_H | 44 | 76 |
| φLab_a5 | 115 | 123 | φHSV_S6 | 112 | 50 | Third moment HSV_S* | 36 | 69 |
| φLab_a6 | 120 | 98 | φHSV_S7 | 118 | 88 | Third moment HSV_V | 53 | 110 |
| φLab_a7 | 116 | 124 | φHSV_V1 | 63 | 74 | Third moment Lab_L | 30 | 19 |
| φLab_b1 | 56 | 64 | φHSV_V2 | 74 | 85 | Third moment Lab_b | 13 | 11 |
| φLab_b2 | 68 | 46 | φHSV_V3 | 96 | 102 | Third moment Lab_a | 39 | 79 |
| φLab_b3 | 105 | 42 | φHSV_V4 | 83 | 109 | Contrast RGB_R | 59 | 66 |
| φLab_b4 | 119 | 61 | φHSV_V5 | 98 | 104 | Energy RGB_R | 35 | 71 |
| φLab_b5 | 117 | 121 | φHSV_V6 | 90 | 86 | Homogeneity RGB_R | 18 | 25 |
| φLab_b6 | 128 | 63 | φHSV_V7 | 124 | 111 | Contrast RGB_G | 64 | 100 |
| φLab_b7 | 114 | 122 | Circularity* | 2 | 5 | Energy RGB_G | 55 | 73 |
| φRGB_R1 | 62 | 68 | Complexity* | 69 | 4 | Homogeneity RGB_G | 22 | 39 |
| φRGB_R2 | 73 | 80 | φshape1* | 66 | 37 | Contrast RGB_B | 21 | 52 |
| φRGB_R3 | 97 | 99 | φshape2 | 72 | 44 | Energy RGB_B* | 1 | 34 |
| φRGB_R4 | 82 | 91 | φshape3 | 80 | 59 | Homogeneity RGB_B | 16 | 47 |
| φRGB_R5 | 99 | 107 | φshape4 | 81 | 32 | Contrast HSV_H | 47 | 6 |
| φRGB_R6 | 89 | 94 | φshape5 | 107 | 97 | Energy HSV_H | 6 | 27 |
| φRGB_R7 | 125 | 101 | φshape6 | 93 | 45 | Homogeneity HSV_H | 31 | 3 |
| φRGB_G1 | 51 | 90 | φshape7 | 127 | 108 | Contrast HSV_S | 33 | 16 |
| φRGB_G2 | 76 | 62 | First moment RGB_R | 25 | 17 | Energy HSV_S* | 52 | 54 |
| φRGB_G3 | 104 | 119 | First moment RGB_G* | 27 | 29 | Homogeneity HSV_S* | 9 | 8 |
| φRGB_G4 | 86 | 120 | First moment RGB_B* | 24 | 36 | Contrast HSV_V | 60 | 70 |
| φRGB_G5 | 92 | 93 | Color ratio RGB_R* | 3 | 14 | Energy HSV_V | 37 | 84 |
| φRGB_G6 | 91 | 95 | Color ratio RGB_G* | 10 | 12 | Homogeneity HSV_V | 19 | 23 |
| φRGB_G7 | 123 | 125 | Color ratio RGB_B | 23 | 92 | Contrast Lab_L | 67 | 87 |
| φRGB_B1 | 48 | 78 | First moment HSV_H* | 8 | 33 | Energy Lab_L | 46 | 96 |
| φRGB_B2 | 78 | 60 | First moment HSV_S | 7 | 51 | Homogeneity Lab_L* | 20 | 22 |
| φRGB_B3 | 109 | 116 | First moment HSV_V* | 26 | 21 | Contrast Lab_a* | 58 | 2 |
| φRGB_B4 | 103 | 117 | First moment Lab_L | 29 | 20 | Energy Lab_a | 5 | 26 |
| φRGB_B5 | 121 | 105 | First moment Lab_b* | 15 | 10 | Homogeneity Lab_a | 43 | 15 |
| φRGB_B6 | 108 | 89 | First moment Lab_a | 40 | 82 | Contrast Lab_b | 65 | 7 |
| φRGB_B7 | 122 | 129 | Second moment RGB_R | 34 | 115 | Energy Lab_b | 11 | 35 |
| φHSV_H1* | 12 | 13 | Second moment RGB_G* | 17 | 57 | Homogeneity Lab_b | 57 | 56 |

Note:

* Features marked with an asterisk in the table were selected based on the CFS method.

The results of feature selection using the ReliefF method, the 1R method and the CFS method are shown in Table 3. Both the ReliefF method and the 1R method produce an importance ranking of the features for disease recognition. As shown in Table 3, the rankings obtained using the two methods differed greatly. The top 10 features selected using the ReliefF method were, in order, Energy RGB_B, Circularity, Color ratio RGB_R, Second moment RGB_B, Energy Lab_a, Energy HSV_H, First moment HSV_S, First moment HSV_H, Homogeneity HSV_S and Color ratio RGB_G, comprising four texture features, one shape feature and five color features. The top 10 features selected using the 1R method were, in order, φLab_a1, Contrast Lab_a, Homogeneity HSV_H, Complexity, Circularity, Contrast HSV_H, Contrast Lab_b, Homogeneity HSV_S, Second moment Lab_b and First moment Lab_b, comprising six texture features, two shape features and two color features. Only two features, Circularity and Homogeneity HSV_S, appeared in the top 10 of both rankings. The best feature combination (i.e., the optimal feature subset) obtained using the CFS method consisted of 21 features: φLab_a1, φHSV_H1, φHSV_S1, Circularity, Complexity, φshape1, First moment RGB_G, First moment RGB_B, Color ratio RGB_R, Color ratio RGB_G, First moment HSV_H, First moment HSV_V, First moment Lab_b, Second moment RGB_G, Second moment HSV_H, Third moment HSV_S, Energy RGB_B, Energy HSV_S, Homogeneity HSV_S, Homogeneity Lab_L and Contrast Lab_a.
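The limited agreement between the two rankings can be checked directly from the top-10 lists given above. The snippet below is only an illustrative sketch; the helper name `shared_features` is hypothetical, and the two lists are copied from the text.

```python
# Top 10 features in ranking order, as listed in the text for each method.
relieff_top10 = [
    "Energy RGB_B", "Circularity", "Color ratio RGB_R", "Second moment RGB_B",
    "Energy Lab_a", "Energy HSV_H", "First moment HSV_S", "First moment HSV_H",
    "Homogeneity HSV_S", "Color ratio RGB_G",
]
oner_top10 = [
    "φLab_a1", "Contrast Lab_a", "Homogeneity HSV_H", "Complexity",
    "Circularity", "Contrast HSV_H", "Contrast Lab_b", "Homogeneity HSV_S",
    "Second moment Lab_b", "First moment Lab_b",
]

def shared_features(ranking_a, ranking_b):
    """Features present in both lists, kept in the order of the first list."""
    in_b = set(ranking_b)
    return [name for name in ranking_a if name in in_b]

shared = shared_features(relieff_top10, oner_top10)
print(shared)  # ['Circularity', 'Homogeneity HSV_S']
```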

Built Disease Recognition Models and Comparison of Recognition Results

Recognition Results of Disease Recognition Models Based on Random Forest

The recognition results of the random forest models based on the selected features using the ReliefF method, the 1R method or the CFS method are shown in Table 4. The results showed that when the ReliefF method was used to select the features, with the increase of the number of decision trees, the recognition accuracies of the training set and the testing set for the built random forest models fluctuated by 0%-2.18%, and the number of applied features changed in a range of 52–74. The optimal random forest model was built with the number of decision trees equal to 70 based on the top 62 features in the importance ranking for recognition, and this model was recorded as Model 1. For Model 1, the recognition accuracy of the training set was 100% and the recognition accuracy of the testing set was 92.74%. When the 1R method was used for feature selection, with an increase in the number of decision trees, the recognition accuracies of the training set and the testing set for the built random forest models fluctuated by 0%-2.00%, and the number of applied features changed in a range of 76–129. The optimal random forest model was built with the number of decision trees equal to 60 based on the top 128 features in the importance ranking and was recorded as Model 2. For Model 2, the recognition accuracy of the training set was 100% and the recognition accuracy of the testing set was 91.29%. When the CFS method was applied to feature selection, with the increase of the number of decision trees, the recognition accuracies of the training set and the testing set for the built random forest models fluctuated by 0%-2.18%. The optimal random forest model was built with the number of decision trees equal to 60 based on the 21 selected features, and this model was recorded as Model 3. For Model 3, the recognition accuracy of the training set was 100% and the recognition accuracy of the testing set was 90.20%. 
As shown in Table 4, with increasing number of decision trees, the recognition accuracies of the training set and the testing set for the built random forest models fluctuated within a small range, indicating that the number of decision trees had little influence on the recognition results of the random forest models in this study. Considering the recognition accuracies of the training set and the testing set and the number of applied features for modeling, the optimality ranking of the three optimal models was Model 1, Model 3, and Model 2.
Table 4

Recognition results for four alfalfa leaf diseases using random forest models based on selected features using the ReliefF method, the 1R method and the CFS method.

| Number of decision trees | ReliefF: training accuracy (%) | ReliefF: testing accuracy (%) | ReliefF: number of applied features | 1R: training accuracy (%) | 1R: testing accuracy (%) | 1R: number of applied features | CFS: training accuracy (%) | CFS: testing accuracy (%) | CFS: number of applied features |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 99.82 | 90.56 | 74 | 99.73 | 89.29 | 90 | 99.64 | 89.29 | 21 |
| 20 | 99.91 | 91.47 | 57 | 99.91 | 90.20 | 88 | 99.91 | 88.57 | 21 |
| 30 | 100 | 92.38 | 52 | 99.91 | 90.56 | 129 | 99.91 | 88.75 | 21 |
| 40 | 100 | 92.56 | 61 | 100 | 90.56 | 126 | 100 | 89.47 | 21 |
| 50 | 100 | 92.38 | 59 | 99.91 | 90.74 | 76 | 100 | 88.02 | 21 |
| 60 | 100 | 92.20 | 65 | 100 | 91.29 | 128 | 100 | 90.20 | 21 |
| 70 | 100 | 92.74 | 62 | 100 | 90.74 | 119 | 100 | 89.11 | 21 |
| 80 | 100 | 92.56 | 57 | 100 | 90.56 | 105 | 100 | 88.38 | 21 |
| 90 | 100 | 92.38 | 55 | 100 | 90.93 | 107 | 100 | 88.57 | 21 |
| 100 | 100 | 92.20 | 54 | 100 | 90.93 | 114 | 100 | 89.11 | 21 |

Note: For each number of decision trees, only the best random forest model for the recognition of the four alfalfa leaf diseases is shown when the features were selected using the ReliefF method or 1R method.

Recognition Results of Disease Recognition Models Based on SVM

The recognition results of the SVM models based on the features selected using the ReliefF method, the 1R method and the CFS method are shown in Table 5. When the ReliefF method was used for feature selection, the optimal SVM model was built based on the top 45 features in the importance ranking for recognition; this model was recorded as Model 4, with the optimal parameters Cbest and gbest equal to 6.964 and 0.435, respectively. For Model 4, the recognition accuracy of the training set was 97.64% and the recognition accuracy of the testing set was 94.74%. When the 1R method was used for feature selection, the optimal SVM model was built based on the top 122 features in the importance ranking for recognition; this model was recorded as Model 5, with Cbest equal to 36.758 and gbest equal to 0.144. For Model 5, the recognition accuracy of the training set was 97.91% and the recognition accuracy of the testing set was 94.37%. When the CFS method was used for feature selection, the SVM model built based on the 21 selected features was recorded as Model 6, with Cbest equal to 21.112 and gbest equal to 0.758. For Model 6, the recognition accuracy of the training set was 95.18% and the recognition accuracy of the testing set was 91.83%. Considering the recognition accuracies of the training set and the testing set and the number of applied features for modeling, the optimality ranking of the three models shown in Table 5 was Model 4, Model 6, and Model 5.
Table 5

Recognition results for four alfalfa leaf diseases using SVM models based on selected features using the ReliefF method, the 1R method and the CFS method.

| Model | Feature selection method | Cbest | gbest | Recognition accuracy of training set (%) | Recognition accuracy of testing set (%) | Number of applied features |
|---|---|---|---|---|---|---|
| Model 4 | The ReliefF method | 6.964 | 0.435 | 97.64 | 94.74 | 45 |
| Model 5 | The 1R method | 36.758 | 0.144 | 97.91 | 94.37 | 122 |
| Model 6 | The CFS method | 21.112 | 0.758 | 95.18 | 91.83 | 21 |

Note: Only the best SVM model for the image recognition of the four alfalfa leaf diseases is shown when the features were selected using the ReliefF method or 1R method.

Recognition Results of Disease Recognition Models Based on KNN

The recognition results of the KNN models based on the selected features using the ReliefF method, the 1R method and the CFS method are shown in Table 6. The results showed that when the ReliefF method was used to select features, with the increase in the value of K, the recognition accuracies of the training set and the testing set for the built KNN models also fluctuated by 0%-3.55%. The optimal KNN model was built with a K value of 5 based on the top 68 features in the importance ranking for recognition. This model was recorded as Model 7. For Model 7, the recognition accuracy of the training set was 93.55% and the recognition accuracy of the testing set was 90.38%. When the 1R method was used to select features, with increasing K value, the recognition accuracies of the training set and the testing set for the built KNN models fluctuated by 0.18%-3.72%. The optimal KNN model was built with a K value of 5 based on the top 71 features in the importance ranking for recognition, and this model was recorded as Model 8. For Model 8, the recognition accuracy of the training set was 92.36% and the recognition accuracy of the testing set was 88.93%. When the CFS method was used to select features, with increasing K value, the recognition accuracies of the training set and the testing set for the built KNN models fluctuated by 0.18%-2.09%. Based on the 21 selected features, the optimal KNN model was built with a K value of 5, and this model was recorded as Model 9. For Model 9, the recognition accuracy of the training set was 92.27% and the recognition accuracy of the testing set was 87.30%. With increasing K value, the recognition accuracies of the training set and the testing set for the built KNN models shown in Table 6 decreased in small-scale amplitude, indicating that the best K value in this study was 5. 
Considering the recognition accuracies of the training set and the testing set and the number of applied features for modeling, the optimality ranking of the three models shown in Table 6 was Model 7, Model 9, and Model 8.
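For readers unfamiliar with the role of K, a minimal K-nearest neighbor classifier is sketched below. It is an illustration only and not the implementation used in the study: the function name `knn_predict`, the Euclidean distance and the simple majority vote are assumptions.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, sample, k=5):
    """Label a sample by majority vote among its k nearest training samples."""
    # Sort training indices by Euclidean distance to the query sample.
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], sample))
    votes = Counter(train_y[i] for i in order[:k])
    # On a tied vote, the label encountered first (i.e., nearest) wins.
    return votes.most_common(1)[0][0]
```

A larger K averages the vote over more neighbors, which smooths the decision boundary; in Table 6 the accuracies declined slightly as K increased from 5 to 13.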
Table 6

Recognition results for four alfalfa leaf diseases using KNN models based on selected features using the ReliefF method, the 1R method and the CFS method.

| K | ReliefF: training accuracy (%) | ReliefF: testing accuracy (%) | ReliefF: number of applied features | 1R: training accuracy (%) | 1R: testing accuracy (%) | 1R: number of applied features | CFS: training accuracy (%) | CFS: testing accuracy (%) | CFS: number of applied features |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 93.55 | 90.38 | 68 | 92.36 | 88.93 | 71 | 92.27 | 87.30 | 21 |
| 9 | 91.27 | 89.66 | 39 | 90.00 | 88.75 | 71 | 90.64 | 87.11 | 21 |
| 13 | 90.00 | 89.66 | 38 | 88.64 | 88.20 | 84 | 90.18 | 86.93 | 21 |

Note: Only the best KNN model for the image recognition of the four alfalfa leaf diseases is shown when the features were selected using the ReliefF method or 1R method.

Recognition Results of Disease Recognition Models Based on Semi-supervised Learning

Considering the recognition accuracies of the training set and the testing set and the number of applied features for modeling, Model 4 was regarded as the optimal model among the nine models described above. The recognition results for each type of alfalfa leaf disease using the optimal model are shown in Table 7. To eliminate the linear correlation between the features, the 45 features used for building Model 4 were transformed using PCA; the changes in the cumulative contribution rate with increasing number of principal components are shown in Fig 3. The cumulative contribution rate of the first eight principal components reached 90.77%, and that of the first 12 principal components reached 95.54%.
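The cumulative contribution rate of the first n principal components is the share of total variance that they explain. As a hedged sketch (the function names and the example eigenvalues are hypothetical, not values from this study), it can be computed from the eigenvalues of the PCA as follows.

```python
def cumulative_contribution(eigenvalues):
    """Cumulative contribution rates (%) of the components, largest eigenvalue first."""
    total = sum(eigenvalues)
    running, rates = 0.0, []
    for value in sorted(eigenvalues, reverse=True):
        running += value
        rates.append(100.0 * running / total)
    return rates

def components_needed(eigenvalues, target_percent):
    """Smallest number of components whose cumulative rate reaches the target."""
    for n, rate in enumerate(cumulative_contribution(eigenvalues), start=1):
        if rate >= target_percent:
            return n
    return len(eigenvalues)

# Hypothetical eigenvalues, used only to show the calculation.
example = [4.0, 3.0, 2.0, 1.0]
print(cumulative_contribution(example))   # [40.0, 70.0, 90.0, 100.0]
print(components_needed(example, 90.0))   # 3
```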
Table 7

Recognition results of each alfalfa leaf disease using the optimal model (Model 4).

| Individual disease | Recognition accuracy of training set (%) | Recognition accuracy of testing set (%) |
|---|---|---|
| Alfalfa common leaf spot | 89.19 | 75.00 |
| Alfalfa rust | 99.63 | 96.24 |
| Alfalfa Leptosphaerulina leaf spot | 97.30 | 96.76 |
| Alfalfa Cercospora leaf spot | 99.15 | 97.74 |
Fig 3

Changes in cumulative contribution rates with increasing number of principal components based on 45 features used for building Model 4.

Based on the same training set and testing set as used for building the supervised models described above, semi-supervised disease recognition models were first built with a ratio of labeled to unlabeled samples in the training set of 2:1, using the first n principal components as inputs. The changes in the recognition accuracies of the training set and the testing set with increasing number of principal components are shown in Fig 4. For models built with a given number of principal components, there were no obvious differences between the recognition accuracy of the training set and that of the testing set. Moreover, both accuracies first increased and then decreased with increasing n. Similarly, semi-supervised models were built with ratios of labeled to unlabeled samples in the training set of 1:1 and 1:2, again using the first n principal components as inputs; the corresponding recognition accuracies of the training set and the testing set are shown in Figs 5 and 6. The recognition accuracies obtained with the different ratios of labeled to unlabeled samples showed similar trends.
Fig 4

Recognition results for four alfalfa leaf diseases using semi-supervised models at a ratio of labeled to unlabeled samples of 2:1.

Fig 5

Recognition results for four alfalfa leaf diseases using semi-supervised models at a ratio of labeled to unlabeled samples of 1:1.

Fig 6

Recognition results for four alfalfa leaf diseases using semi-supervised models at a ratio of labeled to unlabeled samples of 1:2.

The recognition results for the four alfalfa leaf diseases using optimal semi-supervised models with the different ratios of labeled and unlabeled samples are as shown in Table 8. The results showed that when the ratio of labeled to unlabeled samples was 2:1, the optimal semi-supervised model for disease recognition was built with the first nine principal components and was recorded as Model 10. For Model 10, the recognition accuracy of the training set was 82.82% and the recognition accuracy of the testing set was 82.76%. When the ratio of the labeled samples to the unlabeled samples was 1:1, the optimal semi-supervised model for disease recognition was built with the first ten principal components and was recorded as Model 11. For Model 11, the recognition accuracy of the training set was 80.36% and the recognition accuracy of the testing set was 80.58%. When the ratio of the labeled samples to the unlabeled samples was 1:2, the optimal semi-supervised model for disease recognition was built with the first ten principal components and was recorded as Model 12. For Model 12, the recognition accuracy of the training set was 79.18% and the recognition accuracy of the testing set was 80.58%. For Model 10, Model 11 and Model 12, the recognition accuracies of the training set and the testing set were all approximately 80%, indicating that the ratio of the labeled samples to the unlabeled samples in the training set had relatively small effects on the recognition results of the disease recognition semi-supervised models when the models were built with the three ratios.
Table 8

Recognition results of four alfalfa leaf diseases using optimal semi-supervised models with various ratios of labeled to unlabeled samples.

| Model | Ratio of labeled samples to unlabeled samples | Number of principal components | Cumulative contribution rate (%) | Recognition accuracy of training set (%) | Recognition accuracy of testing set (%) |
|---|---|---|---|---|---|
| Model 10 | 2:1 | 9 | 92.22 | 82.82 | 82.76 |
| Model 11 | 1:1 | 10 | 93.45 | 80.36 | 80.58 |
| Model 12 | 1:2 | 10 | 93.45 | 79.18 | 80.58 |

Discussion

In this study, lesion image segmentation was conducted using segmentation methods that integrate clustering algorithms with supervised classification algorithms. Unlike segmentation methods that use only clustering algorithms, the methods used here did not require calculating and choosing an optimal number of clusters, which reduced computational costs. In segmentation methods that use only supervised classification algorithms, typical lesion pixels and typical healthy pixels are usually chosen from a large number of disease images to construct the training set, and a generally applicable supervised classification model built on this training set is used for lesion segmentation of all the disease images. However, the colors of the lesion regions and the healthy regions may vary to some degree among disease images because of different causal agents and different stages of disease development, which may cause difficulties in disease image recognition [45]. In the methods used in this study, a targeted training set containing typical lesion pixels and typical healthy pixels was constructed from each sub-image, so the supervised classification model built on this training set was better suited to lesion segmentation of that sub-image. However, these segmentation methods are only suitable for disease images in which the H component values of the lesion regions are lower than those of the healthy regions in HSV color space. Because there are many alfalfa leaf diseases and lesion color differs greatly among them, a lesion image segmentation method with a wider range of application should be developed in future studies. In this study, a total of 129 texture, color and shape features were extracted for disease image recognition.
Satisfactory recognition results were obtained using the disease recognition models built after feature selection, indicating that the features extracted from the lesion images could be effectively used to recognize the four alfalfa leaf diseases. However, the 129 extracted features are those commonly used in the field of image recognition and differ greatly from the disease features that plant disease experts rely on during naked-eye identification, so the disease recognition models based on these features are difficult to interpret. In future studies, attempts could be made to construct lesion image features tailored to particular plant diseases by combining the experience of plant disease experts with image processing techniques. In this study, the best recognition performance was achieved by the SVM model based on the top 45 features in the importance ranking obtained using the ReliefF method. Its recognition accuracy on the testing set (94.74%) was the highest among all the models built in this study and was close to its recognition accuracy on the training set (97.64%), which indicated that this model not only gave satisfactory recognition results but also had strong generalization ability. When the ReliefF method was used for feature selection, the possible correlation between the features was not considered. Such correlation could lead to redundant features and increase the complexity of the disease recognition models. In further studies, the ReliefF method could be combined with feature transformation methods such as PCA and independent component analysis to remove the correlation between the features, reduce the feature dimension and decrease the complexity of the disease recognition models. Semi-supervised learning is a technique for training and classification that uses a small number of labeled samples and a large number of unlabeled samples.
In the field of image recognition, the cost of obtaining image samples is very low in some cases, but the cost of adding class labels to samples is very high. In this case, a semi-supervised learning method can be used to build an image recognition model that gives satisfactory recognition results at a lower modeling cost. In research on plant disease image recognition, determining the true categories of diseases requires specialized agricultural technical personnel to conduct naked-eye observations, microscopic observation of the morphological characteristics of causal agents, or pathogen detection using molecular biology techniques [7]. Thus, a large amount of manpower and material resources is usually required. Therefore, attempts were made to use semi-supervised learning methods to build image recognition models of alfalfa leaf diseases. The results showed that the recognition accuracies of the training set and the testing set were all approximately 80% for the optimal semi-supervised model when the proportion of labeled samples in the training set was only 33.33% (i.e., the ratio of labeled to unlabeled samples was 1:2). This indicated that it was feasible to build an image recognition model of alfalfa leaf diseases based on semi-supervised learning.

Only four alfalfa leaf diseases were investigated in this study, although many more occur. Therefore, it is necessary to build a standard and comprehensive lesion image database to lay the foundation for the application of automatic disease image recognition technology. In addition, the complex background of plant disease images poses great challenges for image segmentation and image recognition [45]. The images of alfalfa leaf diseases used in this study were taken on a white background in the laboratory. Further studies are needed to determine whether the image recognition methods used in this study are suitable for the automatic identification and diagnosis of alfalfa leaf diseases in the field. 
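The semi-supervised approach discussed in this section (self-training with a Naive Bayes classifier) can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the Gaussian form of the Naive Bayes model, the 0.95 confidence threshold, the round limit, and the names `TinyGaussianNB` and `self_train` are all assumptions.

```python
import numpy as np

class TinyGaussianNB:
    """Minimal Gaussian Naive Bayes (illustrative, not a library class)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mu_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small constant keeps variances strictly positive.
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-6
        self.log_prior_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        return self

    def predict_proba(self, X):
        # Log-likelihood of every sample under every class, shape (n, n_classes).
        sq = (X[:, None, :] - self.mu_[None]) ** 2 / self.var_[None]
        ll = -0.5 * (np.log(2 * np.pi * self.var_)[None] + sq).sum(axis=2)
        log_post = ll + self.log_prior_
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(log_post)
        return p / p.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[self.predict_proba(X).argmax(axis=1)]

def self_train(X_lab, y_lab, X_unl, threshold=0.95, max_rounds=10):
    """Repeatedly move confidently predicted unlabeled samples into the
    labeled pool with their predicted (pseudo) labels, then refit."""
    X_lab, y_lab, X_unl = X_lab.copy(), y_lab.copy(), X_unl.copy()
    model = TinyGaussianNB().fit(X_lab, y_lab)
    for _ in range(max_rounds):
        if len(X_unl) == 0:
            break
        proba = model.predict_proba(X_unl)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        pseudo = model.classes_[proba.argmax(axis=1)]
        X_lab = np.vstack([X_lab, X_unl[confident]])
        y_lab = np.concatenate([y_lab, pseudo[confident]])
        X_unl = X_unl[~confident]
        model.fit(X_lab, y_lab)
    return model
```

With a labeled-to-unlabeled ratio of 1:2, `X_lab` would hold one third of the training samples and `X_unl` the remaining two thirds. Only the labeled third requires expert diagnosis.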
Presently, smart phones have become powerful tools for taking pictures and processing data. Smart phone-based plant disease image recognition systems have been reported [46-49]. A mobile application could be developed using the optimal image recognition model of alfalfa leaf diseases built in this study to realize functions such as disease image acquisition, disease diagnosis and disease information sharing on smart phone platforms. Such an application could facilitate disease management.

Generally, alfalfa leaf diseases are diagnosed and identified by agricultural experts or agricultural technicians, mainly using conventional diagnostic methods including naked-eye observation of disease symptoms and microscopic observation of the morphological characteristics of causal agents. The accuracy and efficiency of these methods depend mainly on the experience of the experts or technicians, and the process is subjective and time-consuming. When PCR techniques are used to detect a specific alfalfa leaf disease, professional instruments, reagents and materials are required, and professional personnel are also required to perform the operations [7, 50]. In addition, it takes some time to obtain detection results [7, 50]. With the increasingly widespread use of portable cameras and mobile phones with picture-taking features, camera equipment is easier to obtain than PCR instruments. After image acquisition, the image only needs to be input into a computer with a disease image recognition system to obtain the disease identification results. This process does not require professional personnel or any chemical reagents, and it yields identification results faster than PCR techniques. In particular, once an Internet-based computer image recognition system or a smart phone-based mobile application (App) is developed, image recognition of alfalfa leaf diseases will become even more convenient. 
However, PCR techniques can play an important role in disease detection in the early stage of diseases, especially in the detection of latently infected leaves that show no symptoms [50]. The identification and recognition of infected alfalfa leaves in the early stage of diseases using image recognition technology still require further investigation. Moreover, because the identification method in this study was developed based on images of only four types of alfalfa leaf diseases, further research is needed to evaluate it on other leaf disorders and assess the risk of false positives.

Conclusions

In this study, lesion image segmentation using methods that integrate clustering algorithms with supervised classification algorithms, feature extraction from lesion images, feature normalization and feature selection were conducted. Disease recognition models were built using pattern recognition methods, and satisfactory recognition results were obtained for four alfalfa leaf diseases. A feasible solution was thus provided for the diagnosis and identification of alfalfa leaf diseases. Among the twelve lesion segmentation methods integrating clustering algorithms with supervised classification algorithms, the best segmentation results on an aggregated image dataset comprising 899 sub-images of the four types of alfalfa leaf diseases were obtained with the method integrating the K_median clustering algorithm (from the clustering algorithms) and linear discriminant analysis (from the supervised classification algorithms). This segmentation method was therefore used to segment the sub-images of the four types of alfalfa leaf diseases for subsequent feature extraction, feature normalization, feature selection and modeling. A total of 129 texture, color and shape features were extracted from the 1,651 typical lesion images, each of which contained only one lesion. Feature selection was conducted using three methods: the ReliefF method, the 1R method and the correlation-based feature selection (CFS) method. The disease recognition models were built using three supervised learning methods: random forest, SVM and K-nearest neighbor (KNN). The results demonstrated that the best recognition performance was achieved by the SVM model based on the top 45 features in the importance ranking when the ReliefF method was used for feature selection. For this model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. 
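To illustrate how a ReliefF importance ranking of the kind described above can be obtained, the following is a simplified sketch. It is not the implementation used in the study: it visits every sample instead of a random subset, uses Manhattan distance on range-normalized features, and the neighbor count and function names (`relieff`, `top_features`) are assumptions.

```python
import numpy as np

def relieff(X, y, n_neighbors=5):
    """Simplified ReliefF. Returns one weight per feature; larger means the
    feature separates classes better among neighboring samples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                      # constant features get zero diff
    Xn = (X - X.min(axis=0)) / span            # scale every feature to [0, 1]
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / len(y)))
    w = np.zeros(X.shape[1])
    for i in range(len(Xn)):
        diff = np.abs(Xn - Xn[i])              # per-feature difference to sample i
        dist = diff.sum(axis=1)
        dist[i] = np.inf                       # never select the sample itself
        for c in classes:
            idx = np.flatnonzero(y == c)
            nearest = idx[np.argsort(dist[idx])[:n_neighbors]]
            nearest = nearest[np.isfinite(dist[nearest])]
            if len(nearest) == 0:
                continue
            contrib = diff[nearest].mean(axis=0)
            if c == y[i]:
                w -= contrib                   # nearest hits: penalize differences
            else:                              # nearest misses: reward differences
                w += prior[c] / (1.0 - prior[y[i]]) * contrib
    return w / len(Xn)

def top_features(weights, k):
    """Indices of the k highest-weighted features, best first."""
    return np.argsort(weights)[::-1][:k]
```

Under these assumptions, `top_features(relieff(X_train, y_train), 45)` would give the column indices to pass to a classifier such as an SVM. Each feature is scored on its own, so two highly correlated features can both rank near the top.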
In addition, after the 45 features used for building the model were transformed using PCA, semi-supervised disease recognition models were constructed using a self-training algorithm based on Naive Bayes classifiers. For the optimal semi-supervised models built with ratios of labeled to unlabeled samples of 2:1, 1:1 and 1:2, the recognition accuracies of the training set and the testing set were all approximately 80%. The results indicated that it was feasible to identify and recognize the four types of alfalfa leaf diseases using the solution provided in this study.
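The PCA step mentioned above is usually summarized by the cumulative contribution rate, i.e. the share of total variance explained by the first k principal components. The sketch below shows one way to compute it. The function names and the 95% target are hypothetical, since the threshold used in the study is not stated in this section, and the feature matrix is assumed to be normalized already.

```python
import numpy as np

def pca_cumulative_contribution(X):
    """Cumulative share of variance explained by the first 1, 2, ... components."""
    Xc = X - X.mean(axis=0)                      # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
    var = s ** 2                                 # proportional to component variances
    return np.cumsum(var) / var.sum()

def n_components_for(X, target=0.95):
    """Smallest number of components whose cumulative rate reaches `target`."""
    cum = pca_cumulative_contribution(X)
    k = int(np.searchsorted(cum, target)) + 1
    return min(k, len(cum))
```

For a matrix of 45 selected features, `pca_cumulative_contribution` returns 45 non-decreasing values ending at 1. Plotting them against the number of components gives a curve of the kind described in the first supplementary table caption.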

Cumulative contribution rates with increase in number of principal components based on 45 features used for building Model 4. (XLSX)

Recognition accuracies of the training set and the testing set using semi-supervised models with a different number of principal components (ratio of labeled to unlabeled samples was 2:1). (XLSX)

Recognition accuracies of the training set and the testing set using semi-supervised models with a different number of principal components (ratio of labeled to unlabeled samples was 1:1). (XLSX)

Recognition accuracies of the training set and the testing set using semi-supervised models with a different number of principal components (ratio of labeled to unlabeled samples was 1:2). (XLSX)