
Image dataset on the Chinese medicinal blossoms for classification through convolutional neural network.

Mei-Ling Huang, Yi-Xuan Xu, Yu-Chieh Liao.

Abstract

Tree blossoms have been widely used in traditional Chinese medicine for the prevention and treatment of a variety of diseases for thousands of years [1,2]. Flowers are grown not only for their ornamental value, but also for their nutritional, medicinal, culinary, cosmetic, and aromatic properties. They are a rich source of many compounds that play an important role in various metabolic processes of the human body [3]. Edible flowers can help meet the global demand for more attractive and delicious food, and can improve the nutritional content of gourmet food [4]. Flowers offer anti-anxiety, anti-cancer, anti-inflammatory, antioxidant, diuretic, and immunomodulatory benefits, among others. It is very important to identify edible flowers correctly, because only a few are edible [5]. The shapes or colors of different flowers may be very similar. Visual evaluation is one classification method, but it is error-prone and time-consuming [6]. Flowers can be divided into those from herbaceous plants (flowers) and those from flowering trees (blossoms). A public dataset of herbaceous flowers exists [7], but there is no dataset for Chinese medicinal blossoms. This article establishes and presents a dataset for the twelve most commonly used and economically valuable blossoms in traditional Chinese medicine. The dataset provides a collection of blossom images of traditional Chinese herbs to help Chinese pharmacists classify the categories of Chinese herbs. In addition, the dataset can serve as a resource for researchers who apply machine learning or deep learning algorithms to image segmentation and image classification.
© 2021 The Author(s). Published by Elsevier Inc.


Keywords:  Chinese medicinal blossom; Classification; Data augmentation; Deep learning

Year:  2021        PMID: 34926737      PMCID: PMC8648792          DOI: 10.1016/j.dib.2021.107655

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table

Value of the Data

The dataset provides a collection of blossom images of traditional Chinese herbs to help Chinese pharmacists classify the categories of Chinese herbs. The dataset can be used not only as a botanical atlas, but also as training material for Chinese medicine courses. It contributes to the expansion of available blossom images of traditional Chinese herbs. The blossom image data help researchers evaluate the performance of new algorithms for object detection and image segmentation.

Data Description

The blossom images of traditional Chinese medicinal herbs were collected through Google search. The images were divided into twelve categories: (1) Syringa, (2) Bombax malabarica, (3) Michelia alba, (4) Armeniaca mume, (5) Albizia julibrissin, (6) Pinus massoniana, (7) Eriobotrya japonica, (8) Styphnolobium japonicum, (9) Prunus persica, (10) Firmiana simplex, (11) Ficus religiosa and (12) Areca catechu. The dataset uploaded to Mendeley is arranged in twelve folders named after the blossom categories. There are 1716 original images. Fig. 1 shows examples of the original blossom images for the twelve Chinese medicinal herbs. Each category includes both close-up and telephoto images.
Fig. 1. Examples of Chinese medicine blossom categories.

Each image file name encodes the category, the image number in parentheses, the data augmentation method (if any), and the image format. For example, the file name “1 (1).JPG” is the first image of the first category, “Syringa”; the file name “12 (2).JPG_brighter.jpg” is the second image of the twelfth category, “Areca catechu”, augmented by increasing the image brightness.
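The naming scheme can be illustrated with a small parser. This is a minimal sketch that assumes every file name follows the pattern of the two examples above; the function name `parse_blossom_filename`, the `BlossomFile` type, and the regular expression are illustrative and not part of the dataset. Only the suffix `brighter` is documented in the text, so any other augmentation suffix is returned as an opaque string. The category order follows the list in the Data Description.

```python
import re
from typing import NamedTuple, Optional

# Category names in the order given in the Data Description (IDs 1-12).
CATEGORIES = [
    "Syringa", "Bombax malabarica", "Michelia alba", "Armeniaca mume",
    "Albizia julibrissin", "Pinus massoniana", "Eriobotrya japonica",
    "Styphnolobium japonicum", "Prunus persica", "Firmiana simplex",
    "Ficus religiosa", "Areca catechu",
]


class BlossomFile(NamedTuple):
    category_id: int
    category_name: str
    image_number: int
    augmentation: Optional[str]  # None for an original image


# "<category> (<number>).JPG" optionally followed by "_<method>.jpg".
# The optional suffix shape is inferred from the single documented example.
_PATTERN = re.compile(
    r"^(?P<cat>\d+) \((?P<num>\d+)\)\.JPG(?:_(?P<aug>[^.]+)\.jpg)?$",
    re.IGNORECASE,
)


def parse_blossom_filename(name: str) -> BlossomFile:
    """Split a file name into category, image number, and augmentation."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    category_id = int(match.group("cat"))
    if not 1 <= category_id <= len(CATEGORIES):
        raise ValueError(f"category out of range: {category_id}")
    return BlossomFile(
        category_id,
        CATEGORIES[category_id - 1],
        int(match.group("num")),
        match.group("aug"),
    )
```

Calling `parse_blossom_filename("12 (2).JPG_brighter.jpg")` returns category 12 ("Areca catechu"), image number 2, and the augmentation label `brighter`.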

Experimental Design, Materials and Methods

Fig. 2 shows the data processing steps: image acquisition, image preprocessing, image partition, image augmentation, and image classification. Each step is described below.
Fig. 2. Data processing steps.

Image acquisition

Of all the 57 types of flowers used as Chinese herbal medicines, there are 12 trees, 9 shrubs, 8 small trees, and 29 herbs. This study establishes a dataset for the twelve most commonly used and economically valuable tree blossoms in traditional Chinese medicine. Blossom images were collected from sources such as public datasets, personal blogs, and government websites.

Image preprocessing

We preprocessed the blossom images by cropping out letters and frames, removing handwriting and blurred images, centering the blossoms, and adjusting the length and width. The number of images in each category is as follows: (1) Syringa, 191; (2) Bombax malabarica, 172; (3) Michelia alba, 122; (4) Armeniaca mume, 236; (5) Albizia julibrissin, 222; (6) Pinus massoniana, 87; (7) Eriobotrya japonica, 115; (8) Styphnolobium japonicum, 213; (9) Prunus persica, 89; (10) Firmiana simplex, 75; (11) Ficus religiosa, 126; and (12) Areca catechu, 68. The image file sizes vary, and all images are in JPG format.

Image partition

We collected a total of 1716 original images in twelve categories. Within each category, the images were randomly divided into training, validation, and test subsets at an 80:10:10 ratio. For example, the numbers of training, validation, and test images for Syringa are 153, 19, and 19, respectively. The total numbers of original images in the training, validation, and test subsets were 1376, 170, and 170, respectively.
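The per-category partition can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_category`, the fixed seed, and the rounding rule (round the training and validation sizes, assign the remainder to the test subset) are assumptions, since the text does not state how fractional counts were handled. This rule reproduces the Syringa example (153, 19, 19), but for some other categories the counts can differ by one image from those reported.

```python
import random


def split_category(items, seed=0, train_frac=0.8, val_frac=0.1):
    """Randomly split one category's images into train/val/test lists.

    Assumed rounding rule: train and val sizes are rounded to the nearest
    integer and the test subset receives whatever remains.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# The 191 original Syringa images, named according to the dataset's scheme.
syringa = [f"1 ({i}).JPG" for i in range(1, 192)]
train, val, test = split_category(syringa)
print(len(train), len(val), len(test))
```

Splitting each category separately, rather than the pooled 1716 images, keeps the class proportions the same in all three subsets.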

Image augmentation

Data augmentation creates image diversity to enhance the performance of classification models. There are many augmentation methods [9], and their benefits may vary with the method and the characteristics of the data. We selected Gaussian filtering, image brightness increase, image brightness reduction, mirror rotation, noise increase, 90° rotation, and 180° rotation: seven methods in total. Data augmentation was applied to the training and validation datasets only, so each of these images, together with its seven augmented versions, yielded eight images. Fig. 3 shows an example of an original image and the images obtained after data augmentation. Table 1 presents the number of training, validation, and test images before and after data augmentation. Fig. 4 shows the architecture of the dataset.
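The augmentation operations listed above can be sketched on a toy image. This is a dependency-free illustration under stated assumptions, not the authors' implementation: the image is a small grayscale grid rather than a JPG, and the brightness factors (1.3 and 0.7), the noise level, and the 3×3 Gaussian kernel are hypothetical because the text does not report the parameters used. Of the variant names, only `brighter` appears in the dataset description; the others are placeholders.

```python
import random


def clamp(value):
    """Round to an integer pixel value in the range 0-255."""
    return max(0, min(255, int(round(value))))


def brightness(img, factor):
    return [[clamp(p * factor) for p in row] for row in img]


def mirror(img):
    """Flip left to right."""
    return [row[::-1] for row in img]


def rotate90(img):
    """Rotate clockwise: reverse the row order, then transpose."""
    return [list(col) for col in zip(*img[::-1])]


def rotate180(img):
    return [row[::-1] for row in img[::-1]]


def add_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel."""
    return [[clamp(p + rng.gauss(0, sigma)) for p in row] for row in img]


def gaussian_filter(img):
    """Smooth with a 3x3 Gaussian kernel; edge pixels are replicated."""
    kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # weights sum to 16
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            row.append(clamp(acc / 16))
        out.append(row)
    return out


def augment(img, seed=0):
    """Return the original image plus one variant per operation."""
    rng = random.Random(seed)
    return {
        "original": img,
        "gaussian": gaussian_filter(img),
        "brighter": brightness(img, 1.3),   # assumed factor
        "darker": brightness(img, 0.7),     # assumed factor
        "mirror": mirror(img),
        "noise": add_noise(img, 10.0, rng),  # assumed sigma
        "rot90": rotate90(img),
        "rot180": rotate180(img),
    }


variants = augment([[10, 20], [30, 40]])
print(len(variants))
```

Each input image yields eight images (the original plus one per operation), which matches the eightfold growth of the training and validation subsets.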
Fig. 3. Example of data augmentation.

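Table 1. Number of images before and after data augmentation.

| ID | Name | Original: Train | Original: Val | Original: Test | Original: Total | After augmentation: Train | After augmentation: Val | After augmentation: Test | After augmentation: Total |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Syringa | 153 | 19 | 19 | 191 | 1224 | 152 | 19 | 1395 |
| 2 | Bombax malabarica | 138 | 17 | 17 | 172 | 1104 | 136 | 17 | 1257 |
| 3 | Michelia alba | 98 | 12 | 12 | 122 | 784 | 96 | 12 | 892 |
| 4 | Armeniaca mume | 188 | 24 | 24 | 236 | 1504 | 192 | 24 | 1720 |
| 5 | Albizia julibrissin | 178 | 22 | 22 | 222 | 1424 | 176 | 22 | 1622 |
| 6 | Pinus massoniana | 70 | 9 | 8 | 87 | 560 | 72 | 8 | 640 |
| 7 | Eriobotrya japonica | 92 | 11 | 12 | 115 | 736 | 88 | 12 | 836 |
| 8 | Prunus persica | 171 | 21 | 21 | 213 | 1368 | 168 | 21 | 1557 |
| 9 | Firmiana simplex | 72 | 9 | 8 | 89 | 576 | 72 | 8 | 656 |
| 10 | Ficus religiosa | 60 | 7 | 8 | 75 | 480 | 56 | 8 | 544 |
| 11 | Styphnolobium japonicum | 101 | 13 | 12 | 126 | 808 | 104 | 12 | 924 |
| 12 | Areca catechu | 55 | 6 | 7 | 68 | 440 | 48 | 7 | 495 |
| Total | | 1376 | 170 | 170 | 1716 | 11008 | 1360 | 170 | 12538 |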
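The totals after augmentation follow directly from the original subset sizes. The short check below is a sketch of that arithmetic, assuming (as the table shows) that the training and validation subsets grow eightfold while the test subset is left unchanged; the names `ORIGINAL_TOTALS` and `augmented_counts` are illustrative.

```python
# Original subset sizes, taken from the "Total" row of Table 1.
ORIGINAL_TOTALS = {"train": 1376, "val": 170, "test": 170}
FACTOR = 8  # each image plus its augmented versions


def augmented_counts(original, factor=FACTOR):
    """Scale train and val by the factor; the test subset is not augmented."""
    return {
        "train": original["train"] * factor,
        "val": original["val"] * factor,
        "test": original["test"],
    }


after = augmented_counts(ORIGINAL_TOTALS)
total_after = sum(after.values())
print(after, total_after)
```

The same function applied to a single row, for example Syringa (153, 19, 19), gives that row's counts after augmentation.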
Fig. 4. Architecture diagram of dataset.


Image classification

CNN models are the most commonly used models for image classification. We selected the AlexNet and InceptionV3 models to identify the twelve categories of traditional Chinese medicinal blossoms. Krizhevsky et al. [10] proposed the AlexNet model in 2012. The AlexNet architecture has eight layers: the first five are convolutional layers and the last three are fully connected layers. To be more computationally efficient, InceptionV3 uses techniques including factorized convolutions, regularization, dimension reduction, and parallelized computations. Tables 2 and 3 show the results of the two classification models on the datasets before and after data augmentation. Before data augmentation, the accuracy, precision, recall, F1-score, and training time of AlexNet were 93.57%, 92.98%, 94.52%, 93.62%, and 0 h 1 min 17 s, respectively; those of InceptionV3 were 89.18%, 88.21%, 90.06%, 88.79%, and 0 h 8 min 14 s, respectively. After data augmentation, the accuracy, precision, recall, F1-score, and training time of AlexNet were 98.53%, 98.41%, 98.50%, 98.45%, and 0 h 9 min 26 s, respectively; those of InceptionV3 were 98.61%, 98.61%, 98.55%, 98.58%, and 1 h 5 min 51 s, respectively. Fig. 5 shows the training curves of the two models on the dataset before and after data augmentation.
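The eight-layer structure of AlexNet can be made concrete by tracing the feature-map sizes through the network. The sketch below uses the standard AlexNet configuration (227×227 input, the usual kernel sizes, strides, and paddings); these values are an assumption here, because the text does not state the settings used in this study. The 12-unit output layer corresponds to the twelve blossom categories. The function names are illustrative.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width of a convolution or pooling layer on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


def alexnet_shapes(input_size=227, num_classes=12):
    """Return (conv_layers, fc_layers) for a standard AlexNet layout.

    conv_layers: (name, channels, spatial size) for the 5 convolutional layers
    fc_layers:   (name, inputs, outputs) for the 3 fully connected layers
    """
    conv = []
    s = conv_out(input_size, kernel=11, stride=4)   # conv1
    conv.append(("conv1", 96, s))
    s = conv_out(s, kernel=3, stride=2)             # max pooling
    s = conv_out(s, kernel=5, padding=2)            # conv2
    conv.append(("conv2", 256, s))
    s = conv_out(s, kernel=3, stride=2)             # max pooling
    for name, channels in (("conv3", 384), ("conv4", 384), ("conv5", 256)):
        s = conv_out(s, kernel=3, padding=1)        # size-preserving 3x3 conv
        conv.append((name, channels, s))
    s = conv_out(s, kernel=3, stride=2)             # max pooling
    flat = 256 * s * s                              # flattened feature vector
    fc = [
        ("fc6", flat, 4096),
        ("fc7", 4096, 4096),
        ("fc8", 4096, num_classes),
    ]
    return conv, fc


conv_layers, fc_layers = alexnet_shapes()
for layer in conv_layers + fc_layers:
    print(layer)
```

Only the layers with learned weights are counted, which gives the five convolutional plus three fully connected layers described above; the pooling steps change the spatial size but add no parameters.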
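Table 2. Before data augmentation.

| Model | Accuracy | Precision | Recall | F1-score | Time (hh:mm:ss) |
|---|---|---|---|---|---|
| AlexNet | 93.57% | 92.98% | 94.52% | 93.62% | 00:01:17 |
| InceptionV3 | 89.18% | 88.21% | 90.06% | 88.79% | 00:08:14 |

Table 3. After data augmentation.

| Model | Accuracy | Precision | Recall | F1-score | Time (hh:mm:ss) |
|---|---|---|---|---|---|
| AlexNet | 98.53% | 98.41% | 98.50% | 98.45% | 00:09:26 |
| InceptionV3 | 98.61% | 98.61% | 98.55% | 98.58% | 01:05:51 |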
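The four metrics in these tables can be computed from a confusion matrix. The sketch below assumes macro-averaging, that is, precision, recall, and F1-score are computed per category and then averaged with equal weight; the text does not state which averaging scheme was used. The function name `classification_metrics` and the 2×2 example matrix are illustrative toy data, not results from this study.

```python
def classification_metrics(confusion):
    """Accuracy and macro-averaged precision, recall, and F1-score.

    confusion[i][j] is the number of images of true class i that the
    model predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted = sum(confusion[i][k] for i in range(n))  # column sum
        actual = sum(confusion[k])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy two-class example: rows are true classes, columns are predictions.
metrics = classification_metrics([[8, 2], [1, 9]])
print({name: round(value, 4) for name, value in metrics.items()})
```

For the twelve-category problem the same function would take a 12×12 matrix built from the 170 test images.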
Fig. 5. Training curves.


Ethics Statement

This study did not involve experiments on humans or animals.

CRediT Author Statement

Mei-Ling Huang: Conceptualization, Methodology, Writing - Original draft preparation, Investigation, Supervision, Writing - Reviewing and Editing, Funding acquisition; Yi-Xuan Xu: Conceptualization, Methodology, Writing - Original draft preparation, Software, Data curation; Yu-Chieh Liao: Software, Formal Analysis, Data curation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Specifications Table

Subject: Agricultural Sciences, Computer Science
Specific subject area: Image processing, Image identification, Image classification, computer vision
Type of data: Images
How data were acquired: Blossom images were captured by Google search.
Data format: Raw digital image (JPG format)
Parameters for data collection: Both close-up photography and telephoto images for each category were collected. Blurred images were deleted.
Description of data collection: Images of Chinese medicinal blossoms were collected and classified into twelve categories.
Data source location: Institution: National Chin-Yi University of Technology; City: Taichung; Country: Taiwan; Latitude 24.1450556 and Longitude 120.73011
Data accessibility: Repository name: Chinese medicinal blossom-dataset [8]; Data identification number: 10.17632/r3z6vp396m.1; Mendeley Data, V1, https://doi.org/10.17632/r3z6vp396m.1
  2 in total

Review 1.  The flower head of Chrysanthemum morifolium Ramat. (Juhua): A paradigm of flowers serving as Chinese dietary herbal medicine.

Authors:  Hanwen Yuan; Sai Jiang; Yingkai Liu; Muhammad Daniyal; Yuqing Jian; Caiyun Peng; Jianliang Shen; Shifeng Liu; Wei Wang
Journal:  J Ethnopharmacol       Date:  2020-06-25       Impact factor: 4.360

2.  Edible flowers--a new promising source of mineral elements in human nutrition.

Authors:  Otakar Rop; Jiri Mlcek; Tunde Jurikova; Jarmila Neugebauerova; Jindriska Vabkova
Journal:  Molecules       Date:  2012-05-31       Impact factor: 4.411

