Literature DB >> 35432624

Tuning of data augmentation hyperparameters in deep learning to building construction image classification with small datasets.

André Luiz C Ottoni1,2, Raphael M de Amorim2, Marcela S Novo3, Dayana B Costa4.   

Abstract

Deep Learning methods have important applications in the field of building construction image classification. One challenge of this application is the adoption of Convolutional Neural Networks on small datasets. This paper proposes a rigorous methodology for tuning Data Augmentation hyperparameters in Deep Learning applied to building construction image classification, especially to vegetation recognition on facades and roof structure analysis. To that end, Logistic Regression models were used to analyze the performance of Convolutional Neural Networks trained with 128 combinations of image transformations. Experiments were carried out with three Deep Learning architectures from the literature using the Keras library. The results show that the recommended configuration (Height Shift Range = 0.2; Width Shift Range = 0.2; Zoom Range = 0.2) reached an accuracy of 95.6% in the test step of the first case study. In addition, the hyperparameters recommended by the proposed method also achieved the best test result for the second case study: 93.3%.
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022.


Keywords:  Building construction image classification; Convolutional neural networks; Data augmentation; Deep learning; Hyperparameter tuning

Year:  2022        PMID: 35432624      PMCID: PMC9005628          DOI: 10.1007/s13042-022-01555-1

Source DB:  PubMed          Journal:  Int J Mach Learn Cybern        ISSN: 1868-8071            Impact factor:   4.012


Introduction

Deep Learning methods have important applications in the Digital Image Processing field [4, 23, 42, 47]. In this sense, a possible application of Deep Learning is the building construction area [8, 10, 14, 41, 53]. In the recent literature, there are several applications in this research field, such as: crack detection [8, 54], road crack classification [53], safety guardrail detection [22], structural damage recognition [11], safety helmet detection [41], safety harness detection [10], classification of rock fragments [50], damage detection of a steel bridge [1], tunnel lining defects [49] and facade defects classification [14]. Deep Learning methods can also be applied to the recognition of vegetation in building facade images [32]. In fact, the growth of biological manifestations on building facades may indicate the deterioration and degradation of constructions [2, 24]. In addition, the detection of this pathology in inspection images can assist in the conservation of historic buildings [7, 21, 24, 37]. In this sense, [32] proposes a Deep Learning approach for recognizing vegetation in buildings. Another possibility is to use Deep Learning in the analysis of roof structures [33]. In the literature, there are several examples of works that investigated the efficiency of roof structures [5, 12, 33, 43]. For example, in a recent study, [33] proposes a methodology for tuning two hyperparameters (learning rate and optimizer) of Neural Networks in building roof image classification. It is also worth noting that one of the relevant factors in [32] and [33] was the experiments with Data Augmentation. [32] verified the improvement in validation accuracy when using Data Augmentation to increase the training database. In fact, Data Augmentation techniques play an important role in the application of Machine Learning to small datasets [4, 9, 42, 51, 52].
This is because the generation of artificial images directly contributes to increasing the generalization capacity of the Deep Learning model and thus decreasing the chance of overfitting [4, 9]. In this respect, one of the challenges of using Data Augmentation is the definition of which transformations (such as zoom, rotation and flip) will be applied to the images [6, 28, 34, 44, 48]. In Machine Learning terms, this problem can be treated as a Hyperparameter Tuning problem [19, 20, 27, 30, 31, 39, 40]. In the literature, some studies have analyzed the influence of combinations of Data Augmentation hyperparameters in different applications, such as: plant classification [34], transmission line inspection [44] and the COVID-19 diagnostic process in chest X-ray radiological imaging [28]. In [48], different types of Data Augmentation methods were analyzed for crack detection in constructions. However, the literature lacks proposals to optimize the combinations of Data Augmentation hyperparameters for the application of Deep Learning in building construction image classification, especially in the recognition of vegetation on building facades and in roof defects classification. The objective of this paper is to propose a rigorous methodology for tuning Data Augmentation hyperparameters in Deep Learning applied to building construction image classification with small datasets. For this, two case studies are observed: vegetation recognition in facades [32] and roof structure analysis [33]. To that end, Logistic Regression models [16] are used to analyze the performance of Convolutional Neural Networks (CNN) [4, 9] trained with 128 combinations of image transformations. For comparison purposes, three CNN architectures from the literature are also adopted: MobileNet [17], DenseNet-121 [18] and CNN8 [32]. This paper is organized into five sections. Section 2 presents theoretical concepts of CNNs and Logistic Regression.
Section 3 presents the proposed methodology. Sections 4 and 5 describe the results and conclusions, respectively.

Theoretical foundation

Convolutional neural networks

Convolutional neural networks (CNNs) are Deep Learning methods widely researched in the computer vision field [4, 9, 15, 23]. One of the main factors that make CNNs a relevant Machine Learning technique is the ability to automatically extract features from processed images [9]. In addition, another important point is the use of layers and elements with different functionalities in the network architecture, such as [4, 9]:

Input layer: receives input signals (e.g., an image).
Weights: adjusted during the training process (trainable parameters).
Convolutional filters (kernels): have a set of weights, according to their size. For example, if the kernel size is 3 x 3, then the filter contains 9 trainable weights.
Activation function: transforms a signal into a limited output. Some examples are the ReLU, softmax and sigmoid functions.
Convolutional layer: applies the convolution operation between the filters and the input matrix in the layer. As an output, new matrices (feature maps) are generated, according to the number of kernels in the layer.
Pooling: applies a transformation to decrease the input matrix dimensions. For this, statistical functions can be used: maximum (max) or average (avg).
Flatten: transforms the matrices resulting from convolutional operations into a single vector.
Dropout: randomly disconnects a set of neurons at each training epoch.
Fully connected layers: similar to the structures of traditional Artificial Neural Networks, in which all neurons and layers are connected.
Output layer: shows the output of the CNN, such as a binary classifier neuron.

Thus, in view of the complexity of CNN architectures, an important factor is the use of tools for efficient implementation [4, 9]. In this line, it is worth mentioning the Keras library [4]. Keras is available through an R interface and in the Python language for the development of Deep Learning applications. In addition, it allows execution on CPU or GPU. Another relevant factor is the simplicity of using Data Augmentation methods. In this sense, the Keras library was adopted in this work, as described in Sect. 3.

Logistic regression

The methods based on Linear Regression (simple and multiple) aim to model a continuous output from one or more independent variables [29]. On the other hand, Logistic Regression is a technique for the analysis of categorical data [13, 16]. Moreover, in a logistic function, the response variable is binary or dichotomous. The Logistic Regression model can be represented by Eq. (1) [13]:

p(x) = exp(β₀ + β₁x₁ + ... + βₖxₖ) / (1 + exp(β₀ + β₁x₁ + ... + βₖxₖ))    (1)

where β₀ to βₖ are the coefficients of the regression model; x are the independent variables; and p(x) is the probability of success. Thus, if p(x) is the probability of an event occurring, then the expression 1 − p(x) represents the probability of the event not occurring. The ratio between p(x) and 1 − p(x) is called the chance (odds), Eq. (2) [13]:

chance = p(x) / (1 − p(x))    (2)

In this line, the neperian (natural) logarithm of the chance provides a linear model, according to Eq. (3) [13]:

ln[p(x) / (1 − p(x))] = β₀ + β₁x₁ + ... + βₖxₖ    (3)

where this equation is called the logit and is a simplification of the Logistic Regression model.
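The three equations above can be sketched numerically. The coefficients below are illustrative (not from the paper); the point is that the logit recovers the linear predictor of the logistic model:

```python
import math

def logistic_p(x, beta):
    """Probability of success p(x) for the logistic model of Eq. (1)."""
    # Linear predictor: beta_0 + beta_1*x_1 + ... + beta_k*x_k
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return math.exp(eta) / (1.0 + math.exp(eta))

def odds(p):
    """Chance (odds) of the event, Eq. (2): p / (1 - p)."""
    return p / (1.0 - p)

def logit(p):
    """Natural logarithm of the chance, Eq. (3) -- the linear logit model."""
    return math.log(odds(p))

# Illustrative coefficients: beta_0 = -1, beta_1 = 2, so x_1 = 0.5 gives eta = 0
p = logistic_p([0.5], [-1.0, 2.0])
assert abs(p - 0.5) < 1e-12        # eta = 0 -> p(x) = 0.5
assert abs(logit(p)) < 1e-12       # the logit recovers the linear predictor
```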

Methodology

Database of the first case study

In this study, the database presented by [32] was used for training, validation and testing of Convolutional Neural Networks in the recognition of vegetation on building facades. The dataset has 390 images, divided into two classes (with and without vegetation on the facade). According to [32], the images of the training and validation datasets were defined from photographs adapted from The Zurich Urban Micro Aerial Vehicle Dataset [26]. These images were recorded in 2015 by an unmanned aerial vehicle (UAV) over the urban streets of Zurich (Switzerland). On the other hand, the images of the test dataset were selected from the website Pixabay. The dataset analyzed during the current study is available from the corresponding author on request or at the web link given in the sequence. Figures 1 and 2 present examples of images from the database for classes 0 and 1, respectively.
Fig. 1

Examples of images of the class 0 - without vegetation on the facade

Fig. 2

Examples of images of the class 1 - with vegetation on the facade

Class 0: without vegetation on the building's facade.
Class 1: with vegetation on the building's facade.
Dataset link: drive.google.com/file/d/1l6KA80mZdKqxlpfpenIH57mCYESL3uyq/

The database (390 images) was partitioned following the same structure proposed by [32]:
Training (250 images): 125 images in class 0 (without vegetation) and 125 images in class 1 (with vegetation).
Validation (50 images): 25 images in class 0 (without vegetation) and 25 images in class 1 (with vegetation).
Test (90 images): 45 images in class 0 (without vegetation) and 45 images in class 1 (with vegetation).
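As a quick sanity check, the partition above can be verified to be class-balanced and to total the 390 images stated in the paper:

```python
# Image counts per split: (class 0, class 1), taken from the paper.
splits = {"train": (125, 125), "val": (25, 25), "test": (45, 45)}

total = sum(a + b for a, b in splits.values())
assert total == 390                                  # full database size
assert all(a == b for a, b in splits.values())       # every split is balanced
```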

Data augmentation

Data preprocessing is an important stage in the Machine Learning field [9]. This is because this step can decrease the learning complexity and improve the accuracy results [9]. In this line, Data Augmentation techniques can be used in the training of CNNs [4, 9, 42, 51, 52]. Data Augmentation is an approach applied mainly to small data learning [4, 42]. In this sense, Data Augmentation methods generate more training data from the existing images [4]. The aim is to increase the CNN model's ability to generalize and avoid overfitting [4, 9]. For this, artificial images are created from random transformations of the original data [4]. Zoom, rotation and flip are some examples of possible transformations for the generation of augmented images [9]. In this paper, Data Augmentation methods were applied with the Keras library in the R software [4]. For this, the image_data_generator() method was adopted [4]. The function image_data_generator() generates batches of data with new modified images derived from the original data. In this regard, the following random transformations were used to increase the training data [4]. Figure 3 presents examples of images generated by Data Augmentation with the Keras library.
Fig. 3

Examples of images generated by Keras data augmentation: a original image; b–d rotation range; e horizontal flip; f vertical flip; g and h height shift range; i shear range; j–l width shift range; m–p zoom range

Rotation range: an integer that defines the degree range for random rotations. Rotation is a circular movement around a fixed point; the processed images receive random rotations within the predefined range of degrees.
Horizontal flip: if this input is "true", the images will be randomly mirrored in the horizontal direction (left-right).
Vertical flip: if this input is "true", the images will be randomly mirrored in the vertical direction (up-down).
Shear range: distorts the image along an axis to create or rectify perception angles. There are two shear transformations: X-shear, which shifts X coordinate values, and Y-shear, which shifts Y coordinate values.
Width shift range: shifts the image randomly to the left or to the right (horizontal shifts). If the value is a float less than or equal to 1, it is taken as a fraction of the total width. For example, for an image 100 pixels wide, width_shift_range = 1.0 shifts the image randomly between -100% and 100%, that is, -100 px to 100 px. Positive values shift the image to the right and negative values to the left.
Height shift range: shifts the image randomly up or down (vertical shifts). If the value is a float less than or equal to 1, it is taken as a fraction of the total height. For example, for an image 100 pixels high, height_shift_range = 1.0 shifts the image randomly between -100% and 100%, that is, -100 px to 100 px.
Zoom range: randomly zooms the image. It can be specified as a single float (the zoom fraction) or as a range in an array. For example, if zoom_range = 0.4, the zoom interval is [0.6, 1.4], that is, between 60% (zoom in) and 140% (zoom out).
The number of artificially generated images depends on the training settings: batch_size, steps_per_epoch and epochs. For example, in the first phase of experiments of this study, these parameters were defined as batch_size = 32, steps_per_epoch = 100 and epochs = 10. Thus, around 32,000 new training images were randomly generated for each simulation. This value is more than 100 times greater than the number of original photographs for training (250).
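To illustrate how such a transformation acts, below is a minimal pure-Python sketch of a random horizontal (width) shift, together with the augmented-image count arithmetic from the paragraph above. This is a simplified stand-in for what Keras' width_shift_range does, not the library implementation:

```python
import random

def width_shift(img, shift_range, rng=None):
    """Randomly shift a 2D image (list of rows) left or right, filling the
    exposed pixels with zeros. Simplified re-implementation of the idea
    behind Keras' width_shift_range -- not the library code itself."""
    rng = rng or random.Random(0)
    width = len(img[0])
    max_px = int(shift_range * width)      # fraction of width -> pixel budget
    dx = rng.randint(-max_px, max_px)      # positive -> shift right
    if dx >= 0:
        return [[0] * dx + row[:width - dx] for row in img]
    return [row[-dx:] + [0] * (-dx) for row in img]

img = [[1, 2, 3, 4, 5]] * 3
shifted = width_shift(img, 0.2)            # at most a 1-pixel shift here
assert len(shifted) == 3 and all(len(r) == 5 for r in shifted)

# Augmented images per training run: batch_size * steps_per_epoch * epochs
assert 32 * 100 * 10 == 32000
```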

Neural network architectures

In this paper, three CNN architectures were adopted: CNN-8 [32], DenseNet-121 [18] and MobileNet [17]. Recently, these structures (or variations) have been discussed in papers in the research field of building construction image processing with deep learning, in tasks such as: crack detection (DenseNet) [46], structural health monitoring (FC-DenseNet) [38], safety helmet detection (SSD MobileNet) [41], road damage detection (SSD MobileNet) [25] and recognition of vegetation in buildings (CNN-8) [32]. In this study, the CNN architectures were used for binary classification, for example, class 0 (without vegetation on the building's facade) and class 1 (with vegetation on the building's facade) [32]. For this, the keras_model_sequential() method in the Keras library was used [4], as described below. In all experiments, the CNN architectures were trained with the Adagrad optimizer and a learning rate of 0.01. In addition, the input dimensions to the neural network were standardized: input_shape = c(50, 50, 3). It is also noteworthy that all three architectures were configured with the last two layers fully connected, the last layer holding the binary classifier neuron with a sigmoid activation function [4].

CNN-8: CNN architecture used by [32] for vegetation image recognition in buildings. The structure has 8 layers and 3,985,345 trainable parameters. In addition, this architecture is based on a model proposed by [4], originally with 12 layers.
DenseNet-121: Dense Convolutional Network is an architecture proposed by [18]. This structure is characterized by connecting each layer to all subsequent layers (dense connection). Moreover, it has 7,479,169 trainable parameters. To use this architecture, the application_densenet121() method in the Keras library was adopted.
MobileNet: CNN architecture proposed by [17] for mobile and embedded vision applications. The structure uses depthwise separable convolutions (factorized convolutions). In addition, it has 28 layers and 3,732,289 trainable parameters. To use this architecture, the application_mobilenet() method in the Keras library was adopted.
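The trainable-parameter counts quoted above depend on the layer types involved. A rough sketch of the usual counting formulas for a standard convolutional layer and for the depthwise separable convolution used by MobileNet (the layer shapes below are illustrative, not taken from the paper's architectures):

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, bias=True):
    """Trainable parameters of a standard 2D convolutional layer:
    (kh * kw * in_channels + bias) per filter."""
    per_filter = kernel_h * kernel_w * in_channels + (1 if bias else 0)
    return per_filter * filters

def depthwise_separable_params(kernel_h, kernel_w, in_channels, filters, bias=True):
    """Depthwise separable convolution (MobileNet's building block): one
    depthwise filter per input channel, followed by a 1x1 pointwise
    convolution -- far fewer weights than a standard convolution."""
    depthwise = kernel_h * kernel_w * in_channels + (in_channels if bias else 0)
    pointwise = conv2d_params(1, 1, in_channels, filters, bias)
    return depthwise + pointwise

# A single 3x3 kernel on a 1-channel input holds 9 trainable weights (Sect. 2.1).
assert conv2d_params(3, 3, 1, 1, bias=False) == 9
# The factorized form needs far fewer weights than the standard convolution:
assert depthwise_separable_params(3, 3, 64, 128, bias=False) < conv2d_params(3, 3, 64, 128, bias=False)
```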

Hyperparameter tuning

Design of experiments

In this section, the design of experiments for tuning data augmentation hyperparameters with Logistic Regression [16] is described. The simulations of the Convolutional Neural Network models were conducted in the R software [35] with the Keras library [4]. For this, an Intel Core i7-8565 (CPU) and an NVIDIA GeForce MX110 (GPU) were used. Three metrics were used for the evaluation of the experiments: accuracy in validation or testing (Acc), number of images correctly classified (C) and number of images incorrectly classified, that is, errors (E). Equations (4) to (6) present these formulas:

Acc = (TP + TN) / (TP + TN + FP + FN)    (4)
C = TP + TN    (5)
E = FP + FN    (6)

where:
TP: true positives, that is, correct classifications in class 1 (facade with vegetation).
FN: false negatives, that is, incorrect classifications in class 1 (facade with vegetation).
TN: true negatives, that is, correct classifications in class 0 (facade without vegetation).
FP: false positives, that is, incorrect classifications in class 0 (facade without vegetation).

The experiments were conducted in three stages: Data Augmentation Hyperparameters; Data Augmentation and CNN Architectures; and Test Experiments. In the first phase, seven hyperparameters were defined for adjustment, each with two levels of treatment (0 - without transformation and 1 - with transformation):
Rotation Range (R): 0 or 40.
Horizontal Flip (H): FALSE or TRUE.
Vertical Flip (V): FALSE or TRUE.
Height Shift Range (He): 0 or 0.2.
Shear Range (S): 0 or 0.2.
Width Shift Range (W): 0 or 0.2.
Zoom Range (Z): 0 or 0.2.

Thus, a total of 128 (2^7) combinations of data augmentation hyperparameters were analyzed in the first stage. For each configuration, five CNN models (repetitions) were trained for 10 epochs with 100 steps per epoch, adopting the MobileNet architecture [17]. The metrics observed in this phase were the total number of images correctly classified (C) and the errors (E) in the validation dataset, used to fit a Logistic Regression model.
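The evaluation metrics above can be sketched directly from the confusion counts (the TP/TN/FP/FN values below are hypothetical, chosen only to match the 90-image test set size):

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy (Acc): correctly classified images over all images,
    with C = TP + TN and E = FP + FN as in Eqs. (4)-(6)."""
    correct = tp + tn          # C: images correctly classified
    errors = fp + fn           # E: images incorrectly classified
    return correct / (correct + errors)

# Hypothetical counts for a 90-image test set (45 images per class):
acc = accuracy(tp=43, tn=43, fp=2, fn=2)
assert abs(acc - 86 / 90) < 1e-12   # 86 correct out of 90
```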
In the second stage of experiments, the best combinations of the first phase were used. In addition, three CNN architectures from the literature were adopted: MobileNet [17], DenseNet-121 [18] and CNN8 [32]. For each combination (data augmentation configuration x architecture), 5 repetitions were performed with 20 epochs. The total correct classifications (C) and errors (E) were observed in the validation dataset. Furthermore, the accuracy (Acc) in the validation step was also analyzed. Finally, in the third phase of experiments, the performance of the hyperparameter combinations was analyzed on the test dataset. In this sense, new trainings were carried out with the selected data augmentation configurations, in 5 repetitions with 30 epochs. For each of the CNN models trained in this phase, the accuracy in the classification of the test database was analyzed.

Logistic regression method

In this paper, the method for hyperparameter tuning uses Logistic Regression [16]. The objective is to evaluate the probability of hits and errors in building construction image classification, according to the data augmentation settings. For this, the response variable (y) is binary: y = 1 for a correct classification and y = 0 for an error. For the first step, the explanatory variables (x_R, x_H, x_V, x_He, x_S, x_W and x_Z) refer to the seven hyperparameters analyzed. Equation (7), in turn, presents the Logistic Regression model (logit format) proposed for the recommendation of hyperparameters:

ln[p(x) / (1 − p(x))] = β₀ + β_R x_R + β_H x_H + β_V x_V + β_He x_He + β_S x_S + β_W x_W + β_Z x_Z    (7)

The coefficients (β) of Eq. (7) can be obtained by the maximum likelihood method [13]. Then, the hypothesis tests on the regression coefficients must be performed. In this sense, the significance of the effect of each variable present in the model is analyzed under two hypotheses: H0: β_k = 0 and H1: β_k ≠ 0. When the initial hypothesis (H0) is accepted (p > 0.05), the variable associated with the coefficient does not have statistical significance in the model. On the other hand, if the alternative hypothesis (H1) is accepted (p < 0.05), the hyperparameter (k) has significance in the Logistic Regression model. The adjusted coefficients (β_k) also allow calculating the odds associated with each hyperparameter configuration. In this aspect, the OR metric represents the odds ratio of correct classification between the two levels of a hyperparameter. For example, the odds ratio for hyperparameter 1 (Rotation Range) is given by Eq. (8):

OR_R = e^(β_R)    (8)

where OR_R is the odds ratio of level 1 (R = 40) in relation to level 0 (R = 0) of the Rotation Range hyperparameter. Thus, if OR_R > 1, the chance of success when adopting R = 40 is greater than with R = 0. Otherwise (OR_R < 1), the chance of the CNN correctly classifying an image is greater if trained without the rotation transformation. A similar analysis can be made after calculating the odds ratios of the other analyzed hyperparameters. Thus, Eq. (9) presents the general formulation for the odds ratio:

OR_k = e^(β_k)    (9)

Therefore, the odds ratio indices are used to define the hyperparameter configurations for the sequence of experiments, as described in the next subsection (HPtuningLogReg Algorithm). In the sequence, logistic regression models are also used to analyze the results of the second stage of experiments. In this case, the objective is to evaluate the influence of the selected hyperparameter combinations for the three CNN architectures adopted (CNN8 [32], DenseNet-121 [18] and MobileNet [17]). For this, three logistic regression models are fitted (one per architecture) and the odds ratio indices for the hyperparameter configurations are observed.
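The odds ratio of Eqs. (8) and (9) is just the exponential of the fitted coefficient. A small numeric check, using coefficient values of the kind reported later in Table 1 of the results (Height Shift Range and Horizontal Flip):

```python
import math

def odds_ratio(beta_k):
    """Odds ratio between level 1 and level 0 of hyperparameter k: OR_k = exp(beta_k)."""
    return math.exp(beta_k)

# beta = 0.179 (Height Shift Range in Table 1) -> OR close to 1.196:
assert abs(odds_ratio(0.179) - 1.196) < 1e-3
# OR < 1 means training WITHOUT the transformation is preferred,
# e.g. Horizontal Flip with beta = -0.118 in Table 1:
assert odds_ratio(-0.118) < 1.0
```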

HPtuningLogReg algorithm

Algorithm 1 presents the method proposed in the R language for tuning data augmentation hyperparameters with Logistic Regression: the HPtuningLogReg Algorithm. The code is divided into four steps: data input, adjustment of the logistic regression model, hyperparameter tuning and summary. In the first phase (lines 1 to 14), the results of the experiments are read and prepared for the rest of the method. The "correct" and "error" vectors store the number of correct and incorrect classifications, respectively, for each of the observed hyperparameter configurations. Then, in line 16, the glm function of the R language is used to fit the Logistic Regression model. In addition, the anova method (line 17) is used to perform statistical tests of analysis of variance and to calculate the p-values ("paov"). In sequence, from the adjusted coefficients ("modelglm$coefficients"), the odds ratio measures are calculated (line 18). In phase 3 (lines 19 to 45), the Logistic Regression model is adopted for hyperparameter tuning. For this, a repetition loop varies the hyperparameter index (data augmentation transformation type): 1 - R, 2 - H, 3 - V, 4 - He, 5 - S, 6 - W and 7 - Z. In line 21, the statistical significance of the variable present in the model is analyzed. If the alternative hypothesis (H1) is accepted ("paov$'Pr(>Chi)'[i+1]" below the significance level), there is significance for the coefficient, that is, there is a statistical difference between the two treatments of hyperparameter k. In this case, the value of the odds ratio is presented and the recommended hyperparameter level is the one with the greatest chance of success in the validation dataset image classification (lines 22 to 35). On the other hand, if the initial hypothesis (H0) is accepted (lines 36 to 45), there is no statistically significant difference between the two values of hyperparameter k.
In this case, the default treatment ("H[i,2]") is recommended, that is, transformation k should not be applied in the data augmentation process. In step 4, a summary of the recommended hyperparameter values is presented. In this case, only the hyperparameters whose decision variables received level 1 (C[i] == 1) in step 3 are shown.
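The decision rule of phases 3 and 4 can be condensed as follows. The paper's algorithm is written in R; this is a Python sketch of the same logic, with the (name, β, p) triples echoing four rows of Table 1 and the level structure assumed from the experiment design:

```python
import math

# (name, beta, p_value, level-1 treatment, default level-0 treatment)
FIT = [
    ("Rotation Range",     -0.005, 0.86, 40,    0),
    ("Horizontal Flip",    -0.118, 0.00, True,  False),
    ("Height Shift Range",  0.179, 0.00, 0.2,   0),
    ("Zoom Range",          0.111, 0.00, 0.2,   0),
]

def recommend(fit, alpha=0.05):
    """Sketch of the HPtuningLogReg decision rule: keep level 1 only when
    the coefficient is significant (p < alpha) AND the odds ratio
    exp(beta) exceeds 1; otherwise keep the default (no transformation)."""
    chosen = {}
    for name, beta, p, level1, default in fit:
        if p < alpha and math.exp(beta) > 1.0:
            chosen[name] = level1      # transformation recommended
        else:
            chosen[name] = default     # default: do not apply it
    return chosen

rec = recommend(FIT)
assert rec["Height Shift Range"] == 0.2 and rec["Zoom Range"] == 0.2
assert rec["Rotation Range"] == 0 and rec["Horizontal Flip"] is False
```

Note the two distinct rejection paths: Rotation Range is dropped for lack of significance (p = 0.86), while Horizontal Flip is significant but has OR < 1, so the level with the greater chance of success is still the default.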

Hyperparameter tuning to second small dataset

In this case study, another type of problem in buildings was analyzed: gutter integrity and cleanliness pathology in roofs [43]. For this, images were used from the database presented and described by [33, 43, 45] and made available by the Research Group in Construction Technology and Management (School of Engineering - UFBA). These images were captured during roof inspections with an unmanned aerial vehicle. In a previous study, [33] used this dataset in experiments for tuning two CNN hyperparameters (learning rate and optimizer). For this, the images were divided into two classes: (0) roofs with clean gutters and (1) roofs with dirty gutters. Thus, the database adopted by [33] has 220 images, separated into the training, validation and test phases. Figures 4 and 5 present examples of images (two classes) of the second small dataset.
Fig. 4

Examples of images of the class 0 (roofs with clean gutters) in second small dataset

Fig. 5

Examples of images of the class 1 (roofs with dirty gutters) in second small dataset

Training (160 images): 80 images in class 0 and 80 images in class 1.
Validation (30 images): 15 images in class 0 and 15 images in class 1.
Test (30 images): 15 images in class 0 and 15 images in class 1.

In this sense, the Hyperparameter Tuning methodology (Section 3.4) was adopted in new experiments with this second case study. Then, HPtuningLogReg was applied for tuning the Data Augmentation hyperparameters. The dataset analyzed during the second case study is available from the corresponding author on request or at the web link: https://abre.ai/dataset2

Results

Results of first small dataset

This section presents the results for the first case study: recognition of vegetation on building facades.

Stage 1: Data augmentation hyperparameters

In stage 1, the HPtuningLogReg algorithm was adopted to fit the logistic regression model and tune the Data Augmentation hyperparameters. For this, the results of the 128 analyzed hyperparameter combinations were used. Equation (10) presents the adjusted linear model (logit). Table 1 shows the results of the test statistic (p), recommended values and odds ratio (OR) per hyperparameter.
Table 1

Results of Data Augmentation hyperparameter tuning with logistic regression in stage 1.

Hyperparameter       β        p      Value   x   OR
Rotation R. (R)      −0.005   0.86   0       0   0.995
Hor. Flip (H)        −0.118   0.00   False   0   0.888
Vertical Flip (V)    −0.130   0.00   False   0   0.878
Height S. R. (He)    0.179    0.00   0.2     1   1.196
Shear Range (S)      −0.008   0.79   0       0   0.992
Width S. R. (W)      0.114    0.00   0.2     1   1.121
Zoom Range (Z)       0.111    0.00   0.2     1   1.118

Bold values indicate the hyperparameters with OR > 1 and p < 0.05

The results in Table 1 show that the effects related to the Rotation Range and Shear Range transformations have no statistical significance (p > 0.05). Thus, the algorithm recommended the use of the default value for these hyperparameters (R = 0 and S = 0). On the other hand, the effects of the variables referring to Horizontal Flip and Vertical Flip showed statistical significance (p < 0.05). However, the odds ratio values for these hyperparameters were less than 1 (OR < 1). Thus, the HPtuningLogReg algorithm recommended the use of H = FALSE and V = FALSE. Table 1 also presents the results for the other three transformations: Height Shift Range, Width Shift Range and Zoom Range. The effects of these variables were statistically significant (p < 0.05). The recommended value of the Height Shift method was 0.2, with an estimated odds ratio of 1.196. In this respect, the adjusted model reveals that adopting the level He = 0.2 gives around 19.6% more chances of success in the image classification, in relation to not adopting this transformation in the training base. The adjusted values for Width Shift and Zoom were also 0.2, with odds ratios of 1.121 and 1.118, respectively. Thus, it is estimated that the chance of correct image classification when using W = 0.2 or Z = 0.2 is around 12% greater than performing the training without these Data Augmentation effects. Thus, from the Logistic Regression results, the HPtuningLogReg algorithm recommended three transformations in the images for the training process: Height Shift Range, Width Shift Range and Zoom Range. These hyperparameters, analyzed at two levels each, result in eight combinations of Data Augmentation transformations (2^3). Table 2 presents this set of combinations and their respective levels of decision variables.
Table 2

Hyperparameter combinations of data augmentation selected in stage 1

Comb.   He    W     Z     x_He   x_W   x_Z
1       0     0     0     0      0     0
2       0     0     0.2   0      0     1
3       0     0.2   0     0      1     0
4       0     0.2   0.2   0      1     1
5       0.2   0     0     1      0     0
6       0.2   0     0.2   1      0     1
7       0.2   0.2   0     1      1     0
8       0.2   0.2   0.2   1      1     1
The hyperparameter combinations presented in Table 2 were used in the next stage of experiments, as shown in the following section.

Stage 2: Data augmentation and CNN architectures

In stage 2, the hyperparameter combinations defined in the previous phase were evaluated in conjunction with three architectures in the literature: CNN8 [32], DenseNet-121 [18] and MobileNet [17]. Table 3 presents the results of accuracy in the validation step and the statistical metrics of the logistic regression models (OR and p).
Table 3

Results of validation accuracy (%) and statistical metrics for each method (CNN architecture + data augmentation combination)

Arch.          Comb.   1     2     3     4     5     Mean   OR      p
CNN8           1       78.0  78.0  78.0  76.0  82.0  78.4   1.000   –
CNN8           2       86.0  82.0  78.0  78.0  76.0  80.0   1.102   0.66
CNN8           3       86.0  82.0  86.0  86.0  84.0  84.8   1.537   0.07
CNN8           4       92.0  88.0  86.0  84.0  92.0  88.4   2.099   0.00
CNN8           5       84.0  90.0  80.0  86.0  86.0  85.2   1.586   0.05
CNN8           6       84.0  84.0  84.0  86.0  82.0  84.0   1.445   0.11
CNN8           7       90.0  92.0  88.0  90.0  88.0  89.6   2.374   0.00
CNN8           8       88.0  88.0  86.0  90.0  86.0  87.6   1.946   0.01
DenseNet-121   1       88.0  84.0  90.0  88.0  84.0  86.8   1.000   –
DenseNet-121   2       90.0  86.0  94.0  92.0  84.0  89.2   1.256   0.41
DenseNet-121   3       92.0  92.0  88.0  90.0  86.0  89.6   1.310   0.33
DenseNet-121   4       90.0  88.0  86.0  90.0  92.0  89.2   1.256   0.41
DenseNet-121   5       86.0  90.0  94.0  88.0  86.0  88.8   1.206   0.49
DenseNet-121   6       90.0  88.0  90.0  88.0  92.0  89.6   1.310   0.33
DenseNet-121   7       90.0  94.0  92.0  92.0  90.0  91.6   1.658   0.09
DenseNet-121   8       92.0  96.0  96.0  92.0  94.0  94.0   2.382   0.01
MobileNet      1       76.0  74.0  76.0  78.0  76.0  76.0   1.000   –
MobileNet      2       84.0  88.0  88.0  88.0  90.0  87.6   2.231   0.00
MobileNet      3       82.0  84.0  90.0  82.0  78.0  83.2   1.564   0.05
MobileNet      4       92.0  88.0  90.0  88.0  90.0  89.6   2.720   0.00
MobileNet      5       82.0  80.0  78.0  84.0  82.0  81.2   1.364   0.16
MobileNet      6       84.0  86.0  86.0  86.0  90.0  86.4   2.006   0.00
MobileNet      7       88.0  92.0  88.0  86.0  94.0  89.6   2.721   0.00
MobileNet      8       92.0  90.0  90.0  92.0  90.0  90.8   3.117   0.00

Bold values indicate the data augmentation combinations with p < 0.05

From Table 3 it is possible to observe that the highest mean accuracy (89.6%) for the CNN8 architecture was achieved by adopting combination 7 (He = 0.2; W = 0.2; Z = 0). In this case, adopting configuration 7 gives approximately 2 times more chances of success (OR = 2.374) in the classification, in relation to the reference combination 1 (He = 0; W = 0; Z = 0). It is also noteworthy that four combinations (4, 5, 7 and 8) showed statistical significance (p ≤ 0.05). On the other hand, when analyzing the results of DenseNet-121 in Table 3, the highest mean accuracy (94.0%) was obtained by combination 8. In addition, only this configuration (He = 0.2; W = 0.2; Z = 0.2) was statistically significant (p < 0.05). The experiments with the MobileNet architecture revealed that adopting configuration 8 gives around 3 times more chances of success (OR = 3.117) in image classification. Moreover, six combinations (2, 3, 4, 6, 7 and 8) are statistically different from the reference (p ≤ 0.05). In this sense, configuration 8 (He = 0.2; W = 0.2; Z = 0.2) was the only one to present statistical significance for all three architectures. In addition, the odds ratio for this combination was over 1.9 for CNN8, DenseNet-121 and MobileNet. Thus, combination 8 was selected for the sequence of experiments in the test stage. To illustrate, Figs. 6 and 7 present samples of images generated by Data Augmentation adopting combination 8 (He = 0.2; W = 0.2; Z = 0.2).
Fig. 6

Examples of images generated by data augmentation (He = 0.2; W = 0.2; Z = 0.2) for class 0 (without vegetation on the building facade)

Fig. 7

Examples of images generated by data augmentation (He = 0.2; W = 0.2; Z = 0.2) for class 1 (with vegetation on the building facade)

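The three retained transformations can be sketched in a few lines of NumPy. This is a simplified nearest-neighbour re-implementation for illustration only (the paper uses Keras's ImageDataGenerator, whose edge filling and interpolation differ); the shift and zoom amounts are drawn uniformly from the ranges He, W and Z:

```python
import numpy as np

def shift_zoom(img, height_shift=0.2, width_shift=0.2, zoom=0.2, rng=None):
    """Randomly shift and zoom a (H, W[, C]) image, nearest-neighbour style."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    dy = rng.uniform(-height_shift, height_shift) * h   # vertical shift (He)
    dx = rng.uniform(-width_shift, width_shift) * w     # horizontal shift (W)
    z = 1.0 + rng.uniform(-zoom, zoom)                  # zoom factor (Z)
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Map each output pixel back to a source pixel (zoom about the centre).
    src_y = np.clip(((ys - h / 2) / z + h / 2 - dy).round().astype(int), 0, h - 1)
    src_x = np.clip(((xs - w / 2) / z + w / 2 - dx).round().astype(int), 0, w - 1)
    return img[src_y, src_x]
```

With all three ranges set to zero the function reduces to the identity, which is the reference combination of the experiments.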

Tests results

In this step, simulations were performed on the test dataset adopting three architectures (CNN8, DenseNet-121 and MobileNet) and two data augmentation configurations, the one proposed in this paper (P) and one from the literature (L). Table 4 presents the accuracy in the test step for each analyzed configuration.
Table 4

Maximum accuracy in the test step in each of the repetitions and respective mean accuracy (M) for first small dataset

| Arch. | C. | 1 | 2 | 3 | 4 | 5 | M. |
|---|---|---|---|---|---|---|---|
| CNN8 | P | 95.6 | 87.8 | 92.2 | 93.3 | 91.1 | 92.0 |
| CNN8 | L | 91.1 | 80.0 | 93.3 | 91.1 | 93.3 | 89.8 |
| DenseNet | P | 77.8 | 77.8 | 73.3 | 83.3 | 65.6 | 75.6 |
| DenseNet | L | 87.8 | 77.8 | 83.3 | 71.1 | 70.0 | 78.0 |
| MobileNet | P | 71.1 | 65.6 | 81.1 | 74.4 | 63.3 | 71.1 |
| MobileNet | L | 54.4 | 87.8 | 58.9 | 55.6 | 53.3 | 62.0 |

Bold values indicate the best result of test accuracy and mean accuracy

Comparison between methods recommended by the data augmentation configurations (Proposed (P) and Literature (L)) and architectures

Proposed in this paper (P): defined from steps 1 and 2 (Hyperparameter Tuning): He = 0.2; W = 0.2; Z = 0.2. Literature (L): the six-transformation configuration presented in [4] and used by [32] for the same dataset of this study. From Table 4 it is possible to observe that the highest mean accuracy (92.0%) was achieved by the CNN8 architecture when adopting the proposed data augmentation combination. Moreover, this configuration (CNN8 + P) also resulted in the highest accuracy value in one repetition: 95.6%. This value is equivalent to the correct classification of 86 images out of a total of 90 photographs in the test dataset. In this sense, Table 5 presents the confusion matrix for the adoption of CNN8 + P (Repetition 1).
Table 5

Confusion matrix with the best results for the test step (first small dataset)

|  | Predicted positive | Predicted negative |
|---|---|---|
| Actual positive | TP = 43 | FN = 2 |
| Actual negative | FP = 2 | TN = 43 |

Bold values indicate the number of true positives (TP) and true negatives (TN)

It can be seen in Table 5 that the CNN model correctly classified 43 images in the positive class and 43 images in the negative class (accuracy of 95.6%). Thus, for each class, the CNN missed only 2 images in the test dataset (an error of around 4.4%). It is also worth noting that, in the study of [32], the maximum accuracy achieved for the same test images was lower, indicating that careful adjustment of the Data Augmentation hyperparameters can improve classification results. Table 6 summarizes the recommended hyperparameters for the analyzed database.
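The figures quoted from Table 5 can be checked directly; a small sketch computing the accuracy and per-class error from the confusion matrix counts:

```python
# Accuracy and per-class error from the Table 5 confusion matrix.
tp, fn, fp, tn = 43, 2, 2, 43

accuracy = (tp + tn) / (tp + fn + fp + tn)   # 86 correct out of 90
error_pos = fn / (tp + fn)                   # misses in the positive class
error_neg = fp / (fp + tn)                   # misses in the negative class

print(f"accuracy = {accuracy:.1%}")          # 95.6%
print(f"per-class error = {error_pos:.1%}")  # 4.4%
```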
Table 6

Selected hyperparameters for first small dataset

| Hyperparameter | Recommendation |
|---|---|
| Architecture | CNN8 |
| Height Shift Range (He) | 0.2 |
| Width Shift Range (W) | 0.2 |
| Zoom Range (Z) | 0.2 |

Results of second small dataset

This section reports the results of applying the proposed methodology to a second small dataset: gutter integrity in roof structures. In this sense, Equation 11 presents the linear model adjusted for this case study. Table 7 shows the results of the test statistic (p), recommended values and odds ratio (OR) per hyperparameter (second small dataset).
Table 7

Results of data augmentation hyperparameter tuning with logistic regression (second small dataset)

| Hyperparameter | β | p | Value | x | OR |
|---|---|---|---|---|---|
| Rotation Range (R) | −0.245 | 0.00 | 0 | 0 | 0.783 |
| Horizontal Flip (H) | 0.071 | 0.07 | False | 0 | 1.074 |
| Vertical Flip (V) | −0.027 | 0.48 | False | 0 | 0.973 |
| Height Shift Range (He) | −0.008 | 0.85 | 0 | 0 | 0.992 |
| Shear Range (S) | 0.089 | 0.02 | 0.2 | 1 | 1.094 |
| Width Shift Range (W) | −0.024 | 0.53 | 0 | 0 | 0.976 |
| Zoom Range (Z) | −0.091 | 0.02 | 0 | 0 | 0.913 |

Bold value indicates the hyperparameters OR > 1 and p < 0.05

Table 7 shows that the only hyperparameter recommended by the HPtuningLogReg algorithm for the second case study was Shear Range (S), because OR > 1 (1.094) and p < 0.05. On the other hand, four transformations did not reach statistical significance (p > 0.05): Horizontal Flip (H), Vertical Flip (V), Height Shift Range (He) and Width Shift Range (W). In addition, two hyperparameters achieved a statistical effect (p < 0.05) but obtained OR < 1: Rotation Range (R) and Zoom Range (Z). In this regard, the test stage was carried out with two Data Augmentation configurations. Table 8 presents the accuracy results of the test step for the second small dataset.
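The selection rule applied here can be sketched as follows: given each transformation's fitted coefficient β and p-value (the Table 7 rows), keep those with OR = exp(β) > 1 and p < 0.05. The fitting step itself (logistic regression of classification success on the binary transformation indicators) is not repeated in this sketch:

```python
import math

# (name, beta, p) taken from Table 7 for the second small dataset.
table7 = [
    ("Rotation Range",     -0.245, 0.00),
    ("Horizontal Flip",     0.071, 0.07),
    ("Vertical Flip",      -0.027, 0.48),
    ("Height Shift Range", -0.008, 0.85),
    ("Shear Range",         0.089, 0.02),
    ("Width Shift Range",  -0.024, 0.53),
    ("Zoom Range",         -0.091, 0.02),
]

def select(rows, alpha=0.05):
    """Keep transformations whose odds ratio exceeds 1 with p < alpha."""
    return [name for name, beta, p in rows
            if math.exp(beta) > 1.0 and p < alpha]

print(select(table7))  # ['Shear Range']
```

Note that exp(β) reproduces the OR column of Table 7 (e.g. exp(0.089) ≈ 1.09), which is the usual relationship between logistic regression coefficients and odds ratios.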
Table 8

Maximum accuracy in the test step in each of the repetitions and respective mean accuracy (M) for second small dataset

| Arch. | C. | 1 | 2 | 3 | 4 | 5 | M. |
|---|---|---|---|---|---|---|---|
| CNN8 | P | 83.3 | 93.3 | 93.3 | 86.7 | 83.3 | 88.0 |
| CNN8 | L | 56.7 | 70.0 | 66.7 | 56.7 | 73.3 | 64.7 |
| DenseNet | P | 70.0 | 83.3 | 90.0 | 90.0 | 70.0 | 80.7 |
| DenseNet | L | 86.7 | 86.7 | 90.0 | 80.0 | 86.7 | 86.0 |
| MobileNet | P | 70.0 | 70.0 | 60.0 | 70.0 | 63.3 | 66.7 |
| MobileNet | L | 70.0 | 80.0 | 73.3 | 83.3 | 70.0 | 75.3 |

Bold values indicate the best result of test accuracy and mean accuracy

Comparison between methods recommended by the data augmentation configurations (Proposed (P) and Literature (L)) and architectures

Proposed in this paper (P): Hyperparameter Tuning for the second small dataset: S = 0.2. Literature (L): the six-transformation configuration presented in [4] and used by [33]. Table 8 shows that the highest mean accuracy (88.0%) for the second case study was achieved by the CNN8 architecture with the proposed configuration. Moreover, this Data Augmentation combination (S = 0.2) achieved the maximum accuracy value in one repetition: 93.3%.

Comparison with other studies

In this section, a comparative study is carried out between the present proposal and other recent works in the literature: I [32], II [3], III [48], IV [36] and V [54]. For this, four features were observed: CNN application (classification, detection or segmentation), type of problem, analyzed hyperparameters and tuning of Data Augmentation methods. All analyzed papers applied Deep Learning models for image processing in building construction. Table 9 presents the comparison results.
Table 9

Comparison of this proposal with different papers that applied CNNs in the image processing of building construction: I [32], II [3], III [48], IV [36] and V [54].

[Table 9: columns Proposed, I [32], II [3], III [48], IV [36], V [54]; rows grouped by feature]
- CNN application: Classification / Detection / Segmentation
- Problem: Crack detection / Bridge inspection / Roofs defects classification / Vegetation in facades
- Analyzed hyperparameters: Rotation Range / Horizontal Flip / Vertical Flip / Height Shift Range / Shear Range / Width Shift Range / Zoom Range / Others
- Tuning of Data Augmentation: Yes / No
Table 9 confirms that the main contribution of this paper is the proposal of a methodology for tuning Data Augmentation hyperparameters for building construction image classification, especially vegetation recognition and roofs defects classification. In contrast, it should be noted that most of the other studies in this area are dedicated to the problem of crack detection (or segmentation). Furthermore, this proposal innovates by analyzing 128 (2^7) combinations of hyperparameters from seven Data Augmentation transformations: rotation range, horizontal flip, vertical flip, height shift range, shear range, width shift range and zoom range. In general, other papers analyze fewer combinations and transformations of Data Augmentation in the building construction image processing field. Another important contribution is the application of logistic regression models for hyperparameter tuning. The papers by [3] and [48] also present methodologies for recommending Data Augmentation hyperparameters. However, these studies are applied to other problems (crack detection or bridge inspection) and do not adopt logistic regression methods.
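The 128-combination design can be reproduced as a full 2^7 factorial over the seven transformations. The sketch below uses Keras ImageDataGenerator argument names; the non-zero rotation level is illustrative (the paper confirms 0.2 only for the shift, shear and zoom ranges):

```python
from itertools import product

# Each transformation is either off or set to its tested level,
# mirroring the 2^7 full factorial design of the experiments.
levels = {
    "rotation_range":     (0, 90),       # second level illustrative
    "horizontal_flip":    (False, True),
    "vertical_flip":      (False, True),
    "height_shift_range": (0.0, 0.2),
    "shear_range":        (0.0, 0.2),
    "width_shift_range":  (0.0, 0.2),
    "zoom_range":         (0.0, 0.2),
}

names = list(levels)
designs = [dict(zip(names, combo))
           for combo in product(*(levels[n] for n in names))]
print(len(designs))  # 128
```

Each entry of `designs` is one hyperparameter combination that would be passed to the augmentation pipeline before training and evaluating a CNN.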

Conclusion

The objective of this paper was to propose a rigorous methodology for tuning Data Augmentation hyperparameters in Deep Learning for small datasets. In this sense, the main contributions of this study are: careful analysis of Data Augmentation transformations in the application of Deep Learning to building image classification, especially the recognition of vegetation on facades and roofs defects classification; design of experiments with 128 combinations of Data Augmentation using the Keras library and the R software; proposal of the HPtuningLogReg method, using Logistic Regression for tuning Data Augmentation hyperparameters; and comparison of Data Augmentation configurations adopting three Convolutional Neural Network architectures from the literature. Regarding the results, in the first stage of experiments three Data Augmentation transformations were recommended for the first case study: Height Shift Range (He), Width Shift Range (W) and Zoom Range (Z). According to the Logistic Regression model, adopting each of these transformations increases the odds of correct image classification. Moreover, from the second stage of experiments, the configuration (He = 0.2; W = 0.2; Z = 0.2) was the only one to present statistical significance (p < 0.05) for the three CNN architectures analyzed. Finally, in the testing stage, the selected Data Augmentation configuration reached the highest mean accuracy (92.0%) when adopting the CNN8 architecture. In addition, this combination also resulted in the greatest accuracy in one repetition: 95.6%. This value is equivalent to the correct classification of 86 images out of a total of 90 photographs in the test dataset of the first case study. For the second case study, the logistic regression model recommended the Shear Range transformation for Data Augmentation. In this sense, the hyperparameters selected for this application also achieved the best results in the test phase: 93.3%.
In future work, it is expected to analyze other Data Augmentation transformations. It is also suggested to test more levels for specific hyperparameters, such as Zoom Range. It is also worth highlighting the importance of investigating possible limitations of the logistic regression model; for example, the proposed approach did not account for interactions among different predictor variables. With interaction effects, each predictor (hyperparameter) influences the others and could yield other tuning solutions. Another important direction is the adoption of the HPtuningLogReg method for tuning Data Augmentation settings in other applications with small building construction image classification datasets.
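As a sketch of the interaction-effects extension suggested above, pairwise products of the binary transformation indicators could be appended as extra predictors before refitting the logistic regression (the helper and variable names here are hypothetical, not from the paper):

```python
from itertools import combinations

def with_pairwise_interactions(row):
    """Append products of all indicator pairs to one design-matrix row.

    `row` maps transformation name -> 0/1 indicator; the 7 base columns
    gain C(7, 2) = 21 interaction columns.
    """
    out = dict(row)
    for a, b in combinations(sorted(row), 2):
        out[f"{a}*{b}"] = row[a] * row[b]
    return out

# One combination of the design: He, S and R active, the rest off.
base = {"R": 1, "H": 0, "V": 0, "He": 1, "S": 1, "W": 0, "Z": 0}
expanded = with_pairwise_interactions(base)
print(len(expanded))     # 7 + 21 = 28 columns
print(expanded["He*S"])  # 1: both transformations active together
```

A significant interaction coefficient would then indicate that the benefit of one transformation depends on whether another is also enabled.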