Literature DB >> 35992191

Tensor based stacked fuzzy neural network for efficient data regression.

Jie Li1, Jiale Hu1, Guoliang Zhao1,2, Sharina Huang3, Yang Liu1.   

Abstract

The random vector functional link (RVFL) network and the extreme learning machine (ELM) have previously been extended with type-2 fuzzy sets using vector stacking methods; this extension suggests a tensor-based way to construct learning structures for type-2 fuzzy-set-based learning frameworks. In this paper, the type-2 fuzzy-set-based RVFL, the type-2 fuzzy-set-based ELM and the Tikhonov-regularized ELM are fused into one network, and a tensor-based way of stacking data is used to incorporate the nonlinear mappings produced by the type-2 fuzzy sets. In this way, the network learns each substructure with that substructure's own algorithm, and the three substructures are merged into one tensor structure via the type-2 fuzzy mapping results. For the resulting stacked single fuzzy neural network, the consequent-part parameters are learned by unfolding the tensor into matrix regressions. The newly proposed stacked single fuzzy neural network demonstrates a new way to design hybrid fuzzy neural networks with higher-order fuzzy sets and higher-order data structures. Its effectiveness is verified on classical benchmark datasets and with several statistical testing methods.
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022, Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Keywords:  Extreme learning machine (ELM); Random vector functional link network (RVFL); Tensor stacked fuzzy neural network (TSFNN); Tensor-based type-2 extreme learning machine (TT2-ELM); Tensor-based type-2 random vector functional link network (TT2-RVFL)

Year:  2022        PMID: 35992191      PMCID: PMC9382627          DOI: 10.1007/s00500-022-07402-3

Source DB:  PubMed          Journal:  Soft comput        ISSN: 1432-7643            Impact factor:   3.732


Introduction

The random vector functional link (RVFL) network and the extreme learning machine (ELM) are two popular randomized single-hidden-layer feedforward networks, which provide a unified framework for both regression and multi-class classification. Semi-supervised RVFL and ELM networks can be merged into a joint optimization framework, which has proven efficient for moderate-scale data classification (Peng et al. 2020). The parameters can be regularized when ridge regression is used (Yildirim and Revan Özkale 2019). When singular value decomposition (SVD) is used for iterative solution searching, an SVD update algorithm scales better and runs faster than an SVD recomputed from scratch (Grigorievskiy et al. 2016). Multi-label learning can combine a multi-label radial basis function neural network with a Laplacian ELM (Xu et al. 2019): a clustering algorithm determines the number of hidden nodes, the centers of the activation functions are determined by the data themselves, and the output is then solved by a Laplacian ELM. Inspired by biological intelligent systems, bio-inspired learning models have flourished recently (Huang and Chen 2016; Alencar et al. 2016; Christou et al. 2019) and have been applied to many areas, such as anomalous trajectory classification (Sekh et al. 2020), long-term time series prediction (Grigorievskiy et al. 2014), T-S fuzzy model identification (Wei et al. 2020), dictionary learning-based image classification (Zeng et al. 2020), anomaly detection (Hashmi and Ahmad 2019), HRV recognition (Bugnon et al. 2020), energy systems (Yaw et al. 2020), mislabeled sample detection (Akusok et al. 2015), and concept drift detection (Yang et al. 2020). Although ELM and RVFL have been applied in many fields, their disadvantages have also become apparent during verification. ELM trains extremely fast, but its performance is not stable.
For example, from the structural risk minimization perspective, ELM easily performs worse in terms of stability, generalization performance and sparsity, and it also tends to over-fit. RVFL is more stable than ELM, although its training is slower, and the over-fitting risk of RVFL can be reduced by its enhancement nodes. Generalization performance is the main concern for these learning algorithms. The balance between computational complexity and generalization ability has been explored with ELM (Ragusa et al. 2020); with suitably designed data and parallel ELMs, large-scale learning tasks can also be tackled (Ming et al. 2018), although a trade-off must be made between efficiency and scalability so that the algorithm retains complementary advantages. With the aid of graph learning and adaptive unsupervised/semi-supervised clustering, flexible and discriminative data embeddings can be achieved (Zeng et al. 2020; Zheng et al. 2020). Using the regularized correntropy criterion and the half-quadratic optimization technique, both convergence speed and performance improve over the original algorithm (Yang et al. 2020), and a robust variant has also been studied (Yang et al. 2020). When an inverse-free recursive algorithm is used to update the inverse of the network's Hermitian matrix, an efficient inverse-free update of the regularized pseudo-inverse is obtained, as proposed for ELMs (Zhu and Wu 2020). In the work above, the performance of ELM is enhanced by adding regularization, which also helps avoid over-fitting. Regularization is the commonly used and well-recognized strategy for improving the performance of artificial neural networks (ANNs). However, its effect is limited when ANNs are used to handle complex and large-scale data.
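The basic ELM training step discussed above, random hidden-layer weights plus a regularized least-squares solve for the output weights, can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm; the sigmoid activation, the weight range, and the parameter values are assumptions for demonstration.

```python
import numpy as np

def elm_fit(X, y, n_hidden=20, reg=0.01, seed=0):
    """Minimal ELM sketch: the input weights W and biases b are random and
    never trained; only the output weights beta are solved, here with a
    Tikhonov (ridge) term. Parameter choices are illustrative."""
    rng = np.random.default_rng(seed)
    # Wide random range so the sigmoids are not all near-linear (a common choice)
    W = rng.uniform(-10, 10, size=(X.shape[1], n_hidden))
    b = rng.uniform(-10, 10, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))          # hidden-layer output matrix
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because the only trained quantity is a linear solve, training is extremely fast, which is exactly the speed/stability trade-off described above: a different random draw of W and b gives a different model.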
To make ANN models suitable for complex data structures, two strategies can be considered. The first is to add methods that optimize the ANN structure and mechanism: parameter selection (e.g., of the number of hidden nodes), hidden-layer pruning, fuzzy structures, and so on. In big-data environments, a fast parameter selection scheme for modeling large amounts of data is needed; the alternating direction method and the maximally split method can be applied to reduce the number of sub-models and the coefficient-training cost (Lai et al. 2020). Concerning credible probabilities for network outputs, a probabilistic output for the original ELM architecture has been proposed that eliminates iterative learning while preserving the merits of ELM (Wong et al. 2020). Using Bayesian inference, multiple-instance learning-based ELM has proven efficient in classification problems (Wang et al. 2020). The optimally pruned ELM (Miche et al. 2010) addresses both regression and classification problems and can counter the effect of noise. Going further, a regularization penalty was applied to the optimally pruned ELM, yielding a double-regularized ELM using LARS and Tikhonov regularization (Miche et al. 2011). The missing-data case of the regression problem has also been studied (Yu et al. 2013). When a training-sample selection method is designed based on the fuzzy C-means clustering algorithm, the resulting small-training-sample hierarchical ELM can reduce computation time (Xu et al. 2020).
Random vector functional link networks (Zhang and Suganthan 2016) can also use the techniques mentioned above, such as ridge regression (Zhang and Suganthan 2017); an extended learning paradigm named RVFL plus (RVFL+) (Zhang 2020) has been used in neuroimaging-based Parkinson's disease diagnosis (Xue et al. 2018; Shi et al. 2019). Type-2 fuzzy sets are commonly used for their excellent ability to model uncertain information, and the generalization performance and uncertainty-modeling ability of ELM-like models can be extended with them. Motivated by ELMs and RVFLs, the generalized Moore–Penrose inverse and triangular type-2 fuzzy sets were used to extend ELM, yielding a tensor-based ELM (Huang et al. 2019). The RVFL network has also been extended to the tensor case (Zhao and Wu 2019), where the type-reduction step for general type-2 fuzzy sets is removed. The second strategy is to improve the structure of the ANN itself. For example, deep network technology has matured, and multiple hidden layers can significantly enhance ANN performance; however, computational complexity increases and training slows as hidden layers are added. In the recent decade, deep networks have been applied to real-world control systems, where uncertain information and big-data problems arise, especially in robot control. In Lu et al. (2019), residual errors and system state variables are handled via an RNN model, and a time-varying underdetermined linear system with double bound limits is simulated and verified on a real robot system. In Li et al. (2021), neural networks are employed to assess the advantages and disadvantages of robot parameter calibration technology. Moreover, the kinematic parameters of robot arms have also been modeled via neural networks (Li et al. 2022).
Six regularization schemes combined with the least-squares method and the LM algorithm are used to solve over-fitting problems, and the robot calibration parameters are identified with these techniques. Besides ordinary fuzzy sets, the Pythagorean fuzzy set (PFS) proposed in Ejegwa et al. (2022) is also used in control problems; it can compute the correlation between PFSs, with the aim of modeling complex control systems and performing pattern recognition via fuzzy theory. In Feng et al. (2018), drive-response memristive neural networks (MNNs) are used to deal with time delays, control widths and rest widths in intermittent control systems, and the asymptotic synchronization of MNNs is achieved via quantized intermittent control and weighted double-integral inequalities. Constructing a composite network from several neural networks is another way to improve the structure of an ANN. At present, the mainstream method is to assign part of the neurons in the main network's hidden layers to train other networks, which then replace the original fully connected part of the main network; this has been done in both single and deep networks. However, the performance of such composite structures is limited by the main network. As is well known, a tensor is a higher-order structure that can be unfolded into several matrices; in short, tensor space is a higher-order array space that includes matrix space. Therefore, a tensor structure can be used to fuse several lower-order structures. In Huang et al. (2019) and Zhao and Wu (2019), type-2 fuzzy sets are combined into a 4-D tensor space. Motivated by the above-mentioned work, we notice that type-2 fuzzy sets and tensor structures provide a new way to model complex data, whether via ELMs, RVFLs or neuro-fuzzy systems under the tensor structure.
Our goal is to unveil the links and laws behind the data, and a new tensor-based stacked neural network for efficient data regression is studied. To obtain the merits of several algorithms, a good approach is to fuse them into one framework, achieving a balance between performance and structural simplicity. The tensor-based stacked neural network can be considered a new composite structure that merges several models, i.e., a novel fused neural network structure. To the best of our knowledge, little related research has studied type-2 fuzzy sets under a tensor structure. Fusing different concepts and techniques into one algorithm inevitably extracts different aspects of the data, producing different views of it, and the originally proposed algorithms can then be used to minimize the testing error. Returning to type-2 fuzzy sets, they map the data with differently parameterized fuzzy membership functions, involving at least three type-1 membership functions, so a multi-view representation of the data is obtained. The question that follows is how to fuse these results into one data structure; the tensor is a suitable choice for this type of learning method, and this is the motivation of the work. The main contributions of this work are as follows:

A new stacked neural-network model based on a tensor structure is proposed. Three neural network structures, two type-2 fuzzy networks (TT2-ELM and TT2-RVFL) and an ordinary neural network (TROP-ELM), which can be regarded as a type-1 fuzzy network, are the members of the proposed tensor-based stacked fuzzy neural network (TSFNN). The hidden layers of the three member networks are stacked into a 3-D tensor, so in this stacking process the mapping results of the three members are fused into the generated tensor. Because this 3-D tensor combines one type-1 fuzzy set and two type-2 fuzzy sets, it can be considered a new type-2 fuzzy set, and TSFNN is therefore a new higher-order fuzzy modeling method.

The tensor regression is realized via a tensor unfolding algorithm. The 3-D tensor of the proposed model is unfolded into three matrices, reducing the tensor-space information to three matrix spaces, and matrix regression with Tikhonov regularization on these three matrices solves the consequent-part learning problem. In this process, the tensor-space problem is solved in matrix space, with Tikhonov regularization applied in the tensor regression.

The structure of the rest of the paper is as follows: Sect. 2 introduces the three algorithms used by the stacking system; Sect. 2.1 introduces the tensor-based type-2 RVFL, Sect. 2.2 presents the tensor-based type-2 ELM, and Sect. 2.3 introduces TROP-ELM, which is used for performance comparison. In Sect. 3, the structure of the tensor-based hybrid single fuzzy neural network, that is, the stacked single fuzzy neural network, is presented. Simulation results and discussions are given in Sect. 4. Finally, conclusions are drawn in Sect. 5.
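The member-stacking step described above, three hidden-layer mapping matrices fused along a third mode into one 3-D tensor, can be sketched in NumPy. The matrix sizes and random contents here are illustrative placeholders for the three members' actual hidden-layer outputs.

```python
import numpy as np

N, L = 5, 4                        # samples and hidden nodes (illustrative sizes)
rng = np.random.default_rng(0)
H_rvfl = rng.random((N, L))        # stand-in for the TT2-RVFL hidden-layer mapping
H_elm  = rng.random((N, L))        # stand-in for the TT2-ELM hidden-layer mapping
H_trop = rng.random((N, L))        # stand-in for the TROP-ELM hidden-layer mapping

# Stack the three member networks' mappings into one 3-D tensor A of shape (N, L, 3);
# each frontal slice A[:, :, k] is one member's mapping result.
A = np.stack([H_rvfl, H_elm, H_trop], axis=2)
```

Each slice of the stacked tensor remains recoverable, which is what later allows the tensor to be unfolded back into per-member matrices for the regression step.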

Preliminary

In this section, the tensor-based type-2 RVFL, tensor-based type-2 ELM and Tikhonov regularized OP-ELM are introduced.

Tensor-based type-2 RVFL

The RVFL usually adopts radial-basis activation functions to construct the network, for example the Radbas function y = exp(−s²) in the hidden layer, where y and s denote the node's output and input, respectively. The enhancement nodes of the tensor-based RVFL are replaced with IT2 fuzzy sets. The structure of the tensor-based type-2 RVFL (TT2-RVFL) is shown in Fig. 1.
Fig. 1

Structure of the TT2-RVFL network

TT2-RVFL has three layers: the first is the input layer; the second is the hidden layer, which includes a fully connected part (the green nodes in Figs. 1 and 3) and an enhancement part; the third is the output layer. In the hidden layer, the enhancement part is extended via interval type-2 fuzzy sets, and the expanded enhancement part is stacked into a 3-D tensor structure. The Radbas activation function of RVFL is extended to the interval type-2 fuzzy set IT2Radbas, and the extended RVFL is constructed using IT2Radbas.
Fig. 3

Structure of the TT2-ELM network

Figure 2 shows the standard formulation of an IT2 fuzzy set. An IT2 fuzzy set is bounded by a lower membership function (LMF) and an upper membership function (UMF), and its secondary membership function is a constant. In the front view of Fig. 2, the upper curve represents the UMF and the lower one the LMF. The area between the LMF and the UMF is the footprint of uncertainty (FOU), which carries the uncertainty information of the IT2 fuzzy set. The activation function is named IT2Radbas, and the interval type-2 fuzzy set is constructed with this type of activation function.
Fig. 2

Structure of the IT2 fuzzy set

The membership function (MF) of the type-2 fuzzy sets in TT2-RVFL is defined as follows: Given a testing data set, a lower MF matrix can be structured with the following matrices, where the bias and the input weights are randomly generated. By the definition of the IT2 fuzzy sets' lower MF, the relationship between the input and the expected output can be approximated by the lower MF matrix. The principal MF matrix and the upper MF matrix can be structured similarly, with the following formulas, which also form the upper membership functions. The UMFs are used to fill the tensor, whose slices are given below in terms of a randomly generated weighting vector.
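The LMF/UMF pair described above can be illustrated with one common construction: giving the Radbas primary MF an uncertain width, so that a narrower and a wider Gaussian bound the FOU. The sigma values below are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def it2_radbas(s, sigma_lower=0.8, sigma_upper=1.2):
    """Sketch of an interval type-2 radial-basis activation (IT2Radbas):
    the primary MF exp(-s^2 / sigma^2) is evaluated with an uncertain
    width, yielding a lower MF (LMF) and an upper MF (UMF). The band
    between them is the footprint of uncertainty (FOU)."""
    lmf = np.exp(-(s ** 2) / sigma_lower ** 2)   # narrower Gaussian -> lower bound
    umf = np.exp(-(s ** 2) / sigma_upper ** 2)   # wider Gaussian -> upper bound
    return lmf, umf
```

Evaluating both bounds on the same input produces the two matrices (lower and upper MF) that, together with a principal MF, fill the slices of the 3-D tensor.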

Remark 1

The uncertainty-weight method (Runkler et al. 2018) is applied to compute the principal MF matrices; the weight reflects the impact of each MF value on the overall defuzzification result of the set. The mapping results are obtained as follows, where the weight measures the uncertainty of the lower membership value in the type-reduction result. The uncertainty-weight method generalizes the simple approach of taking the mean of the upper and lower MF values. The crisp output is then given as follows: Formula (4) shows that, depending on the weighting parameter, the defuzzification result increases linearly with the uncertainty, more slowly than linearly, or faster than linearly.

Remark 2

IT2 fuzzy sets can be used to expand the enhancement nodes of RVFL. The original enhancement part of RVFL is constructed with type-1 MFs, i.e., type-1 fuzzy sets, so RVFL can be considered a type-1 fuzzy network. Updating its enhancement nodes with IT2 fuzzy sets extends the type-1 fuzzy set to an IT2 fuzzy set, after which the enhancement part of RVFL has an interval type-2 fuzzy structure. Formula (3), used in the defuzzification step, represents this extension of RVFL by IT2 fuzzy sets. The resulting network can be named the interval type-2 random vector functional link (IT2-RVFL) network. Finally, a 3-tensor is established from the three foregoing membership functions. Hereafter, because of its related usage in the tensor equations, this symbol will be denoted by another capital letter in the next section. From the relevant content of the tensor equations, the weighting matrix of the enhanced nodes of TT2-RVFL can be defined as: For TT2-RVFL, the output model can then be fused into one matrix by the following equation, where the equilibrium coefficient of TT2-RVFL balances the mapping results of the nonlinear interval type-2 activation functions against the weighted matrices of the enhanced part; the weight vectors from the input layer to the intensification nodes are stochastically generated so that the activation functions are not fully saturated; the input matrix is structured from the input samples, and the remaining matrix denotes the unresolved input weights.

Remark 3

In light of Eq. (6), at one extreme value of the equilibrium coefficient TT2-RVFL degenerates to RVFL; at intermediate values, TT2-RVFL is a mixed model of the tensor-based extreme learning machine and RVFL; at the other extreme, TT2-RVFL is transformed into the tensor-based extreme learning machine.
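The fused output model of Eq. (6) and its degenerate cases in Remark 3 can be sketched as a design matrix that convexly weights the direct input links against the IT2 enhancement-node mappings. The function name, the convex weighting form, and the placeholder matrices are all assumptions for illustration, not the paper's exact formula.

```python
import numpy as np

def fused_design(X, Phi, lam):
    """Hypothetical sketch of the fused TT2-RVFL output model: X holds the
    direct input links, Phi the defuzzified IT2 enhancement-node mappings,
    and lam plays the role of the equilibrium coefficient."""
    return np.hstack([(1.0 - lam) * X, lam * Phi])

X = np.ones((4, 2))          # placeholder direct-link part
Phi = np.full((4, 3), 2.0)   # placeholder enhancement-part mapping
D = fused_design(X, Phi, lam=1.0)
# lam = 1 keeps only the enhancement part, mirroring one degenerate case of Remark 3
```

Sweeping lam between its extremes moves the model between the plain linear part and the purely enhancement-driven (tensor-ELM-like) case.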

Tensor-based type-2 ELM

The TT2-ELM was first proposed in Huang et al. (2019). The advantage of the tensor structure is that the information of the secondary MF can be contained and modeled directly in one high-dimensional array, which avoids the type-reduction operation in type-2 fuzzy reasoning. Therefore, the tensor structure can seamlessly embed type-2 fuzzy sets into ELM. Figure 3 shows the structure of TT2-ELM, a single-hidden-layer feedforward neural network with three layers: the input layer; the hidden layer, reformed via triangular type-2 fuzzy sets (Huang et al. 2019), which are used to form the 3-D tensor; and the output layer. For a specified test dataset with given inputs and outputs, the mathematical model of TT2-ELM takes the following form, where the output weight tensor can be reshaped to a specified size, the regression tensor's dimension and the number N of training patterns fix the shapes, and the output matrix collects the targets. When the tensor order reduces, Eq. (7) degenerates into the matrix case. According to Theorem 1 of Huang et al. (2019), if a suitable solution exists for the multi-linear system (7), then the system is solvable and its solution is given by formula (8). If no such solution can be obtained from the analysis, then the multi-linear system (7) is unsolvable and instead admits a minimum-norm solution, leading to the following tensor equation. The gain tensor in Eq. (9) can be considered a 'square tensor', and the solution of the tensor equation can still be obtained. Based on the theory of tensors and kernels, the following equation can be determined, where a random tensor satisfying formula (10) can be established as long as it has a suitable order; a generally accepted condition is that it satisfies the stated rank constraint.
This condition indicates that the tensor is included in the null space of the corresponding operator. Therefore, a general solution can be obtained as follows, in which the 1-inverse appears. The minimum-norm solution of the tensor equation consists of two parts, one of which comes from the null space of the homogeneous equation. For the tensor equation formulated by (7) and (11), we obtain Eq. (12); if its condition is met, the stated tensor is the minimizer. In light of Corollary 2.14(1) of Behera and Mishra (2017), the existence of the 1-inverse yields Eq. (13), and the least-squares solution of Eq. (13) is shown in Eq. (14), which is the minimum-norm solution of Eq. (7).
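In practice, a minimum-norm least-squares solution like Eq. (14) can be computed by unfolding the multi-linear system into an ordinary matrix equation and applying the Moore–Penrose pseudo-inverse. The shapes and random data below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
N, I, J = 6, 3, 4                     # training patterns and tensor mode sizes (illustrative)
H = rng.standard_normal((N, I, J))    # stand-in hidden-layer tensor
T = rng.standard_normal((N, 1))       # stand-in target outputs

# Unfold the tensor system H * B ~ T into a matrix equation: each sample's
# slice is flattened into one row, so the unknown B becomes a length-I*J vector.
H_mat = H.reshape(N, I * J)
B_vec, *_ = np.linalg.lstsq(H_mat, T, rcond=None)  # minimum-norm least-squares solution
B = B_vec.reshape(I, J)                            # fold the solution back to tensor shape
```

Because `np.linalg.lstsq` returns the minimum-norm solution for rank-deficient or underdetermined systems, the unfolded solve plays the role of the pseudo-inverse-based minimum-norm solution in the tensor formulation.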

Tikhonov regularized OP-ELM

In Miche et al. (2010), Yoan Miche et al. first proposed OP-ELM as an improvement of ELM; Fig. 4 shows the Tikhonov-regularized OP-ELM (TROP-ELM) of Miche et al. (2011). First, the OP-ELM network uses three different types of activation functions to form the kernel, and the assembled kernel improves robustness and generality: whereas the original ELM uses only the Sigmoid kernel, OP-ELM can use linear, Sigmoid and Gaussian kernels. Second, compared with the originally proposed ELM, the multi-response sparse regression algorithm (MRSR) and leave-one-out (LOO) validation are introduced in OP-ELM. Their main role is to prune irrelevant variables by pruning the related neurons of the SLFN constructed by ELM: MRSR ranks the neurons according to their usefulness, and the actual pruning is driven by the leave-one-out validation results.
Fig. 4

Implementation steps of TROP-ELM

The structure of the proposed tensor-based stacked fuzzy neural networks (TSFNN)

Tensor based hybrid single fuzzy neural networks design

In the previous section, three single fuzzy neural networks, namely TT2-ELM, TT2-RVFL and TROP-ELM, were briefly introduced. In this section, the three networks are stacked into one network, and a regression method based on the three regression results is introduced. The architecture is shown in Fig. 5.
Fig. 5

The structure of the proposed tensor-based stacked fuzzy neural networks (TSFNN)

The framework of the tensor-based stacked single fuzzy neural network is designed from three single fuzzy neural networks (TT2-ELM, TT2-RVFL and TROP-ELM), all of which have been proposed previously. Layer 2 is the hidden layer, in which the three algorithms are combined to construct a tensor structure; the tensor is built in layer 3 and then unfolded into three different matrices, one for each member network. The final regression uses a simple normalized scalar weighting, which is a part to be optimized in future work. The unfolding of the tensor uses the definitions from Yu et al. (2019), which are given as Definition 1.

Definition 1

(k, n)-unfolding. Consider an N-dimensional tensor whose modes follow a k-partition of the dimension set. The (k, n)-unfolding of the tensor is a tensor whose entries are obtained by mapping each multi-index position of the original tensor to the corresponding position of the unfolded array via the linear index of a multi-dimensional array; see, e.g., Baranyi (2016) and Baranyi et al. (2014). For example, with a 2-partition of the modes, the unfolding rearranges the tensor's entries into a lower-order array in exactly this way.
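A common special case of the unfolding in Definition 1 is the mode-n matricization, which is what reduces the tensor regression to ordinary matrix regression. A minimal NumPy sketch:

```python
import numpy as np

def mode_n_unfold(A, n):
    """Mode-n unfolding (matricization): permute mode n to the front, then
    flatten the remaining modes, giving a matrix of shape
    (A.shape[n], prod of the other dimensions)."""
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

A = np.arange(24).reshape(2, 3, 4)   # a small 3-D tensor
U0 = mode_n_unfold(A, 0)             # 2 x 12 matrix
U1 = mode_n_unfold(A, 1)             # 3 x 8 matrix
U2 = mode_n_unfold(A, 2)             # 4 x 6 matrix
```

Each unfolding loses no information: the original tensor can be recovered by reversing the reshape and the axis permutation, which is why the consequent-part learning can safely be carried out on the unfolded matrices.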

Remark 4

By comparing the structures of TT2-RVFL, TT2-ELM and TROP-ELM, we made a few modifications to TT2-RVFL and TT2-ELM. Their original activation function is the Sigmoid function, while TROP-ELM uses linear and Gaussian activation functions in addition to the Sigmoid. To make the structures correspond and to facilitate the composition of the tensor structure, the linear and Gaussian activation functions are also added to TT2-RVFL and TT2-ELM; in our tests, after adding these two distinct activation functions, TT2-ELM and TT2-RVFL performed better than the originally proposed versions. The 3-tensor generates three matrices, each with as many rows as there are samples. The three matrices can reconstruct the tensor easily: the first slice A( : ,  : , 1) holds the mapping results of TT2-RVFL, the second slice A( : ,  : , 2) holds the mapping results of TT2-ELM, and the third slice A( : ,  : , 3) holds the mapping results of TROP-ELM. For the type-2 fuzzy networks, the LMF, the UMF and the defuzzification of the secondary membership function are used to solve the consequent-part parameter learning problem. In the regression layer, the three matrices unfolded from the tensor each define a regression equation. To make the network perform best, the error of each member network is first calculated, with root mean square error as the measurement standard; the best-performing network is then identified and used to train the errors of the other two networks. Finally, the output is averaged over the three processed networks, which gives the average result of the three fuzzy networks; the MF values are obtained from the lower and upper MFs, since the matrices are the unfolded results of the tensor.
The defuzzification result can be calculated from the secondary MFs, and it can also be used for the stacked SFNN. The unconstrained regression result can be obtained directly. In statistics, this method is known as ridge regression; it is related to the Levenberg–Marquardt algorithm and to Andrey Tikhonov's method for regularizing ill-posed non-linear least-squares problems. Suppose that for a known matrix and vector, a solution vector is sought such that the linear system holds. In most cases, ordinary least-squares estimation leads to an overdetermined (over-fitted), or more often an underdetermined (under-fitted), system of equations; hence, in solving the ill-posed inverse problem, the inverse mapping operator has the undesirable tendency of amplifying noise (the eigenvalues that are largest in the reverse mapping correspond to the singular values that are smallest in the forward mapping). Moreover, ordinary least squares implicitly treats every element of the reconstructed solution as equally admissible rather than imposing a model on it. To minimize the residual sum of squares while requiring the solution to satisfy suitable properties, a regularization term is added to the primary minimization problem, which can be written succinctly as follows, where the Euclidean norm is used and an appropriately selected Tikhonov weighting matrix appears. In many circumstances this matrix is chosen as a multiple of the identity matrix, so that solutions with smaller norms are preferred (Ng 2004). At other times, if the underlying vector is believed to be mostly continuous, a low-pass operator can be used to enforce smoothness. This regularization improves the conditioning of the problem and leads to a direct numerical solution.
The approximate solution is then given in closed form by the following expression; the individual algorithms of the stacked single fuzzy neural network can use this regularized result as the learning method for the tensor-unfolded structure.
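The closed-form Tikhonov solution described above can be sketched directly from the regularized normal equations. The default choice of the weighting matrix as a multiple of the identity follows the text; the parameter values are illustrative.

```python
import numpy as np

def tikhonov_solve(A, b, lam=0.01, Gamma=None):
    """Closed-form Tikhonov-regularized least squares:
    x = argmin ||A x - b||^2 + ||Gamma x||^2, solved via the normal
    equations (A^T A + Gamma^T Gamma) x = A^T b. By default,
    Gamma = sqrt(lam) * I, i.e., plain ridge regression."""
    n = A.shape[1]
    if Gamma is None:
        Gamma = np.sqrt(lam) * np.eye(n)
    return np.linalg.solve(A.T @ A + Gamma.T @ Gamma, A.T @ b)
```

As the text notes, a non-identity Gamma (e.g., a discrete difference operator) can instead be passed to favor smooth solutions; either way the regularization term keeps the normal-equation matrix well conditioned.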

Simulation results for the datasets

In this section, the UCI benchmark datasets and four other real datasets are tested to evaluate the performance of the method. In all simulations, the mean squared error (MSE), mean absolute error (MAE) and root mean square error (RMSE) are used to assess the performance of the proposed TSFNN and the comparison methods; they are defined below in terms of the predicted signal, the target signal and the length N of the testing sequence. To test the performance of the proposed TSFNN, ten different models are used for comparison: the tensor-based type-2 random vector functional link (TT2-RVFL) network (Zhao and Wu 2019), the tensor-based type-2 extreme learning machine (TT2-ELM) (Huang et al. 2019), the optimally pruned extreme learning machine (OP-ELM) (Miche et al. 2010), the optimally pruned extreme learning machine with Tikhonov regularization (TROP-ELM) (Miche et al. 2011), the interval type-2 fuzzy neural network (IT2-FNN) (Juang et al. 2010), the evolving type-2 quantum fuzzy neural network (eT2QFNN) (Ashfahani et al. 2019), the regularized extreme learning machine with biased drop-connect and biased dropout (BD-ELM) (Lai et al. 2020), the recurrent neural network with the Levenberg–Marquardt algorithm (RNN-LM) (Moré 1978), the recurrent neural network with the Broyden–Fletcher–Goldfarb–Shanno algorithm (RNN-BFGS) (Head and Zerner 1985), and long short-term memory (LSTM) (Hochreiter and Schmidhuber 1997). RNN-LM and RNN-BFGS are recurrent neural network (RNN) models with different training algorithms, implemented with the Pyrenn Toolbox (Atabay 2016). The key parameter settings of the comparison algorithms are listed below. All eleven algorithms use the same datasets with the same normalization process; 70% of each dataset is used for training and 30% for testing, and the number of hidden nodes of all comparison algorithms is set to 20.
The input weights and biases of each comparison algorithm are the same and are randomly generated once. For TT2-RVFL, the balance weighting factor between the enhancement-node part and the hidden layer is set to 0.9, Tikhonov regularization is used in the matrix regression, and the regularization tuning parameter is set to 0.01. For TT2-ELM, the Tikhonov regularization tuning parameter is set to 0.01. For OP-ELM, the parameter design is the same as in the original paper (Miche et al. 2010). For TROP-ELM, the parameters are the same as for OP-ELM, with the Tikhonov regularization tuning parameter set to 0.01. For IT2-FNN, the parameter settings follow Juang et al. (2010), with the Tikhonov regularization tuning parameter set to 0.01. For eT2QFNN, the learning rate is set to 0.01, and the construction of the interval type-2 quantum fuzzy set follows Ashfahani et al. (2019). For BD-ELM, the parameters of the ELM part are the same as above; BD-ELM uses biased drop-connect and biased dropout regularization, with the higher and lower group drop probabilities of the biased drop-connect regularization (Cao et al. 2015) set to 0.9 and 0.7, respectively, and the biased dropout settings (Poernomo and Kang 2018) the same as the biased drop-connect settings. For RNN-LM and RNN-BFGS, the RNN structure is constructed via the Pyrenn Toolbox (Atabay 2016), with 20 neurons and 5 hidden layers; the maximum number of iterations is set to 100, and the stopping condition is a threshold on the residual between the original value and the predicted value. The LM and BFGS algorithms are invoked directly from the Pyrenn Toolbox. For LSTM, the learning rate is set to 0.01, the maximum number of iterations is set to 100, and the same stopping condition is used. For the proposed TSFNN, the Tikhonov regularization tuning parameter is set to 0.01, and the parameters of its members are the same as the settings of TT2-RVFL, TT2-ELM and TROP-ELM.
All experiments are performed on a computer with an AMD Ryzen 7 4800U with Radeon Graphics (1.80 GHz) and 16 GB RAM. A total of 5000 runs were performed on each dataset.
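The shared evaluation protocol (one normalization, then a random 70%/30% split redrawn for every run) can be sketched as follows. Min-max scaling is an assumption here; the paper only states that all algorithms share one normalization process:

```python
import numpy as np

def normalize_minmax(X):
    # Scale each attribute to [0, 1]; a common choice, assumed here.
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

def split_70_30(X, y, rng):
    # Random 70%/30% train/test partition, redrawn for every run.
    n = len(X)
    idx = rng.permutation(n)
    cut = int(0.7 * n)
    tr, te = idx[:cut], idx[cut:]
    return X[tr], y[tr], X[te], y[te]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # placeholder data
y = rng.normal(size=100)
Xtr, ytr, Xte, yte = split_70_30(normalize_minmax(X), y, rng)
```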

Regression problems

In this section, ten real-world regression problems are used for testing. Abalone is a dataset for predicting the age of abalone from physical measurements, including the whole weight, shucked weight and viscera weight of abalone in Tasmania; 4447 samples with 9 attributes are included in the dataset. The Airfoil Self-Noise dataset is obtained from a series of aerodynamic and acoustic tests of two- and three-dimensional airfoil blade sections in an anechoic wind tunnel; it contains 1503 samples with 6 attributes. The Auto-MPG dataset assembles miles-per-gallon data for different car brands; it contains 392 samples with 8 attributes. The Bank dataset simulates the patience of customers who select their preferred services in a bank according to 8 factors, for example residential area, distance and a virtual temperature-regulating bank option; it contains 8192 samples with 8 attributes. The Concrete Slump dataset contains information about the factors that affect the slump flow of concrete; it includes 103 samples with 11 attributes. Diabetes is a dataset that investigates the dependence of the level of serum C-peptide on various factors and can be used to measure residual insulin secretion patterns; 768 samples with 4 attributes are included in the dataset. Delta ailerons and Delta elevators record data from the ailerons and elevators of an aircraft; they have 7129 samples with 6 attributes and 9517 samples with 7 attributes, respectively. Energy efficiency is a dataset obtained by the energy analysis of 12 different architectural shapes simulated in Ecotect; it has 768 samples with 8 features, and the regression task is to forecast 2 real-valued responses, the cooling load and the heating load. Wine quality white is the white-wine part of a dataset of red and white wine samples; it contains 4898 samples with 12 attributes.
Table 1

Attributes of the testing datasets

| Dataset | #Attributes | #Train set | #Test set |
|---|---|---|---|
| Abalone | 9 | 2923 | 1524 |
| Airfoil self noise | 6 | 1052 | 451 |
| Auto-Mpg | 7 | 274 | 118 |
| Bank | 9 | 5734 | 2458 |
| Concrete slump | 11 | 91 | 12 |
| Diabetes | 2 | 538 | 230 |
| Delta aileron | 6 | 1052 | 451 |
| Delta elevators | 7 | 6661 | 2856 |
| Energy efficiency | 9 | 537 | 231 |
| Wine quality white | 12 | 3429 | 1469 |
| Electrical_detect | 7 | 8400 | 3601 |
| Electrical_No fault | 6 | 1656 | 709 |
| Electrical_LG fault | 6 | 790 | 339 |
| Electrical_LLG fault | 6 | 794 | 340 |
| Electrical_LLL fault | 6 | 767 | 329 |
| Electrical_LLLG fault | 6 | 793 | 340 |
| Asteroid | 7 | 1750 | 750 |
| Covid19_Beijing | 3 | 345 | 149 |
| Covid19_Shanghai | 3 | 345 | 149 |
| Covid19_Tianjin | 3 | 345 | 149 |
| Covid19_Chongqing | 3 | 345 | 149 |
| Covid19_Arizona | 3 | 316 | 136 |
| Covid19_Washington | 3 | 325 | 140 |
| Covid19_California | 3 | 316 | 136 |
| Covid19_Illinois | 3 | 317 | 136 |

The Abalone, Airfoil self noise, Auto-Mpg, Bank, Concrete slump, Diabetes, Delta aileron, Delta elevators, Energy efficiency and Wine quality white datasets could be download via the following URL: https://archive.ics.uci.edu/ml/datasets.php

The information on the datasets is presented in Table 1; these datasets involve four small-scale datasets and six moderate-scale datasets. The means and standard deviations of 5000 experimental results on Abalone, Airfoil self-noise, Auto-MPG, Bank, Concrete slump, Diabetes, Delta ailerons, Delta elevators, Energy efficiency and Wine quality white are shown in Tables 2 and 3. According to the comparison results, the top three algorithms in descending order of performance are, on the whole, TSFNN, TT2-RVFL and TT2-ELM; their most competitive rival is RNN-LM. The performance of OP-ELM and TROP-ELM is at a medium level. TT2-RVFL, TT2-ELM and TROP-ELM are the members of TSFNN, and they are all single-layer feedforward networks. In light of the excellent performance of TSFNN, the proposed TSFNN evidently concentrates the strengths of its members while offsetting the disadvantages of the weaker members. Under this stacking mechanism, the performance of TSFNN rivals or surpasses that of the RNN models.
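The stacking mechanism can be sketched as follows. The member hidden-layer output matrices H below are hypothetical stand-ins for the TT2-RVFL, TT2-ELM and TROP-ELM sub-structures; the mode-1 unfolding plus Tikhonov-regularized matrix regression follows the paper's general recipe (and its tuning parameter 0.01) rather than reproducing Eq. (17) exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
N, L, y_dim = 120, 20, 1          # samples, hidden nodes per member, output dim
y = rng.normal(size=(N, y_dim))

# Hypothetical N x L hidden-layer outputs of the three member networks.
H = [np.tanh(rng.normal(size=(N, L))) for _ in range(3)]

# Stack the member outputs as frontal slices of an N x L x 3 tensor,
# then take its mode-1 unfolding [H1 | H2 | H3], an N x 3L matrix.
T = np.stack(H, axis=2)
A = np.concatenate(H, axis=1)

# Tikhonov-regularized matrix regression for the consequent parameters.
lam = 0.01
beta = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)
y_hat = A @ beta
```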
Table 2

Comparison results with TSFNN, TT2-RVFL, TT2-ELM, OP-ELM, TROP-ELM, IT2-FNN, eT2QFNN, BD-ELM, RNN-LM, RNN-BFGS and LSTM. The bold parts represent the best performance of eleven algorithms on each dataset (a brief introduction is listed in Table 1)

| Dataset | Method | Training MAE (mean) | Training MAE (SD) | Testing MAE (mean) | Testing MAE (SD) | Training MSE (mean) | Training MSE (SD) | Testing MSE (mean) | Testing MSE (SD) | Training RMSE (mean) | Training RMSE (SD) | Testing RMSE (mean) | Testing RMSE (SD) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Abalone | TSFNN | 1.51e+00 | 1.85e−02 | 4.72e+00 | 6.26e+01 | 4.38e+00 | 1.08e−01 | 8.97e+03 | 3.27e+05 | 2.09e+00 | 2.58e−02 | 7.09e+00 | 9.45e+01 |
| | TT2-RVFL | 1.52e+00 | 1.96e−02 | 1.54e+00 | 3.56e−02 | 4.42e+00 | 1.14e−01 | 4.66e+00 | 3.40e−01 | 2.10e+00 | 2.77e−02 | 2.16e+00 | 7.62e−02 |
| | TT2-ELM | 1.51e+00 | 1.94e−02 | 1.54e+00 | 3.57e−02 | 4.38e+00 | 1.13e−01 | 4.64e+00 | 3.70e−01 | 2.09e+00 | 2.70e−02 | 2.15e+00 | 8.23e−02 |
| | OP-ELM | 1.66e+00 | 4.11e−02 | 2.51e+00 | 9.58e+00 | 5.24e+00 | 2.56e−01 | 1.16e+05 | 3.22e+06 | 2.29e+00 | 5.56e−02 | 3.17e+01 | 3.39e+02 |
| | TROP-ELM | 1.66e+00 | 4.20e−02 | 3.74e+00 | 6.92e+01 | 5.25e+00 | 2.60e−01 | 6.02e+06 | 4.07e+08 | 2.29e+00 | 5.63e−02 | 7.55e+01 | 2.45e+03 |
| | IT2-FNN | 1.62e+00 | 3.47e−02 | 1.63e+00 | 4.63e−02 | 4.96e+00 | 2.11e−01 | 5.05e+00 | 3.59e−01 | 2.23e+00 | 4.71e−02 | 2.25e+00 | 7.93e−02 |
| | eT2QFNN | 1.51e+00 | 7.76e−02 | 1.58e+00 | 7.85e−02 | 4.61e+00 | 4.20e−01 | 5.11e+00 | 5.92e−01 | 2.15e+00 | 9.52e−02 | 2.26e+00 | 1.25e−01 |
| | BD-ELM | 1.66e+00 | 9.53e−02 | 1.66e+00 | 9.89e−02 | 5.21e+00 | 6.21e−01 | 5.29e+00 | 6.83e−01 | 2.28e+00 | 1.27e−01 | 2.30e+00 | 1.40e−01 |
| | RNN-LM | 1.51e+00 | 3.47e−02 | 1.53e+00 | 4.36e−02 | 4.39e+00 | 1.59e−01 | 4.58e+00 | 2.81e−01 | 2.09e+00 | 3.78e−02 | 2.14e+00 | 6.55e−02 |
| | RNN-BFGS | 1.66e+00 | 9.95e−02 | 1.67e+00 | 1.02e−01 | 5.14e+00 | 6.21e−01 | 5.21e+00 | 6.54e−01 | 2.26e+00 | 1.12e−01 | 2.28e+00 | 1.22e−01 |
| | LSTM | 3.67e+00 | 2.13e+00 | 4.40e+00 | 3.13e+00 | 3.78e+01 | 3.39e+01 | 3.49e+01 | 4.88e+01 | 5.53e+00 | 2.67e+00 | 5.03e+00 | 3.09e+00 |
| Airfoil self noise | TSFNN | 2.97e+00 | 8.96e−02 | 3.03e+00 | 1.30e−01 | 1.47e+01 | 8.73e−01 | 1.54e+01 | 1.33e+00 | 3.83e+00 | 1.13e−01 | 3.93e+00 | 1.69e−01 |
| | TT2-RVFL | 3.17e+00 | 1.46e−01 | 3.29e+00 | 1.76e−01 | 1.66e+01 | 1.38e+00 | 1.79e+01 | 1.86e+00 | 4.06e+00 | 1.79e−01 | 4.23e+00 | 2.18e−01 |
| | TT2-ELM | 3.06e+00 | 1.26e−01 | 3.19e+00 | 1.58e−01 | 1.55e+01 | 1.16e+00 | 1.69e+01 | 1.65e+00 | 3.93e+00 | 1.46e−01 | 4.10e+00 | 2.00e−01 |
| | OP-ELM | 3.92e+00 | 1.58e−01 | 3.96e+00 | 1.98e−01 | 2.50e+01 | 1.71e+00 | 2.55e+01 | 2.33e+00 | 5.00e+00 | 1.70e−01 | 5.05e+00 | 2.28e−01 |
| | TROP-ELM | 3.92e+00 | 1.63e−01 | 3.96e+00 | 1.98e−01 | 2.50e+01 | 1.75e+00 | 2.55e+01 | 2.31e+00 | 5.00e+00 | 1.74e−01 | 5.05e+00 | 2.27e−01 |
| | IT2-FNN | 3.69e+00 | 1.75e−01 | 3.72e+00 | 2.07e−01 | 2.22e+01 | 1.78e+00 | 2.26e+01 | 2.26e+00 | 4.70e+00 | 1.87e−01 | 4.75e+00 | 2.37e−01 |
| | eT2QFNN | 3.33e+00 | 2.85e−01 | 7.19e+00 | 3.32e+00 | 2.04e+01 | 1.22e+01 | 1.03e+02 | 1.72e+02 | 4.45e+00 | 7.76e−01 | 9.15e+00 | 4.43e+00 |
| | BD-ELM | 4.17e+00 | 5.89e−01 | 4.20e+00 | 5.97e−01 | 2.80e+01 | 7.28e+00 | 2.85e+01 | 7.43e+00 | 5.26e+00 | 6.45e−01 | 5.29e+00 | 6.56e−01 |
| | RNN-LM | 2.58e+00 | 1.95e−01 | 2.68e+00 | 2.08e−01 | 1.19e+01 | 1.58e+00 | 1.28e+01 | 1.81e+00 | 3.44e+00 | 2.29e−01 | 3.57e+00 | 2.53e−01 |
| | RNN-BFGS | 3.58e+00 | 9.70e−02 | 3.62e+00 | 1.40e−01 | 2.11e+01 | 1.10e+00 | 2.16e+01 | 1.68e+00 | 4.60e+00 | 1.19e−01 | 4.64e+00 | 1.81e−01 |
| | LSTM | 2.94e+00 | 2.14e−01 | 3.21e+00 | 4.00e−01 | 1.47e+01 | 2.59e+00 | 1.75e+01 | 7.02e+00 | 3.82e+00 | 2.60e−01 | 4.16e+00 | 4.51e−01 |
| AutoMPG | TSFNN | 2.57e−01 | 1.88e−02 | 2.71e−01 | 2.86e−02 | 1.25e−01 | 1.38e−02 | 1.34e−01 | 2.46e−02 | 3.53e−01 | 1.95e−02 | 3.65e−01 | 3.37e−02 |
| | TT2-RVFL | 2.87e−01 | 2.56e−02 | 3.34e−01 | 3.59e−02 | 1.56e−01 | 2.36e−02 | 2.21e−01 | 5.13e−02 | 3.92e−01 | 3.14e−02 | 4.67e−01 | 5.28e−02 |
| | TT2-ELM | 2.66e−01 | 2.45e−02 | 3.17e−01 | 3.53e−02 | 1.37e−01 | 2.04e−02 | 2.05e−01 | 4.78e−02 | 3.69e−01 | 2.73e−02 | 4.50e−01 | 5.15e−02 |
| | OP-ELM | 3.70e−01 | 2.77e−02 | 3.85e−01 | 3.74e−02 | 2.30e−01 | 2.73e−02 | 2.70e−01 | 9.01e−01 | 4.79e−01 | 2.83e−02 | 5.02e−01 | 1.33e−01 |
| | TROP-ELM | 3.70e−01 | 2.77e−02 | 3.86e−01 | 6.86e−02 | 2.30e−01 | 2.64e−02 | 6.67e−01 | 2.51e+01 | 4.79e−01 | 2.73e−02 | 5.11e−01 | 6.37e−01 |
| | IT2-FNN | 4.16e−01 | 4.08e−02 | 4.32e−01 | 4.71e−02 | 2.85e−01 | 5.23e−02 | 3.07e−01 | 6.46e−02 | 5.31e−01 | 4.71e−02 | 5.52e−01 | 5.65e−02 |
| | eT2QFNN | 4.16e−01 | 2.74e−02 | 4.59e−01 | 1.73e−01 | 3.44e−01 | 4.03e−02 | 3.91e−01 | 4.54e−01 | 5.86e−01 | 3.30e−02 | 5.94e−01 | 1.96e−01 |
| | BD-ELM | 5.08e−01 | 1.77e−01 | 5.21e−01 | 1.79e−01 | 4.48e−01 | 3.74e−01 | 4.70e−01 | 3.82e−01 | 6.37e−01 | 2.06e−01 | 6.53e−01 | 2.08e−01 |
| | RNN-LM | 1.52e−01 | 3.57e−02 | 2.07e−01 | 3.77e−02 | 6.40e−02 | 2.18e−02 | 1.34e−01 | 3.97e−02 | 2.49e−01 | 4.32e−02 | 3.62e−01 | 5.33e−02 |
| | RNN-BFGS | 2.83e−01 | 2.43e−02 | 3.02e−01 | 2.98e−02 | 1.60e−01 | 1.79e−02 | 1.86e−01 | 3.62e−02 | 4.00e−01 | 2.22e−02 | 4.29e−01 | 4.16e−02 |
| | LSTM | 2.26e−01 | 3.97e−02 | 2.87e−01 | 5.81e−02 | 1.15e−01 | 6.57e−02 | 1.97e−01 | 1.40e−01 | 3.35e−01 | 4.84e−02 | 4.38e−01 | 7.30e−02 |
| Bank | TSFNN | 2.00e−02 | 6.06e−04 | 2.01e−02 | 6.96e−04 | 7.92e−04 | 4.71e−05 | 7.97e−04 | 6.20e−05 | 2.81e−02 | 8.46e−04 | 2.82e−02 | 1.09e−03 |
| | TT2-RVFL | 3.20e−02 | 2.83e−03 | 3.23e−02 | 2.89e−03 | 2.05e−03 | 3.53e−04 | 2.10e−03 | 3.70e−04 | 4.65e−02 | 4.22e−03 | 4.53e−02 | 3.96e−03 |
| | TT2-ELM | 3.05e−02 | 3.08e−03 | 3.09e−02 | 3.15e−03 | 1.84e−03 | 3.62e−04 | 1.90e−03 | 3.79e−04 | 4.27e−02 | 4.21e−03 | 4.33e−02 | 4.35e−03 |
| | OP-ELM | 2.20e−02 | 1.70e−03 | 2.21e−02 | 1.70e−03 | 1.00e−03 | 2.20e−04 | 1.01e−03 | 2.20e−04 | 3.15e−02 | 2.46e−03 | 3.16e−02 | 2.53e−03 |
| | TROP-ELM | 2.20e−02 | 1.69e−03 | 2.21e−02 | 1.70e−03 | 1.00e−03 | 2.18e−04 | 1.01e−03 | 2.26e−04 | 3.15e−02 | 2.44e−03 | 3.16e−02 | 2.58e−03 |
| | IT2-FNN | 3.02e−02 | 3.77e−03 | 3.03e−02 | 3.79e−03 | 1.88e−03 | 4.87e−04 | 1.89e−03 | 4.93e−04 | 4.30e−02 | 5.51e−03 | 4.31e−02 | 5.58e−03 |
| | eT2QFNN | 2.26e−02 | 1.35e−03 | 2.13e−02 | 2.37e−03 | 1.11e−03 | 1.03e−04 | 8.40e−04 | 1.66e−04 | 3.32e−02 | 1.53e−03 | 2.89e−02 | 2.63e−03 |
| | BD-ELM | 5.14e−02 | 1.36e−02 | 5.14e−02 | 1.36e−02 | 4.99e−03 | 2.65e−03 | 5.00e−03 | 2.67e−03 | 6.90e−02 | 1.49e−02 | 6.91e−02 | 1.50e−02 |
| | RNN-LM | 1.21e−02 | 1.33e−03 | 1.24e−02 | 1.34e−03 | 3.59e−04 | 1.24e−04 | 3.86e−04 | 1.27e−04 | 1.89e−02 | 1.66e−03 | 1.96e−02 | 1.71e−03 |
| | RNN-BFGS | 2.21e−02 | 3.40e−03 | 2.22e−02 | 3.42e−03 | 1.02e−03 | 2.62e−04 | 1.03e−03 | 2.68e−04 | 3.17e−02 | 3.77e−03 | 3.19e−02 | 3.85e−03 |
| | LSTM | 2.37e−01 | 2.21e−02 | 1.79e−01 | 1.04e−01 | 8.04e−02 | 8.06e−03 | 5.15e−02 | 5.74e−02 | 2.83e−01 | 1.47e−02 | 2.02e−01 | 1.05e−01 |
| Concrete slump | TSFNN | 9.76e−01 | 1.41e−01 | 1.15e+00 | 1.97e−01 | 1.59e+00 | 4.48e−01 | 2.23e+00 | 8.47e−01 | 1.25e+00 | 1.78e−01 | 1.47e+00 | 2.48e−01 |
| | TT2-RVFL | 1.84e+00 | 3.42e−01 | 3.73e+00 | 7.93e−01 | 5.70e+00 | 2.20e+00 | 2.67e+01 | 1.29e+01 | 2.41e+00 | 4.79e−01 | 4.97e+00 | 1.13e+00 |
| | TT2-ELM | 1.48e+00 | 3.17e−01 | 3.50e+00 | 8.17e−01 | 3.63e+00 | 1.60e+00 | 2.33e+01 | 1.22e+01 | 1.86e+00 | 4.00e−01 | 4.68e+00 | 1.16e+00 |
| | OP-ELM | 1.78e+00 | 1.56e−01 | 2.23e+00 | 3.51e−01 | 5.15e+00 | 8.63e−01 | 8.81e+00 | 3.18e+01 | 2.26e+00 | 1.89e−01 | 2.86e+00 | 7.80e−01 |
| | TROP-ELM | 1.77e+00 | 1.55e−01 | 2.22e+00 | 3.33e−01 | 5.13e+00 | 8.49e−01 | 8.46e+00 | 1.39e+01 | 2.26e+00 | 1.87e−01 | 2.85e+00 | 5.87e−01 |
| | IT2-FNN | 2.49e+00 | 5.30e−01 | 2.88e+00 | 6.89e−01 | 1.07e+01 | 4.77e+00 | 1.44e+01 | 7.31e+00 | 3.20e+00 | 6.65e−01 | 3.69e+00 | 8.76e−01 |
| | eT2QFNN | 3.17e+00 | 1.66e−02 | 3.68e+00 | 1.18e−01 | 2.09e+01 | 1.13e−01 | 1.99e+01 | 1.40e+00 | 4.57e+00 | 1.23e−02 | 4.46e+00 | 1.59e−01 |
| | BD-ELM | 4.12e+00 | 1.25e+00 | 4.44e+00 | 1.34e+00 | 2.92e+01 | 1.59e+01 | 3.35e+01 | 1.85e+01 | 5.18e+00 | 1.52e+00 | 5.56e+00 | 1.61e+00 |
| | RNN-LM | 2.41e−01 | 1.93e−01 | 1.02e+00 | 5.80e−01 | 1.56e−01 | 2.56e−01 | 2.50e+00 | 2.92e+00 | 3.09e−01 | 2.46e−01 | 1.37e+00 | 7.82e−01 |
| | RNN-BFGS | 1.71e+00 | 2.14e−01 | 2.21e+00 | 3.78e−01 | 4.75e+00 | 1.29e+00 | 8.21e+00 | 3.08e+00 | 2.16e+00 | 2.75e−01 | 2.82e+00 | 5.04e−01 |
| | LSTM | 1.04e+00 | 5.20e−01 | 2.29e+00 | 7.09e−01 | 2.23e+00 | 4.35e+00 | 9.48e+00 | 7.04e+00 | 1.33e+00 | 6.84e−01 | 2.94e+00 | 9.03e−01 |
Table 3

Comparison results with TSFNN, TT2-RVFL, TT2-ELM, OP-ELM, TROP-ELM, IT2-FNN, eT2QFNN, BD-ELM, RNN-LM, RNN-BFGS and LSTM. The bold parts represent the best performance of eleven algorithms on each dataset (a brief introduction is listed in Table 1)

| Dataset | Method | Training MAE (mean) | Training MAE (SD) | Testing MAE (mean) | Testing MAE (SD) | Training MSE (mean) | Training MSE (SD) | Testing MSE (mean) | Testing MSE (SD) | Training RMSE (mean) | Training RMSE (SD) | Testing RMSE (mean) | Testing RMSE (SD) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Delta aileron | TSFNN | 2.85e−03 | 4.10e−05 | 2.88e−03 | 5.33e−05 | 1.28e−05 | 3.08e−07 | 1.30e−05 | 4.39e−07 | 3.58e−03 | 4.32e−05 | 3.61e−03 | 6.08e−05 |
| | TT2-RVFL | 2.88e−03 | 4.22e−05 | 2.90e−03 | 5.46e−05 | 1.30e−05 | 3.26e−07 | 1.33e−05 | 4.62e−07 | 3.60e−03 | 4.75e−05 | 3.65e−03 | 6.33e−05 |
| | TT2-ELM | 2.85e−03 | 4.68e−05 | 2.88e−03 | 5.78e−05 | 1.28e−05 | 3.53e−07 | 1.31e−05 | 4.89e−07 | 3.58e−03 | 4.93e−05 | 3.62e−03 | 6.75e−05 |
| | OP-ELM | 3.14e−03 | 2.27e−04 | 3.15e−03 | 2.31e−04 | 1.55e−05 | 2.26e−06 | 1.57e−05 | 4.98e−06 | 3.93e−03 | 2.61e−04 | 3.95e−03 | 3.61e−04 |
| | TROP-ELM | 3.14e−03 | 2.23e−04 | 3.14e−03 | 2.28e−04 | 1.55e−05 | 2.23e−06 | 1.56e−05 | 2.40e−06 | 3.92e−03 | 2.56e−04 | 3.94e−03 | 2.72e−04 |
| | IT2-FNN | 3.08e−03 | 3.74e−05 | 3.08e−03 | 5.20e−05 | 1.48e−05 | 3.53e−07 | 1.48e−05 | 5.16e−07 | 3.84e−03 | 4.57e−05 | 3.85e−03 | 6.69e−05 |
| | eT2QFNN | 3.12e−03 | 5.47e−05 | 2.91e−03 | 7.33e−05 | 1.56e−05 | 6.27e−07 | 1.35e−05 | 7.39e−07 | 3.95e−03 | 7.93e−05 | 3.68e−03 | 9.74e−05 |
| | BD-ELM | 3.50e−03 | 6.79e−04 | 3.50e−03 | 6.79e−04 | 1.89e−05 | 6.57e−06 | 1.90e−05 | 6.57e−06 | 4.29e−03 | 6.86e−04 | 4.30e−03 | 6.86e−04 |
| | RNN-LM | 2.37e−03 | 9.92e−05 | 2.39e−03 | 1.03e−04 | 9.90e−06 | 5.95e−07 | 1.01e−05 | 6.59e−07 | 3.15e−03 | 9.12e−05 | 3.17e−03 | 1.01e−04 |
| | RNN-BFGS | 3.02e−03 | 7.77e−05 | 3.03e−03 | 8.74e−05 | 1.43e−05 | 7.83e−07 | 1.43e−05 | 8.81e−07 | 3.78e−03 | 9.56e−05 | 3.79e−03 | 1.08e−04 |
| | LSTM | 4.75e−03 | 1.05e−03 | 1.55e−02 | 1.18e−02 | 4.87e−05 | 1.86e−05 | 3.90e−04 | 6.03e−04 | 6.82e−03 | 1.46e−03 | 1.60e−02 | 1.15e−02 |
| Delta elevators | TSFNN | 2.72e+00 | 2.35e−02 | 2.73e+00 | 3.77e−02 | 1.18e+01 | 1.81e−01 | 1.19e+01 | 3.09e−01 | 3.44e+00 | 2.63e−02 | 3.45e+00 | 4.48e−02 |
| | TT2-RVFL | 2.75e+00 | 2.92e−02 | 2.77e+00 | 4.22e−02 | 1.21e+01 | 2.29e−01 | 1.23e+01 | 3.47e−01 | 3.47e+00 | 3.45e−02 | 3.50e+00 | 4.96e−02 |
| | TT2-ELM | 2.73e+00 | 2.94e−02 | 2.75e+00 | 4.23e−02 | 1.20e+01 | 2.26e−01 | 1.22e+01 | 3.44e−01 | 3.46e+00 | 3.26e−02 | 3.49e+00 | 4.94e−02 |
| | OP-ELM | 2.85e+00 | 3.95e−02 | 2.85e+00 | 5.04e−02 | 1.29e+01 | 3.52e−01 | 1.30e+01 | 4.53e−01 | 3.59e+00 | 4.74e−02 | 3.60e+00 | 6.17e−02 |
| | TROP-ELM | 2.85e+00 | 3.68e−02 | 2.85e+00 | 4.92e−02 | 1.29e+01 | 3.26e−01 | 1.30e+01 | 4.36e−01 | 3.59e+00 | 4.42e−02 | 3.60e+00 | 5.96e−02 |
| | IT2-FNN | 2.85e+00 | 2.92e−02 | 2.85e+00 | 4.26e−02 | 1.29e+01 | 2.70e−01 | 1.29e+01 | 3.83e−01 | 3.59e+00 | 3.70e−02 | 3.60e+00 | 5.29e−02 |
| | eT2QFNN | 2.81e+00 | 3.56e−02 | 2.94e+00 | 4.36e−02 | 1.28e+01 | 3.37e−01 | 1.37e+01 | 3.74e−01 | 3.57e+00 | 4.69e−02 | 3.70e+00 | 4.96e−02 |
| | BD-ELM | 3.78e+00 | 3.46e−01 | 3.78e+00 | 3.48e−01 | 2.14e+01 | 2.99e+00 | 2.14e+01 | 3.02e+00 | 4.61e+00 | 3.38e−01 | 4.61e+00 | 3.41e−01 |
| | RNN-LM | 2.52e+00 | 3.08e−02 | 2.59e+00 | 6.07e−02 | 1.06e+01 | 2.67e−01 | 1.12e+01 | 4.95e−01 | 3.25e+00 | 4.11e−02 | 3.35e+00 | 7.38e−02 |
| | RNN-BFGS | 2.82e+00 | 1.59e−02 | 2.87e+00 | 2.95e−02 | 1.27e+01 | 9.62e−02 | 1.31e+01 | 1.35e−01 | 3.56e+00 | 1.35e−02 | 3.62e+00 | 1.85e−02 |
| | LSTM | | | | | | | | | | | | |
| Diabetes | TSFNN | 1.01e−01 | 3.18e−03 | 1.56e−01 | 1.22e+00 | 2.00e−02 | 1.15e−03 | 6.59e+00 | 2.44e+02 | 1.41e−01 | 4.07e−03 | 2.50e−01 | 2.55e+00 |
| | TT2-RVFL | 1.04e−01 | 3.41e−03 | 1.13e−01 | 5.85e−03 | 2.08e−02 | 1.23e−03 | 2.47e−02 | 2.95e−03 | 1.44e−01 | 4.36e−03 | 1.57e−01 | 9.36e−03 |
| | TT2-ELM | 1.02e−01 | 3.41e−03 | 1.13e−01 | 5.94e−03 | 2.02e−02 | 1.22e−03 | 2.49e−02 | 3.04e−03 | 1.42e−01 | 4.30e−03 | 1.58e−01 | 9.60e−03 |
| | OP-ELM | 1.11e−01 | 3.75e−03 | 2.09e−01 | 2.94e+00 | 2.39e−02 | 1.47e−03 | 2.00e+03 | 1.11e+05 | 1.55e−01 | 4.76e−03 | 1.61e+00 | 4.47e+01 |
| | TROP-ELM | 1.11e−01 | 3.74e−03 | 1.46e−01 | 8.91e−01 | 2.39e−02 | 1.47e−03 | 1.83e+02 | 9.05e+03 | 1.55e−01 | 4.76e−03 | 6.40e−01 | 1.35e+01 |
| | IT2-FNN | 1.13e−01 | 4.75e−03 | 1.16e−01 | 6.52e−03 | 2.44e−02 | 1.66e−03 | 2.55e−02 | 3.34e−03 | 1.56e−01 | 5.30e−03 | 1.59e−01 | 1.04e−02 |
| | eT2QFNN | 1.40e−01 | 1.40e−02 | 1.25e−01 | 2.99e−02 | 3.69e−02 | 7.90e−03 | 2.81e−02 | 2.37e−02 | 1.91e−01 | 1.96e−02 | 1.64e−01 | 3.37e−02 |
| | BD-ELM | 1.21e−01 | 1.37e−02 | 1.23e−01 | 1.40e−02 | 2.67e−02 | 4.56e−03 | 2.76e−02 | 5.22e−03 | 1.63e−01 | 1.30e−02 | 1.65e−01 | 1.51e−02 |
| | RNN-LM | 9.65e−02 | 4.70e−03 | 1.12e−01 | 6.82e−03 | 1.82e−02 | 1.55e−03 | 2.50e−02 | 3.31e−03 | 1.35e−01 | 5.79e−03 | 1.58e−01 | 1.04e−02 |
| | RNN-BFGS | 1.08e−01 | 3.73e−03 | 1.11e−01 | 5.74e−03 | 2.27e−02 | 1.31e−03 | 2.41e−02 | 3.01e−03 | 1.51e−01 | 4.36e−03 | 1.55e−01 | 9.69e−03 |
| | LSTM | 1.03e−01 | 5.13e−02 | 1.17e−01 | 2.58e−02 | 2.28e−02 | 3.80e−02 | 2.71e−02 | 1.21e−02 | 1.42e−01 | 5.23e−02 | 1.63e−01 | 2.51e−02 |
| Energy efficiency | TSFNN | 1.71e+00 | 7.49e−02 | 1.71e+00 | 1.13e−01 | 5.44e+00 | 5.30e−01 | 5.63e+00 | 7.02e−01 | 2.33e+00 | 1.15e−01 | 2.37e+00 | 1.48e−01 |
| | TT2-RVFL | 2.28e+00 | 2.24e−01 | 2.48e+00 | 2.66e−01 | 9.32e+00 | 1.69e+00 | 1.11e+01 | 2.28e+00 | 3.09e+00 | 2.92e−01 | 3.30e+00 | 3.28e−01 |
| | TT2-ELM | 2.09e+00 | 1.87e−01 | 2.31e+00 | 2.25e−01 | 7.91e+00 | 1.25e+00 | 9.69e+00 | 1.78e+00 | 2.80e+00 | 2.17e−01 | 3.10e+00 | 2.79e−01 |
| | OP-ELM | 2.07e+00 | 8.56e−02 | 2.12e+00 | 1.30e−01 | 8.16e+00 | 5.50e−01 | 8.57e+00 | 9.77e−01 | 2.86e+00 | 9.48e−02 | 2.92e+00 | 1.64e−01 |
| | TROP-ELM | 2.07e+00 | 8.21e−02 | 2.12e+00 | 1.29e−01 | 8.15e+00 | 5.19e−01 | 8.56e+00 | 9.40e−01 | 2.85e+00 | 9.02e−02 | 2.92e+00 | 1.60e−01 |
| | IT2-FNN | 2.25e+00 | 2.26e−01 | 2.29e+00 | 2.53e−01 | 9.57e+00 | 1.70e+00 | 9.93e+00 | 2.00e+00 | 3.08e+00 | 2.55e−01 | 3.14e+00 | 2.97e−01 |
| | eT2QFNN | 2.50e+00 | 2.30e−01 | 3.70e+00 | 1.86e+00 | 1.68e+01 | 3.86e+00 | 2.56e+01 | 3.64e+01 | 4.07e+00 | 4.41e−01 | 4.58e+00 | 2.15e+00 |
| | BD-ELM | 5.16e+00 | 2.16e+00 | 5.17e+00 | 2.16e+00 | 4.55e+01 | 3.11e+01 | 4.57e+01 | 3.11e+01 | 6.35e+00 | 2.28e+00 | 6.37e+00 | 2.27e+00 |
| | RNN-LM | 1.20e+00 | 3.12e−01 | 1.29e+00 | 3.38e−01 | 2.84e+00 | 1.39e+00 | 3.28e+00 | 1.63e+00 | 1.62e+00 | 4.55e−01 | 1.74e+00 | 4.92e−01 |
| | RNN-BFGS | 1.95e+00 | 9.37e−02 | 2.01e+00 | 1.35e−01 | 7.62e+00 | 5.12e−01 | 8.10e+00 | 9.07e−01 | 2.76e+00 | 9.19e−02 | 2.84e+00 | 1.59e−01 |
| | LSTM | 2.03e+00 | 5.24e−01 | 2.53e+00 | 7.85e−01 | 7.88e+00 | 4.70e+00 | 1.16e+01 | 7.17e+00 | 2.72e+00 | 6.97e−01 | 3.29e+00 | 8.56e−01 |
| Wine quality white | TSFNN | 4.81e−01 | 7.80e−03 | 4.90e−01 | 3.95e−01 | 3.81e−01 | 1.15e−02 | 1.23e+00 | 5.05e+01 | 6.17e−01 | 9.30e−03 | 6.36e−01 | 9.07e−01 |
| | TT2-RVFL | 4.93e−01 | 9.20e−03 | 5.13e−01 | 1.69e−02 | 3.96e−01 | 1.35e−02 | 4.34e−01 | 2.97e−02 | 6.29e−01 | 1.11e−02 | 6.58e−01 | 2.25e−02 |
| | TT2-ELM | 4.90e−01 | 9.54e−03 | 5.13e−01 | 1.73e−02 | 3.91e−01 | 1.38e−02 | 4.35e−01 | 3.14e−02 | 6.25e−01 | 1.10e−02 | 6.59e−01 | 2.37e−02 |
| | OP-ELM | 5.01e−01 | 7.80e−03 | 5.18e−01 | 3.06e−01 | 4.12e−01 | 1.18e−02 | 4.52e+01 | 1.92e+03 | 6.42e−01 | 9.18e−03 | 8.41e−01 | 6.67e+00 |
| | TROP-ELM | 5.01e−01 | 7.87e−03 | 5.10e−01 | 2.13e−02 | 4.12e−01 | 1.19e−02 | 5.16e−01 | 3.53e+00 | 6.42e−01 | 9.30e−03 | 6.64e−01 | 2.75e−01 |
| | IT2-FNN | 5.23e−01 | 1.73e−02 | 5.29e−01 | 2.24e−02 | 4.45e−01 | 2.41e−02 | 4.56e−01 | 3.65e−02 | 6.67e−01 | 1.79e−02 | 6.75e−01 | 2.68e−02 |
| | eT2QFNN | 5.53e−01 | 2.79e−02 | 5.41e−01 | 5.24e−02 | 5.28e−01 | 6.95e−02 | 5.05e−01 | 1.11e−01 | 7.25e−01 | 4.62e−02 | 7.08e−01 | 6.51e−02 |
| | BD-ELM | 5.47e−01 | 3.64e−02 | 5.51e−01 | 3.92e−02 | 4.79e−01 | 5.07e−02 | 4.88e−01 | 5.78e−02 | 6.91e−01 | 3.59e−02 | 6.97e−01 | 4.07e−02 |
| | RNN-LM | 4.65e−01 | 1.19e−02 | 5.03e−01 | 1.71e−02 | 3.55e−01 | 1.59e−02 | 4.23e−01 | 3.10e−02 | 5.96e−01 | 1.33e−02 | 6.50e−01 | 2.36e−02 |
| | RNN-BFGS | 5.01e−01 | 9.59e−03 | 5.11e−01 | 1.76e−02 | 4.15e−01 | 1.44e−02 | 4.33e−01 | 2.99e−02 | 6.44e−01 | 1.11e−02 | 6.58e−01 | 2.27e−02 |
| | LSTM | 4.86e−01 | 1.25e−02 | 5.09e−01 | 1.87e−02 | 3.85e−01 | 1.72e−02 | 4.27e−01 | 3.11e−02 | 6.20e−01 | 1.33e−02 | 6.53e−01 | 2.36e−02 |

Comparison with different regularization methods

In this paper, Tikhonov regularization is used to optimize the performance of the proposed TSFNN; regularization can be used to alleviate the over-fitting problem. In this section, three classical regularization methods are used to verify the effectiveness of the Tikhonov regularization in the proposed TSFNN: $L_1$ regularization, $L_2$ regularization and dropout regularization (Srivastava et al. 2014). Their mathematical expressions are shown in Table 4. The four regularization methods are each applied in the proposed TSFNN via Eq. (17).
Table 4

Mathematical expression of different regularization methods which are applied in TSFNN

IDRegularizationFormulationSolutionParameter
1classical \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L_1$$\end{document}L1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Vert \varvec{A}_k\varvec{\beta _k} -\varvec{y}_t \Vert ^{2}+\Vert \varvec{\beta _k} \Vert ^{1}$$\end{document}Akβk-yt2+βk1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\hat{\beta _k}}=(\varvec{A}_k^{T}\varvec{A}_k+0.5)^{-1}\varvec{A}_k^{T}\varvec{y}_t$$\end{document}βk^=(AkTAk+0.5)-1AkTyt
2classical \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L_2$$\end{document}L2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Vert \varvec{A}_k\varvec{\beta _k} -\varvec{y}_t \Vert ^{2}+\Vert \varvec{\beta _k} \Vert ^{2}$$\end{document}Akβk-yt2+βk2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\hat{\beta _k}}=(\varvec{A}_k^{T}\varvec{A}_k+I )^{-1}\varvec{A}_k^{T}\varvec{y}_t$$\end{document}βk^=(AkTAk+I)-1AkTytI: Dentity matrix
3Tikhonov\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Vert \varvec{A}_k\varvec{\beta _k} -\varvec{y}_t \Vert ^{2}+\Vert \Gamma _k \varvec{\beta _k} \Vert ^{2}$$\end{document}Akβk-yt2+Γkβk2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\hat{\beta _k}}=(\varvec{A}_k^{T}\varvec{A}_k+\Gamma ^{T}\Gamma )^{-1}\varvec{A}_k^{T}\varvec{y}_t$$\end{document}βk^=(AkTAk+ΓTΓ)-1AkTyt\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Gamma $$\end{document}Γ: Tikhonov matrix
4Dropout\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Vert R*\varvec{(}A_k)\varvec{\beta _k} -\varvec{y}_t\Vert ^{2}$$\end{document}R(Ak)βk-yt2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\hat{\beta _k}}=\big ((R*\varvec{A}_k)^{T}(R*\varvec{A}_k)\big )^{-1} (R*\varvec{A}_k)^{T}\varvec{y}_t$$\end{document}βk^=((RAk)T(RAk))-1(RAk)Tyt\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R\sim Bernoulli (p)$$\end{document}RBernoulli(p)
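The four closed-form solutions in Table 4 can be sketched in NumPy as follows (function names are illustrative; the scalar 0.5 in the $L_1$ surrogate is read as 0.5 times the identity, as the matrix inverse requires):

```python
import numpy as np

def solve_l1_like(A, y):
    # Row 1: the table's closed-form surrogate (A^T A + 0.5 I)^{-1} A^T y.
    return np.linalg.solve(A.T @ A + 0.5 * np.eye(A.shape[1]), A.T @ y)

def solve_l2(A, y):
    # Row 2: ridge with the identity, (A^T A + I)^{-1} A^T y.
    return np.linalg.solve(A.T @ A + np.eye(A.shape[1]), A.T @ y)

def solve_tikhonov(A, y, Gamma):
    # Row 3: (A^T A + Gamma^T Gamma)^{-1} A^T y.
    return np.linalg.solve(A.T @ A + Gamma.T @ Gamma, A.T @ y)

def solve_dropout(A, y, p, rng):
    # Row 4: mask A entrywise with R ~ Bernoulli(p), then least squares.
    R = rng.binomial(1, p, size=A.shape)
    return np.linalg.lstsq(R * A, y, rcond=None)[0]
```

With $\Gamma=\sqrt{0.01}\,I$, `solve_tikhonov` reproduces ridge regression with the tuning parameter 0.01 used throughout the experiments.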
Four variants of TSFNN can be constructed via the four different regularization methods. For classical $L_1$ regularization and classical $L_2$ regularization, the regularization parameter is set as 1. For dropout regularization, the mask matrix R satisfies $R\sim \mathrm{Bernoulli}(p)$, and the probability p is set as 0.5 in this work. Among the comparisons, the classical $L_1$ regularization, classical $L_2$ regularization and dropout regularization are applied to the suggested tensor-based stacked fuzzy network model to create variants of the original TSFNN. Table 5 presents the comparison results for TSFNN with the different regularization methods. On the whole, the descending-order performance rank of the four variants is $L_1$ regularization, Tikhonov regularization, $L_2$ regularization and dropout regularization. In the training and testing stages, TSFNN with $L_1$ regularization obtains the smallest errors among the four regularization methods on the ten datasets of different scales, but the stability of TSFNN with Tikhonov regularization outperforms that of TSFNN with $L_1$ regularization. Taken together, the Tikhonov regularization can effectively deal with the uncertain information in the suggested model and suppress the over-fitting risk.
Table 5

Comparison results for TSFNN with different regularization methods. The bold parts represent the best performance of eleven algorithms on each dataset (a brief introduction is listed in Table 1)

| Dataset | Method | Training MAE (mean) | Training MAE (SD) | Testing MAE (mean) | Testing MAE (SD) | Training MSE (mean) | Training MSE (SD) | Testing MSE (mean) | Testing MSE (SD) | Training RMSE (mean) | Training RMSE (SD) | Testing RMSE (mean) | Testing RMSE (SD) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Abalone | TSFNN-L1 | 1.48e+00 | 1.83e−02 | 1.71e+00 | 5.65e+00 | 4.23e+00 | 1.08e−01 | 7.89e+01 | 3.78e+03 | 2.06e+00 | 2.63e−02 | 2.81e+00 | 8.43e+00 |
| | TSFNN-L2 | 1.54e+00 | 1.89e−02 | 3.84e+00 | 4.13e+01 | 4.54e+00 | 1.11e−01 | 4.50e+03 | 1.91e+05 | 2.13e+00 | 2.61e−02 | 5.77e+00 | 6.68e+01 |
| | TSFNN-Tikhonov | 1.52e+00 | 1.85e−02 | 4.14e+00 | 3.29e+01 | 4.43e+00 | 1.10e−01 | 2.51e+03 | 5.45e+04 | 2.10e+00 | 2.60e−02 | 6.12e+00 | 4.97e+01 |
| | TSFNN-Dropout | 1.67e+00 | 2.24e−01 | 4.45e+00 | 4.32e+01 | 2.97e+01 | 1.00e+03 | 6.36e+04 | 3.53e+06 | 2.52e+00 | 4.84e+00 | 1.43e+01 | 2.52e+02 |
| Airfoil self noise | TSFNN-L1 | 2.74e+00 | 1.01e−01 | 2.92e+00 | 2.74e−01 | 1.26e+01 | 8.75e−01 | 1.45e+01 | 7.18e+00 | 3.55e+00 | 1.23e−01 | 3.78e+00 | 4.83e−01 |
| | TSFNN-L2 | 3.22e+00 | 9.99e−02 | 3.11e+00 | 1.33e−01 | 1.75e+01 | 9.98e−01 | 1.64e+01 | 1.39e+00 | 4.18e+00 | 1.19e−01 | 4.04e+00 | 1.71e−01 |
| | TSFNN-Tikhonov | 3.03e+00 | 9.04e−02 | 3.02e+00 | 1.29e−01 | 1.55e+01 | 8.86e−01 | 1.54e+01 | 1.33e+00 | 3.93e+00 | 1.12e−01 | 3.92e+00 | 1.68e−01 |
| | TSFNN-Dropout | 3.61e+00 | 2.60e+00 | 5.07e+00 | 1.78e+00 | 2.02e+03 | 9.87e+04 | 3.07e+02 | 5.64e+03 | 6.29e+00 | 4.45e+01 | 8.03e+00 | 1.56e+01 |
| AutoMPG | TSFNN-L1 | 2.17e−01 | 1.58e−02 | 2.25e−01 | 4.65e−02 | 9.18e−02 | 1.10e−02 | 1.48e−01 | 2.70e+00 | 3.03e−01 | 1.82e−02 | 3.14e−01 | 2.22e−01 |
| | TSFNN-L2 | 3.11e−01 | 2.41e−02 | 2.88e−01 | 3.01e−02 | 1.67e−01 | 1.84e−02 | 1.45e−01 | 2.59e−02 | 4.09e−01 | 2.25e−02 | 3.80e−01 | 3.42e−02 |
| | TSFNN-Tikhonov | 2.70e−01 | 1.95e−02 | 2.71e−01 | 2.82e−02 | 1.35e−01 | 1.42e−02 | 1.34e−01 | 2.43e−02 | 3.67e−01 | 1.94e−02 | 3.65e−01 | 3.33e−02 |
| | TSFNN-Dropout | 4.00e−01 | 4.92e−01 | 9.65e−01 | 2.07e+00 | 2.71e+01 | 1.25e+03 | 2.33e+02 | 1.07e+04 | 7.85e−01 | 5.15e+00 | 1.76e+00 | 1.52e+01 |
| Bank | TSFNN-L1 | 1.98e−02 | 6.25e−04 | 2.06e−02 | 1.05e−03 | 7.75e−04 | 4.78e−05 | 9.46e−04 | 6.71e−03 | 2.78e−02 | 8.69e−04 | 2.91e−02 | 1.01e−02 |
| | TSFNN-L2 | 2.04e−02 | 5.25e−04 | 2.02e−02 | 6.50e−04 | 8.22e−04 | 4.24e−05 | 8.06e−04 | 5.74e−05 | 2.87e−02 | 7.47e−04 | 2.84e−02 | 1.02e−03 |
| | TSFNN-Tikhonov | 2.01e−02 | 5.73e−04 | 2.01e−02 | 6.95e−04 | 8.00e−04 | 4.51e−05 | 7.98e−04 | 6.39e−05 | 2.83e−02 | 8.06e−04 | 2.82e−02 | 1.10e−03 |
| | TSFNN-Dropout | 2.30e−02 | 1.55e−02 | 6.29e−02 | 3.54e−02 | 1.19e+00 | 8.39e+01 | 1.35e−02 | 1.33e−01 | 5.84e−02 | 1.09e+00 | 8.10e−02 | 8.35e−02 |
| Concrete slump | TSFNN-L1 | 2.87e+01 | 3.87e+01 | 9.55e+01 | 1.61e+02 | 2.32e+03 | 1.25e+04 | 3.53e+04 | 1.64e+05 | 2.87e+01 | 3.87e+01 | 9.62e+01 | 1.61e+02 |
| | TSFNN-L2 | 1.45e+00 | 1.22e−01 | 1.16e+00 | 1.74e−01 | 3.40e+00 | 5.42e−01 | 2.18e+00 | 6.38e−01 | 1.84e+00 | 1.47e−01 | 1.46e+00 | 2.14e−01 |
| | TSFNN-Tikhonov | 1.12e+00 | 1.29e−01 | 1.15e+00 | 1.89e−01 | 2.08e+00 | 4.64e−01 | 2.23e+00 | 7.32e−01 | 1.43e+00 | 1.61e−01 | 1.47e+00 | 2.36e−01 |
| | TSFNN-Dropout | 1.75e+01 | 3.34e+01 | 1.74e+01 | 1.80e+01 | 5.42e+03 | 1.57e+05 | 1.19e+03 | 2.00e+04 | 2.41e+01 | 6.96e+01 | 2.10e+01 | 2.73e+01 |
| Delta aileron | TSFNN-L1 | 2.79e−03 | 4.67e−05 | 2.80e−03 | 8.90e−05 | 1.23e−05 | 3.39e−07 | 1.48e−05 | 1.01e−04 | 3.51e−03 | 4.83e−05 | 3.60e−03 | 1.36e−03 |
| | TSFNN-L2 | 2.93e−03 | 3.74e−05 | 2.91e−03 | 5.40e−05 | 1.35e−05 | 3.10e−07 | 1.32e−05 | 4.71e−07 | 3.67e−03 | 4.23e−05 | 3.64e−03 | 6.42e−05 |
| | TSFNN-Tikhonov | 2.87e−03 | 3.86e−05 | 2.88e−03 | 5.47e−05 | 1.30e−05 | 3.01e−07 | 1.30e−05 | 4.92e−07 | 3.60e−03 | 4.19e−05 | 3.61e−03 | 6.68e−05 |
| | TSFNN-Dropout | 3.04e−03 | 2.14e−04 | 3.61e−03 | 6.84e−04 | 1.05e−04 | 2.74e−03 | 4.43e−05 | 5.05e−04 | 4.49e−03 | 9.19e−03 | 4.97e−03 | 4.43e−03 |
| Delta elevators | TSFNN-L1 | 2.69e+00 | 2.41e−02 | 2.70e+00 | 4.70e−02 | 1.17e+01 | 1.82e−01 | 1.19e+01 | 6.15e+00 | 3.42e+00 | 2.66e−02 | 3.44e+00 | 3.24e−01 |
| | TSFNN-L2 | 2.76e+00 | 2.25e−02 | 2.74e+00 | 3.77e−02 | 1.21e+01 | 1.77e−01 | 1.20e+01 | 3.10e−01 | 3.48e+00 | 2.54e−02 | 3.46e+00 | 4.48e−02 |
| | TSFNN-Tikhonov | 2.73e+00 | 2.27e−02 | 2.73e+00 | 3.80e−02 | 1.19e+01 | 1.77e−01 | 1.19e+01 | 3.11e−01 | 3.45e+00 | 2.57e−02 | 3.45e+00 | 4.51e−02 |
| | TSFNN-Dropout | 2.83e+00 | 3.68e−02 | 3.32e+00 | 4.02e−01 | 1.69e+01 | 1.37e+02 | 3.27e+01 | 4.59e+02 | 3.69e+00 | 1.83e+00 | 4.40e+00 | 3.65e+00 |
| Diabetes | TSFNN-L1 | 9.76e−02 | 3.16e−03 | 1.13e−01 | 3.89e−01 | 1.86e−02 | 1.10e−03 | 5.72e−01 | 1.94e+01 | 1.36e−01 | 4.05e−03 | 1.97e−01 | 7.30e−01 |
| | TSFNN-L2 | 1.05e−01 | 3.27e−03 | 1.21e−01 | 6.30e−01 | 2.16e−02 | 1.21e−03 | 1.50e+00 | 9.13e+01 | 1.47e−01 | 4.13e−03 | 1.78e−01 | 1.21e+00 |
| | TSFNN-Tikhonov | 1.02e−01 | 3.21e−03 | 2.15e−01 | 5.32e+00 | 2.05e−02 | 1.16e−03 | 1.06e+02 | 6.84e+03 | 1.43e−01 | 4.08e−03 | 3.62e−01 | 1.03e+01 |
| | TSFNN-Dropout | 1.14e−01 | 5.50e−02 | 2.71e−01 | 3.76e+00 | 9.59e−01 | 4.82e+01 | 1.21e+02 | 7.12e+03 | 2.38e−01 | 9.50e−01 | 6.78e−01 | 1.10e+01 |
| Energy efficiency | TSFNN-L1 | 1.62e+00 | 8.35e−02 | 1.78e+00 | 3.22e−01 | 4.67e+00 | 5.58e−01 | 5.40e+00 | 9.08e+00 | 2.16e+00 | 1.31e−01 | 2.29e+00 | 4.22e−01 |
| | TSFNN-L2 | 1.80e+00 | 6.83e−02 | 1.73e+00 | 1.14e−01 | 6.52e+00 | 4.10e−01 | 5.93e+00 | 6.88e−01 | 2.55e+00 | 8.05e−02 | 2.43e+00 | 1.42e−01 |
| | TSFNN-Tikhonov | 1.72e+00 | 7.19e−02 | 1.71e+00 | 1.12e−01 | 5.75e+00 | 4.81e−01 | 5.62e+00 | 6.90e−01 | 2.40e+00 | 1.01e−01 | 2.37e+00 | 1.46e−01 |
| | TSFNN-Dropout | 2.02e+00 | 1.95e−01 | 4.04e+00 | 1.48e+00 | 1.81e+01 | 6.29e+02 | 6.50e+01 | 1.25e+03 | 2.90e+00 | 3.11e+00 | 5.36e+00 | 6.02e+00 |
| Wine quality white | TSFNN-L1 | 4.69e−01 | 8.04e−03 | 5.05e−01 | 1.86e−01 | 3.63e−01 | 1.14e−02 | 7.76e+00 | 2.10e+02 | 6.02e−01 | 9.51e−03 | 8.64e−01 | 2.65e+00 |
| | TSFNN-L2 | 4.91e−01 | 7.41e−03 | 6.24e−01 | 9.76e+00 | 3.96e−01 | 1.15e−02 | 3.83e+02 | 2.70e+04 | 6.29e−01 | 9.15e−03 | 9.01e−01 | 1.95e+01 |
| | TSFNN-Tikhonov | 4.84e−01 | 7.57e−03 | 4.87e−01 | 1.95e−01 | 3.85e−01 | 1.14e−02 | 5.91e−01 | 1.02e+01 | 6.20e−01 | 9.21e−03 | 6.29e−01 | 4.43e−01 |
| | TSFNN-Dropout | 5.22e−01 | 1.23e−01 | 8.80e−01 | 5.10e+00 | 1.59e+01 | 3.99e+02 | 1.23e+03 | 4.43e+04 | 1.09e+00 | 3.83e+00 | 3.16e+00 | 3.50e+01 |

Remark 5

Tikhonov regularization and dropout regularization both belong to the family of regularization methods; the difference lies in the selection and determination of the regularization parameters, which can be called the tuning parameters of regularization. For the dropout regularization used in this section, the choice of the key parameter p follows the original dropout paper, Srivastava et al. (2014), in which the model obtains excellent performance when p = 0.5. Thus, the parameter p is set to 0.5 in this section.
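As a minimal illustration of the two regularizers compared in this remark, the sketch below applies Tikhonov regularization in closed form to the output weights and inverted dropout with retention probability p = 0.5 to the hidden activations. The matrices H and T are random stand-ins, and the code is a generic sketch of the two techniques, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def tikhonov_weights(H, T, lam=1e-3):
    """Closed-form Tikhonov (ridge) solution for the output weights:
    beta = (H^T H + lam * I)^{-1} H^T T."""
    return np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ T)

def dropout(H, p=0.5):
    """Inverted dropout on hidden activations: keep each unit with
    probability p and rescale by 1/p so the expectation is unchanged."""
    mask = rng.random(H.shape) < p
    return H * mask / p

H = rng.standard_normal((100, 20))   # hypothetical hidden-layer outputs
T = rng.standard_normal((100, 1))    # hypothetical targets
beta = tikhonov_weights(H, T)        # Tikhonov-regularized readout
H_drop = dropout(H, p=0.5)           # roughly half the activations zeroed
```

In this sketch the Tikhonov parameter lam and the dropout retention probability p play the same tuning role the remark describes: each controls how strongly the solution is regularized.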

Simulations for other datasets

In this section, three datasets, the Electrical Fault detection and classification dataset, the Asteroid Dataset and the Novel Corona Virus 2019 Dataset, are used to test model performance.

Electrical fault detection and classification dataset

Power systems consist of many complex, dynamic and interactive elements that are always vulnerable to interference or electrical failures. Transmission lines are the most critical part of the power system; their prominent role is to transmit electricity from the source area to the distribution destination in the network. Faults on power system transmission lines should first be correctly detected and classified, and should be eliminated in the shortest possible time. The Electrical Fault detection and classification dataset contains the line currents and voltages under different fault conditions (Jamil et al. 2015), and covers both the detection of power system faults and the classification of fault types. The fault detection dataset contains 12001 samples with six inputs (the three line currents and the three line voltages) and two outputs (0 and 1), where no fault is denoted by 0 and fault is denoted by 1. The fault classification dataset contains 7861 samples with the same six inputs and four outputs (G, C, B and A), which indicate whether a fault involves the ground (G) and phases C, B and A, respectively; 0 denotes that no fault occurs and 1 denotes that a fault occurs. The combinations of G, C, B and A represent the various failures, which are shown in Table 7.
Table 7

Faults represented by G, C, B and A

[G, C, B, A] | Fault
[0, 0, 0, 0] | No fault
[1, 0, 0, 1] | LG fault (between phase A and ground)
[0, 0, 1, 1] | LL fault (between phase A and phase B)
[1, 0, 1, 1] | LLG fault (between phases A, B and ground)
[0, 1, 1, 1] | LLL fault (between all three phases)
[1, 1, 1, 1] | LLLG fault (three-phase symmetrical fault)
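The [G, C, B, A] coding of Table 7 can be decoded with a small lookup. The snippet below is an illustrative Python mapping, not part of the original dataset tooling:

```python
# Decode a [G, C, B, A] output vector into the fault label of Table 7.
FAULT_TYPES = {
    (0, 0, 0, 0): "No fault",
    (1, 0, 0, 1): "LG fault",
    (0, 0, 1, 1): "LL fault",
    (1, 0, 1, 1): "LLG fault",
    (0, 1, 1, 1): "LLL fault",
    (1, 1, 1, 1): "LLLG fault",
}

def decode_fault(g, c, b, a):
    """Map one row of the four binary outputs to its fault label."""
    return FAULT_TYPES.get((g, c, b, a), "Unknown code")
```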
The faults of the system are judged according to the currents and voltages of the power system, and the dataset has two outputs that indicate whether the system is faulty. Figure 6 shows the no-fault data in the electrical fault detection dataset; similarly, Fig. 7 shows the fault data. The horizontal axis represents the samples, and the vertical axis represents the values of the line currents and voltages. Comparing Figs. 6 and 7, when there is no fault in the electric power system, the current and voltage values are generally stable and their trend is close to a sinusoidal waveform, which is consistent with the characteristics of AC in the electric power system. However, once a fault occurs, the current and voltage values become abnormal, and the abnormalities differ with the fault location; this anomaly can be clearly seen in Fig. 7. The comparison results on the fault detection dataset, which is used to determine whether the power system is faulty, are shown in Table 6.
Fig. 6

The no-fault data in the electrical fault detection dataset

Fig. 7

The fault data in the electrical fault detection dataset

Table 6

Comparison results with TSFNN, TT2-RVFL, TT2-ELM, OP-ELM, TROP-ELM, IT2-FNN, eT2QFNN, BD-ELM, RNN-LM, RNN-BFGS and LSTM for Electrical-class, Electrical-detect. The bold parts represent the best performance of eleven algorithms on each dataset (a brief introduction is listed in Table 1)

Method | Training MAE (Mean / SD) | Testing MAE (Mean / SD) | Training MSE (Mean / SD) | Testing MSE (Mean / SD) | Training RMSE (Mean / SD) | Testing RMSE (Mean / SD)

Electrical_detect:
TSFNN | 3.04e04 / 1.45e−03 | 5.18e04 / 1.79e−03 | 3.76e06 / 2.09e−05 | 5.80e−06 / 3.16e−05 | 3.99e04 / 1.90e−03 | 6.85e−04 / 2.31e−03
TT2-RVFL | 9.71e−03 / 3.79e−03 | 9.78e−03 / 3.82e−03 | 2.29e−04 / 2.02e−04 | 2.33e−04 / 2.06e−04 | 1.49e−02 / 5.71e−03 | 1.35e−02 / 5.19e−03
TT2-ELM | 6.57e−03 / 2.64e−03 | 6.62e−03 / 2.66e−03 | 9.70e−05 / 9.07e−05 | 9.87e−05 / 9.18e−05 | 9.17e−03 / 3.59e−03 | 9.25e−03 / 3.62e−03
OP-ELM | 2.22e−03 / 1.31e−02 | 2.22e−03 / 1.31e−02 | 2.97e−04 / 1.82e−03 | 2.98e−04 / 1.82e−03 | 2.91e−03 / 1.70e−02 | 2.91e−03 / 1.70e−02
TROP-ELM | 2.95e−03 / 1.48e−02 | 2.95e−03 / 1.49e−02 | 3.86e−04 / 2.03e−03 | 3.87e−04 / 2.04e−03 | 3.84e−03 / 1.93e−02 | 3.84e−03 / 1.93e−02
IT2-FNN | 2.85e−02 / 1.64e−02 | 2.85e−02 / 1.64e−02 | 1.82e−03 / 2.49e−03 | 1.82e−03 / 2.48e−03 | 3.72e−02 / 2.08e−02 | 3.72e−02 / 2.08e−02
eT2QFNN | 1.36e−03 / 6.49e05 | 5.52e−04 / 7.43e05 | 5.99e−05 / 1.22e06 | 4.37e07 / 1.31e07 | 7.74e−03 / 7.85e05 | 6.56e04 / 8.10e05
BD-ELM | 1.30e−01 / 8.93e−02 | 1.30e−01 / 8.93e−02 | 3.56e−02 / 3.74e−02 | 3.56e−02 / 3.74e−02 | 1.58e−01 / 1.03e−01 | 1.58e−01 / 1.03e−01
RNN-LM | 1.76e−03 / 5.59e−03 | 1.78e−03 / 5.71e−03 | 4.98e−05 / 6.84e−04 | 5.11e−05 / 7.01e−04 | 2.16e−03 / 6.71e−03 | 2.19e−03 / 6.80e−03
RNN-BFGS | 3.16e−02 / 1.46e−02 | 3.17e−02 / 1.49e−02 | 1.92e−03 / 2.38e−03 | 1.92e−03 / 2.48e−03 | 4.01e−02 / 1.76e−02 | 3.99e−02 / 1.81e−02
LSTM | 3.40e−01 / 2.08e−02 | 4.64e−01 / 3.14e−01 | 2.05e−01 / 3.24e−02 | 3.23e−01 / 4.07e−01 | 4.52e−01 / 2.76e−02 | 4.76e−01 / 3.10e−01
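The MAE, MSE and RMSE columns reported throughout the comparison tables are the standard regression error metrics. A minimal Python sketch of how such entries are computed, on hypothetical targets and predictions, is:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for one train/test run, as reported
    in the comparison tables (the Mean/SD columns then aggregate
    these values over repeated runs)."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    mae = np.abs(err).mean()          # mean absolute error
    mse = (err ** 2).mean()           # mean squared error
    return mae, mse, np.sqrt(mse)     # RMSE is the square root of MSE

# Hypothetical targets and predictions for illustration.
mae, mse, rmse = regression_metrics([0.0, 1.0, 2.0], [0.0, 1.5, 1.5])
```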
Comparison results of TT2-RVFL, TT2-ELM, OP-ELM, TROP-ELM, IT2-FNN, eT2QFNN, BD-ELM, RNN-LM, RNN-BFGS, LSTM and TSFNN on the Electrical Fault detection dataset are used to test the performance of the eleven algorithms. The results of TSFNN in Table 6 show that its generalization ability is better than that of the other algorithms. Moreover, comparing Figs. 6 and 7 indicates that the fault data can be regarded as the no-fault data with added noise; Table 6 therefore also shows that the disturbance rejection ability of TSFNN is better than that of the other algorithms. In Table 6, the proposed TSFNN obtains the smallest error, followed by RNN-LM. The errors of TSFNN's three members, TT2-RVFL, TT2-ELM and TROP-ELM, are larger than that of RNN-LM. The descending ranking on the standard deviation index is TSFNN, TT2-ELM, TT2-RVFL and RNN-LM. TT2-ELM and TT2-RVFL employ type-2 fuzzy sets, which are fused into the tensor structure. Type-2 fuzzy sets perform well in dealing with uncertain information and have excellent anti-interference ability, and the tensor structure has a similar characteristic. Although RNN-LM is less stable than TSFNN's three member structures, its generalization performance is better. Therefore, the proposed tensor-based stacked neural network strategy can inherit the advantages of its members, such as stability, and can integrate their different capabilities to achieve excellent performance for the entire model. For the Electrical Fault classification case, we decompose the classification dataset into five parts according to the fault locations. As Table 7 shows, the dataset defines six kinds of faults, but the LL fault ([G, C, B, A] = [0, 0, 1, 1]), the fault between phase A and phase B, does not appear in the dataset.
Through the above analysis, we know that in the power system the fault data can be regarded as the no-fault data with added noise. Thus, among the five extracted datasets, the LG, LLG, LLL and LLLG fault datasets can be treated as the no-fault dataset with different noise imposed on it. Moreover, each extracted dataset represents only one type of power system fault, so its data is purer and has more obvious characteristics and trends. Through this analysis, the anti-interference and generalization performance of the algorithms can be further verified. The results of TSFNN in Table 8 confirm its excellent disturbance rejection and generalization performance. The comparison results of the proposed TSFNN and the ten other models are shown in Table 8. TSFNN obtains the smallest error and standard deviation on the Electrical No fault and Electrical LG fault datasets in both the training and testing parts. On the Electrical LLG fault dataset, RNN-LM obtains the best performance, with eT2QFNN as its closest competitor. eT2QFNN outperforms RNN-LM and TSFNN on the Electrical LLL fault and Electrical LLLG fault datasets.
Table 8

Comparison results with TSFNN, TT2-RVFL, TT2-ELM, OP-ELM, TROP-ELM, IT2-FNN, eT2QFNN, BD-ELM, RNN-LM, RNN-BFGS and LSTM for various faults in Electrical dataset (a brief introduction is listed in Table 1)

Method | Training MAE (Mean / SD) | Testing MAE (Mean / SD) | Training MSE (Mean / SD) | Testing MSE (Mean / SD) | Training RMSE (Mean / SD) | Testing RMSE (Mean / SD)

Electrical No fault:
TSFNN | 1.44e−03 / 1.37e−03 | 2.07e−03 / 1.66e−03 | 7.01e06 / 1.45e−05 | 1.39e−05 / 6.21e−05 | 1.96e−03 / 1.78e−03 | 2.91e−03 / 2.33e−03
TT2-RVFL | 2.62e−03 / 1.08e−03 | 2.70e−03 / 1.11e−03 | 1.82e−05 / 1.67e−05 | 2.44e−05 / 3.54e−05 | 4.16e−03 / 1.69e−03 | 4.23e−03 / 2.08e−03
TT2-ELM | 2.00e−03 / 7.57e04 | 2.07e−03 / 7.88e04 | 9.68e−06 / 8.05e06 | 1.45e−05 / 2.57e05 | 2.92e−03 / 1.08e03 | 3.37e−03 / 1.76e−03
OP-ELM | 1.92e−02 / 7.73e−03 | 1.94e−02 / 7.79e−03 | 1.50e−03 / 7.24e−04 | 2.21e−03 / 4.25e−02 | 3.74e−02 / 9.69e−03 | 3.83e−02 / 2.72e−02
TROP-ELM | 1.93e−02 / 7.74e−03 | 1.94e−02 / 7.68e−03 | 1.51e−03 / 7.48e−04 | 1.60e−03 / 1.17e−03 | 3.76e−02 / 9.67e−03 | 3.81e−02 / 1.22e−02
IT2-FNN | 7.38e−03 / 3.61e−03 | 7.47e−03 / 3.65e−03 | 1.63e−04 / 1.74e−04 | 1.82e−04 / 2.19e−04 | 1.15e−02 / 5.49e−03 | 1.20e−02 / 6.15e03
eT2QFNN | 1.51e−02 / 4.80e−03 | 4.23e−02 / 3.91e−02 | 8.26e−04 / 6.50e−04 | 5.37e−03 / 1.68e−02 | 2.75e−02 / 8.47e−03 | 5.35e−02 / 5.01e−02
BD-ELM | 2.16e−01 / 1.56e−01 | 2.15e−01 / 1.56e−01 | 8.97e−02 / 7.99e−02 | 8.96e−02 / 7.98e−02 | 2.48e−01 / 1.67e−01 | 2.48e−01 / 1.67e−01
RNN-LM | 6.30e04 / 2.45e−03 | 6.34e04 / 2.41e−03 | 8.73e−06 / 1.72e−04 | 8.67e06 / 1.63e−04 | 7.50e04 / 2.86e−03 | 7.59e04 / 2.85e−03
RNN-BFGS | 1.89e−02 / 1.11e−02 | 1.90e−02 / 1.09e−02 | 1.04e−03 / 1.59e−03 | 1.07e−03 / 1.54e−03 | 2.96e−02 / 1.29e−02 | 3.00e−02 / 1.31e−02
LSTM | 1.61e−01 / 7.55e−02 | 9.85e−01 / 4.87e−01 | 9.77e−02 / 6.36e−02 | 1.22e+00 / 9.33e−01 | 2.92e−01 / 1.13e−01 | 9.90e−01 / 4.85e−01

Electrical LG fault:
TSFNN | 8.03e−04 / 8.03e−04 | 1.31e−03 / 1.01e−03 | 2.21e−06 / 4.74e−06 | 6.03e−06 / 1.26e05 | 1.08e−03 / 1.03e−03 | 2.12e−03 / 1.24e03
TT2-RVFL | 1.31e−03 / 4.90e−04 | 1.51e−03 / 5.85e−04 | 6.20e−06 / 4.95e−06 | 2.91e−05 / 5.59e−05 | 2.47e−03 / 8.94e−04 | 4.08e−03 / 3.17e−03
TT2-ELM | 1.12e−03 / 3.91e04 | 1.32e−03 / 4.95e04 | 4.16e−06 / 3.12e06 | 2.43e−05 / 4.94e−05 | 1.93e−03 / 6.47e04 | 3.79e−03 / 3.15e−03
OP-ELM | 9.32e−03 / 5.30e−03 | 9.50e−03 / 5.36e−03 | 3.81e−04 / 2.79e−04 | 4.33e−04 / 1.20e−03 | 1.86e−02 / 6.00e−03 | 1.91e−02 / 8.22e−03
TROP-ELM | 9.37e−03 / 5.46e−03 | 9.53e−03 / 5.49e−03 | 3.83e−04 / 2.90e−04 | 4.48e−04 / 1.73e−03 | 1.85e−02 / 6.34e−03 | 1.91e−02 / 9.22e−03
IT2-FNN | 5.25e−03 / 2.27e−03 | 5.40e−03 / 2.32e−03 | 1.00e−04 / 8.23e−05 | 1.21e−04 / 1.23e−04 | 9.33e−03 / 3.63e−03 | 9.95e−03 / 4.66e−03
eT2QFNN | 4.52e−03 / 1.50e−03 | 5.17e−03 / 1.03e−02 | 6.60e−05 / 6.02e−05 | 1.98e−04 / 9.80e−04 | 7.81e−03 / 2.23e−03 | 6.28e−03 / 1.26e−02
BD-ELM | 2.22e−02 / 3.91e−02 | 2.24e−02 / 3.92e−02 | 2.94e−03 / 9.52e−03 | 2.97e−03 / 9.56e−03 | 3.09e−02 / 4.45e−02 | 3.14e−02 / 4.46e−02
RNN-LM | 3.51e04 / 1.22e−03 | 3.62e04 / 1.16e−03 | 2.16e06 / 4.23e−05 | 2.04e06 / 2.91e−05 | 4.37e04 / 1.40e−03 | 4.93e04 / 1.34e−03
RNN-BFGS | 1.19e−02 / 4.43e−03 | 1.22e−02 / 4.61e−03 | 4.17e−04 / 2.67e−04 | 4.55e−04 / 3.13e−04 | 1.98e−02 / 5.16e−03 | 2.04e−02 / 6.29e−03
LSTM | 1.98e−02 / 1.55e−02 | 4.72e−02 / 6.36e−02 | 2.15e−03 / 5.93e−03 | 8.75e−03 / 2.63e−02 | 3.22e−02 / 3.46e−02 | 5.63e−02 / 7.73e−02

Electrical LLG fault:
TSFNN | 1.30e−03 / 1.07e−03 | 1.92e−03 / 1.31e−03 | 4.91e−06 / 7.29e−06 | 1.03e−05 / 1.67e−05 | 1.74e−03 / 1.37e−03 | 2.78e−03 / 1.61e−03
TT2-RVFL | 1.73e−03 / 6.12e−04 | 2.01e−03 / 7.48e−04 | 1.04e−05 / 7.81e−06 | 5.58e−05 / 1.04e−04 | 3.21e−03 / 1.12e−03 | 5.60e−03 / 4.42e−03
TT2-ELM | 1.38e−03 / 4.54e−04 | 1.67e−03 / 6.17e−04 | 6.06e−06 / 4.13e−06 | 4.82e−05 / 9.29e−05 | 2.35e−03 / 7.45e−04 | 5.26e−03 / 4.52e−03
OP-ELM | 1.49e−02 / 6.99e−03 | 1.51e−02 / 7.07e−03 | 7.32e−04 / 4.27e−04 | 8.07e−04 / 2.15e−03 | 2.60e−02 / 7.28e−03 | 2.65e−02 / 1.02e−02
TROP-ELM | 1.50e−02 / 7.04e−03 | 1.52e−02 / 7.12e−03 | 7.38e−04 / 4.38e−04 | 7.82e−04 / 5.33e−04 | 2.61e−02 / 7.43e−03 | 2.65e−02 / 8.84e−03
IT2-FNN | 6.10e−03 / 2.81e−03 | 6.22e−03 / 2.84e−03 | 1.11e−04 / 9.91e−05 | 1.31e−04 / 1.40e−04 | 9.72e−03 / 4.09e−03 | 1.03e−02 / 5.01e−03
eT2QFNN | 1.80e−03 / 6.86e05 | 2.87e−02 / 4.57e04 | 3.08e−05 / 7.37e07 | 6.19e−03 / 2.21e−04 | 5.55e−03 / 5.68e05 | 7.87e−02 / 1.08e03
BD-ELM | 2.67e−02 / 4.09e−02 | 2.68e−02 / 4.09e−02 | 3.48e−03 / 1.02e−02 | 3.51e−03 / 1.02e−02 | 3.63e−02 / 4.66e−02 | 3.66e−02 / 4.67e−02
RNN-LM | 3.62e04 / 8.92e−04 | 3.84e04 / 8.74e−04 | 1.32e06 / 1.72e−05 | 1.54e06 / 1.41e05 | 4.73e04 / 1.05e−03 | 6.10e04 / 1.08e03
RNN-BFGS | 1.15e−02 / 4.49e−03 | 1.20e−02 / 4.86e−03 | 3.40e−04 / 2.68e−04 | 3.83e−04 / 3.10e−04 | 1.76e−02 / 5.58e−03 | 1.84e−02 / 6.58e−03
LSTM | 1.85e−02 / 1.52e−02 | 3.68e−02 / 6.19e−02 | 3.14e−03 / 3.52e−02 | 5.87e−03 / 4.60e−02 | 2.92e−02 / 4.79e−02 | 4.22e−02 / 6.40e−02

Electrical LLL fault:
TSFNN | 8.87e−04 / 4.25e−04 | 1.38e−03 / 7.85e−04 | 2.32e−06 / 2.30e−06 | 8.91e−06 / 1.39e−04 | 1.36e−03 / 6.89e−04 | 2.20e−03 / 2.02e−03
TT2-RVFL | 1.08e−03 / 3.59e−04 | 1.22e−03 / 4.13e−04 | 3.80e−06 / 2.56e−06 | 1.76e−05 / 3.58e−05 | 1.96e−03 / 6.03e−04 | 3.15e−03 / 2.47e−03
TT2-ELM | 8.92e−04 / 2.65e−04 | 1.03e−03 / 3.38e−04 | 2.39e−06 / 1.41e−06 | 1.45e−05 / 3.24e−05 | 1.49e−03 / 4.10e−04 | 2.89e−03 / 2.47e−03
OP-ELM | 1.22e−02 / 3.34e−03 | 1.25e−02 / 3.50e−03 | 6.07e−04 / 2.27e−04 | 1.01e−03 / 7.52e−03 | 2.41e−02 / 5.29e−03 | 2.61e−02 / 1.82e−02
TROP-ELM | 1.20e−02 / 3.41e−03 | 1.24e−02 / 3.84e−03 | 6.04e−04 / 2.29e−04 | 1.58e−03 / 2.38e−02 | 2.39e−02 / 5.53e−03 | 2.67e−02 / 2.94e−02
IT2-FNN | 4.11e−03 / 2.36e−03 | 4.18e−03 / 2.40e−03 | 4.41e−05 / 6.04e−05 | 4.84e−05 / 7.52e−05 | 5.78e−03 / 3.26e−03 | 5.98e−03 / 3.56e−03
eT2QFNN | 5.65e−04 / 1.15e06 | 1.09e−01 / 2.61e05 | 1.99e−05 / 2.37e07 | 5.18e−02 / 2.46e−05 | 4.46e−03 / 2.67e05 | 2.28e−01 / 5.41e05
BD-ELM | 1.26e−02 / 1.58e−02 | 1.28e−02 / 1.60e−02 | 9.44e−04 / 2.40e−03 | 9.66e−04 / 2.46e−03 | 1.85e−02 / 2.46e−02 | 1.87e−02 / 2.48e−02
RNN-LM | 2.41e04 / 3.29e−04 | 3.15e04 / 3.61e−04 | 3.24e07 / 2.34e−06 | 1.18e06 / 4.42e06 | 3.89e04 / 4.16e−04 | 7.65e04 / 7.69e−04
RNN-BFGS | 8.47e−03 / 3.89e−03 | 8.68e−03 / 3.81e−03 | 2.25e−04 / 2.32e−04 | 2.71e−04 / 2.81e−04 | 1.37e−02 / 6.07e−03 | 1.49e−02 / 6.92e−03
LSTM | 6.82e−02 / 4.56e−02 | 3.66e−01 / 3.49e−01 | 2.18e−02 / 2.55e−02 | 2.63e−01 / 5.41e−01 | 1.26e−01 / 7.69e−02 | 3.75e−01 / 3.50e−01

Electrical LLLG fault:
TSFNN | 1.34e−03 / 7.57e−04 | 2.02e−03 / 1.02e−03 | 4.57e−06 / 5.23e−06 | 1.06e−05 / 1.66e−05 | 1.90e−03 / 9.81e−04 | 2.92e−03 / 1.44e−03
TT2-RVFL | 1.13e−03 / 3.89e−04 | 1.26e−03 / 4.35e−04 | 4.31e−06 / 3.10e−06 | 1.64e−05 / 2.88e−05 | 2.08e−03 / 6.77e−04 | 3.12e−03 / 2.28e−03
TT2-ELM | 9.19e−04 / 2.87e−04 | 1.05e−03 / 3.50e−04 | 2.68e−06 / 1.67e−06 | 1.34e−05 / 2.56e−05 | 1.57e−03 / 4.51e−04 | 2.85e−03 / 2.29e−03
OP-ELM | 1.90e−02 / 6.36e−03 | 1.92e−02 / 6.46e−03 | 1.04e−03 / 5.81e−04 | 1.10e−03 / 8.30e−04 | 3.11e−02 / 8.24e−03 | 3.17e−02 / 9.55e−03
TROP-ELM | 1.89e−02 / 6.47e−03 | 1.92e−02 / 6.55e−03 | 1.03e−03 / 5.89e−04 | 1.13e−03 / 1.38e−03 | 3.11e−02 / 8.24e−03 | 3.19e−02 / 1.07e−02
IT2-FNN | 4.66e−03 / 2.56e−03 | 4.74e−03 / 2.61e−03 | 6.19e−05 / 7.90e−05 | 6.70e−05 / 9.07e−05 | 6.99e−03 / 3.61e−03 | 7.21e−03 / 3.87e−03
eT2QFNN | 4.51e−04 / 2.10e06 | 1.06e−01 / 6.45e05 | 1.56e−05 / 1.23e07 | 3.88e−02 / 4.32e−05 | 3.94e−03 / 1.56e05 | 1.97e−01 / 1.10e04
BD-ELM | 1.49e−02 / 1.39e−02 | 1.51e−02 / 1.40e−02 | 9.33e−04 / 1.67e−03 | 9.62e−04 / 1.71e−03 | 2.20e−02 / 2.12e−02 | 2.23e−02 / 2.15e−02
RNN-LM | 2.06e04 / 2.67e−04 | 2.51e04 / 2.85e−04 | 2.51e07 / 1.17e−06 | 7.57e07 / 2.39e06 | 3.36e04 / 3.72e−04 | 6.13e04 / 6.18e−04
RNN-BFGS | 8.65e−03 / 3.04e−03 | 8.66e−03 / 3.27e−03 | 2.07e−04 / 1.49e−04 | 2.32e−04 / 1.94e−04 | 1.37e−02 / 4.45e−03 | 1.43e−02 / 5.39e−03
LSTM | 2.04e−02 / 1.04e−02 | 1.26e−01 / 2.05e−01 | 1.94e−03 / 2.43e−03 | 6.02e−02 / 1.52e−01 | 3.85e−02 / 2.19e−02 | 1.37e−01 / 2.08e−01
It can be seen from the results that the disturbance rejection ability of the proposed TSFNN is worse than that of eT2QFNN and RNN-LM; however, TSFNN outperforms OP-ELM, IT2-FNN, BD-ELM and its three constituent members. RNN-LM is a deep network. eT2QFNN has an interval type-2 quantum fuzzy set with uncertain jump positions, and the quantum fuzzy set possesses a graded membership degree; the process of generating this graded membership degree gives eT2QFNN a deep structure. Therefore, eT2QFNN can be considered a deep network of a different kind from RNN-LM. The IT2 fuzzy set is one reason why eT2QFNN performs better than RNN-LM, and the deep structure is the reason why eT2QFNN and RNN-LM outperform the remaining comparison algorithms. Although the deep networks perform better than TSFNN, the proposed TSFNN achieves performance comparable to them, and one reason is the tensor-based stacked neural network structure. These results provide strong evidence for the effectiveness of the tensor-based network stacking strategy.

Asteroid dataset

The Asteroid Dataset is officially maintained by the Jet Propulsion Laboratory of the California Institute of Technology, an organization under NASA. The dataset is publicly available in the JPL Small-Body Database Search Engine and can also be obtained from Kaggle. Table 9 shows the basic column definitions for the Asteroid dataset.
Table 9

Basic column definition for Asteroid dataset

Attribute | Description
SPK-ID | Object primary SPK-ID
Object ID | Object internal database ID
Object fullname | Object full name/designation
Pdes | Object primary designation
Name | Object IAU name
NEO | Near-Earth object (NEO) flag
PHA | Potentially Hazardous Asteroid (PHA) flag
H | Absolute magnitude parameter
Diameter | Object diameter (from equivalent sphere), km
Albedo | Geometric albedo
Diameter_sigma | 1-sigma uncertainty in object diameter, km
Orbit_id | Orbit solution ID
Epoch | Epoch of osculation in modified Julian day form
Equinox | Equinox of reference frame
e | Eccentricity
a | Semi-major axis, au
q | Perihelion distance, au
i | Inclination; angle with respect to the x-y ecliptic plane
tp | Time of perihelion passage, TDB
moid_ld | Earth minimum orbit intersection distance, au
We extract a portion of the data as a comparison test dataset when using the Asteroid dataset: 2500 samples with seven attributes, including Geometric albedo, Eccentricity, Semi-major axis, inclination angle with respect to the x-y ecliptic plane, Earth minimum orbit intersection distance and the RMS of the asteroid, are applied to validate the proposed algorithm. The comparison results of the eleven methods are shown in Table 10. The results show that TSFNN performs best with respect to training error, while TT2-RVFL and TT2-ELM perform best in testing error; meanwhile, the performance of OP-ELM and TROP-ELM is poor. The approach proposed in this paper is a stack of TT2-RVFL, TT2-ELM and TROP-ELM. The main reason why TSFNN, OP-ELM and TROP-ELM perform well in training but poorly in testing is that these three methods all rely on multi-response sparse regression (MRSR), a variable ranking technique extended from the least angle regression algorithm (Similä and Tikka 2005; Efron et al. 2004). According to the usefulness of the neurons, the MRSR algorithm obtains a ranking of the neurons in OP-ELM (Miche et al. 2010). TROP-ELM is an improvement of OP-ELM, and the MRSR method is also applied to its input data; since the proposed TSFNN includes TROP-ELM, TSFNN is likewise affected by the MRSR method. An important feature of MRSR is that the obtained ordering is exact only in the case of linear problems. The Asteroid dataset collects the attributes of asteroids, and the part of the data we use is nonlinear. In OP-ELM and TROP-ELM the constructed networks are linear between the hidden layer and the output layer, so the role of the MRSR algorithm is to obtain an exact ranking of the neurons; the sequence obtained by sorting can then be used to rank the kernels of the model.
Since the whole dataset is nonlinear, an exact ranking of the neurons cannot be obtained by OP-ELM, and TROP-ELM and TSFNN are affected by the same flaw. Therefore, TSFNN performs well in the training part, but in the testing part the features extracted by the MRSR method cannot be applied well to the testing set, resulting in the poor performance of TSFNN in the testing phase. According to the data in Table 10, the performance of TT2-RVFL and TT2-ELM is the best. TT2-RVFL and TT2-ELM are constructed from the tensor structure and interval type-2 fuzzy sets. The membership degree of a type-2 fuzzy set is characterized by a type-1 fuzzy set; since the type-1 fuzzy set already has a strong ability to deal with uncertainty, the type-2 fuzzy set greatly strengthens the ability of the fuzzy system to handle uncertainty and nonlinearity, and it performs well in nonlinear systems with high uncertainty. Therefore, type-2 fuzzy systems have strong generalization ability, and the tensor structure is also good at dealing with uncertain systems, which further improves the generalization performance.
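The linearity between the hidden layer and the output layer mentioned above is the defining trait of ELM-style networks: the hidden layer is random and fixed, and only a linear readout is solved. The sketch below is a generic ELM-style fit in Python on assumed toy data (random input weights, sigmoid hidden layer, ridge-regularized closed-form readout); it illustrates the structure MRSR's neuron ranking relies on, and is not the authors' implementation of OP-ELM or TROP-ELM.

```python
import numpy as np

rng = np.random.default_rng(1)

def elm_fit(X, T, n_hidden=50, lam=1e-2):
    """Random input weights + sigmoid hidden layer; the hidden-to-output
    map is LINEAR and solved in closed form with ridge (Tikhonov)
    regularization: beta = (H^T H + lam*I)^{-1} H^T T."""
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))      # fixed random features
    beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy nonlinear regression problem (6 inputs, target depends on the first).
X = rng.standard_normal((200, 6))
T = np.sin(X[:, :1])
W, b, beta = elm_fit(X, T)
Y = elm_predict(X, W, b, beta)
```

Because only beta is learned, ranking and pruning hidden neurons (as MRSR does in OP-ELM) reduces to a variable-selection problem on the columns of H, which is exact only when the underlying problem is linear in those columns.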
Table 10

Comparison results with TSFNN, TT2-RVFL, TT2-ELM, OP-ELM, TROP-ELM, IT2-FNN, eT2QFNN, BD-ELM, RNN-LM, RNN-BFGS and LSTM for Asteroid. The bold parts represent the best performance of eleven algorithms on each dataset (a brief introduction is listed in Table 1)

Method | Training MAE (Mean / SD) | Testing MAE (Mean / SD) | Training MSE (Mean / SD) | Testing MSE (Mean / SD) | Training RMSE (Mean / SD) | Testing RMSE (Mean / SD)

Asteroid:
TSFNN | 2.30e−02 / 3.02e04 | 6.06e+01 / 2.37e+03 | 1.03e03 / 3.30e05 | 1.92e+07 / 9.09e+08 | 3.22e02 / 5.13e04 | 1.14e+02 / 4.38e+03
TT2-RVFL | 2.31e−02 / 3.03e−04 | 2.36e−02 / 7.23e−04 | 1.04e−03 / 3.36e−05 | 1.12e−03 / 9.40e−05 | 3.22e02 / 5.23e−04 | 3.34e−02 / 1.39e−03
TT2-ELM | 2.30e−02 / 3.03e−04 | 2.36e−02 / 7.28e−04 | 1.03e03 / 3.33e−05 | 1.12e−03 / 9.58e−05 | 3.22e02 / 5.19e−04 | 3.35e−02 / 1.42e−03
OP-ELM | 2.38e−02 / 6.04e−04 | 2.77e+02 / 1.33e+04 | 1.11e−03 / 5.60e−05 | 1.32e+11 / 7.64e+12 | 3.33e−02 / 8.33e−04 | 7.58e+03 / 3.63e+05
TROP-ELM | 2.38e−02 / 6.10e−04 | 2.36e+02 / 1.45e+04 | 1.11e−03 / 5.67e−05 | 1.58e+11 / 1.12e+13 | 3.33e−02 / 8.42e−04 | 6.47e+03 / 3.98e+05
IT2-FNN | 2.34e−02 / 3.08e−04 | 2.35e02 / 7.00e04 | 1.08e−03 / 3.63e−05 | 1.10e−03 / 8.64e05 | 3.29e−02 / 5.53e−04 | 3.32e02 / 1.30e03
eT2QFNN | 2.35e−02 / 5.84e−04 | 2.44e−02 / 1.31e−03 | 1.15e−03 / 5.47e−05 | 1.16e−03 / 1.17e−04 | 3.39e−02 / 7.87e−04 | 3.40e−02 / 1.56e−03
BD-ELM | 2.35e−02 / 4.82e−04 | 2.36e−02 / 7.94e−04 | 1.08e−03 / 4.73e−05 | 1.11e−03 / 9.42e−05 | 3.29e−02 / 7.09e−04 | 3.33e−02 / 1.40e−03
RNN-LM | 2.32e−02 / 3.17e−04 | 2.36e−02 / 7.30e−04 | 1.05e−03 / 3.66e−05 | 1.10e03 / 9.21e−05 | 3.24e−02 / 5.65e−04 | 3.32e02 / 1.37e−03
RNN-BFGS | 2.36e−02 / 6.89e−04 | 2.37e−02 / 9.24e−04 | 1.10e−03 / 6.19e−05 | 1.11e−03 / 9.83e−05 | 3.31e−02 / 8.65e−04 | 3.33e−02 / 1.44e−03
LSTM | 2.87e02 / 1.12e−02 | 4.09e−02 / 4.01e−02 | 2.14e−03 / 2.36e−03 | 4.01e−03 / 8.51e−03 | 4.24e−02 / 1.84e−02 | 5.01e−02 / 3.87e−02
The merits of type-2 fuzzy sets and of the tensor structure are inherited by the tensor-based type-2 fuzzy system. On the basis of the above analysis, TT2-RVFL and TT2-ELM perform well on the Asteroid dataset; their training and testing errors are the smallest in Table 10. From the test performance of TSFNN, since TSFNN contains TT2-RVFL and TT2-ELM, it makes up for the insufficient generalization ability of TROP-ELM on nonlinear systems. This also demonstrates the excellent generalization ability of type-2 fuzzy systems. The advantages of the stacked tensor-based hybrid single fuzzy neural network indicate that the stacked way of network design can inherit the merits of the constituent algorithms, and that the stacked structures of the three algorithms complement each other.

Novel corona virus 2019 dataset

The Novel Corona Virus 2019 dataset contains daily-level information on the number of affected cases, deaths and recoveries of the 2019 novel coronavirus, with a date label for each record. It is worth noting that this is time series data, so the number of cases on any given date is cumulative. The data, collected from national centers for disease control and prevention, is published on GitHub and updated daily. Eight regions, Beijing, Shanghai, Chongqing, Tianjin, Arizona, Washington, California and Illinois, are used for testing, and the time stamps range from 22 Jan 2020 to 29 May 2021. The extracted data of the eight regions form small-scale time series datasets with three attributes. Four of the selected regions are from China and four are from the United States, and the outbreaks in both regions are predicted. Since each dataset has only three attributes, the feature attributes are clearly insufficient, so the performance of the proposed network is tested in the case of insufficient feature attributes. The comparison results are shown in Tables 11 and 12. Figure 8 shows the results based on the samples and attributes of the four datasets Beijing, Shanghai, Tianjin and Chongqing; from these results, TSFNN performs best compared with the other algorithms. Comparing the results in Tables 11 and 12, the results of the compared methods are broadly similar on the whole; moreover, the results for Beijing, Shanghai, Tianjin and Chongqing are significantly worse than those for Arizona, Washington, California and Illinois.
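Because the case counts are cumulative, the daily increments are recovered by first-differencing the series. A small illustrative Python example, with made-up counts, is:

```python
import numpy as np

# Hypothetical cumulative confirmed-case counts for five consecutive days.
cumulative = np.array([10, 12, 15, 15, 21])

# Daily new cases are the first differences of the cumulative series;
# a flat segment (15 -> 15) means zero new cases that day.
daily = np.diff(cumulative)
```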
Table 11

Comparison results with TSFNN, TT2-RVFL, TT2-ELM, OP-ELM, TROP-ELM, IT2-FNN, eT2QFNN, BD-ELM, RNN-LM, RNN-BFGS and LSTM for Covid19_Beijing, Covid19_Shanghai, Covid19_Tianjin, Covid19_Chongqing. The bold parts represent the best performance of eleven algorithms on each dataset (a brief introduction is listed in Table 1)

Method | Training MAE (Mean / SD) | Testing MAE (Mean / SD) | Training MSE (Mean / SD) | Testing MSE (Mean / SD) | Training RMSE (Mean / SD) | Testing RMSE (Mean / SD)

Covid19_Beijing:
TSFNN | 3.56e+01 / 2.61e+00 | 3.65e+01 / 4.26e+00 | 3.20e+03 / 3.86e+02 | 3.41e+03 / 7.93e+02 | 5.64e+01 / 3.47e+00 | 5.80e+01 / 6.91e+00
TT2-RVFL | 3.88e+01 / 2.31e+00 | 3.95e+01 / 2.69e+00 | 3.91e+03 / 3.54e+02 | 4.01e+03 / 8.51e+02 | 6.23e+01 / 2.84e+00 | 6.30e+01 / 6.70e+00
TT2-ELM | 3.85e+01 / 2.29e+00 | 3.92e+01 / 2.70e+00 | 3.87e+03 / 3.48e+02 | 3.97e+03 / 8.36e+02 | 6.21e+01 / 2.83e+00 | 6.26e+01 / 6.61e+00
OP-ELM | 4.80e+01 / 7.15e+00 | 4.88e+01 / 7.88e+00 | 5.85e+03 / 1.88e+03 | 6.01e+03 / 2.25e+03 | 7.57e+01 / 1.13e+01 | 7.63e+01 / 1.33e+01
TROP-ELM | 4.79e+01 / 6.84e+00 | 4.85e+01 / 7.37e+00 | 5.80e+03 / 1.82e+03 | 5.93e+03 / 2.11e+03 | 7.53e+01 / 1.10e+01 | 7.59e+01 / 1.28e+01
IT2-FNN | 4.20e+01 / 2.71e+00 | 4.25e+01 / 3.17e+00 | 4.41e+03 / 4.40e+02 | 4.49e+03 / 9.78e+02 | 6.63e+01 / 3.34e+00 | 6.66e+01 / 7.33e+00
eT2QFNN | 2.58e+01 / 5.76e+00 | 3.20e+01 / 2.33e+01 | 2.02e+03 / 9.59e+02 | 1.85e+03 / 8.03e+03 | 4.39e+01 / 9.70e+00 | 3.58e+01 / 2.39e+01
BD-ELM | 3.91e+01 / 6.39e+00 | 4.94e+01 / 2.12e+02 | 3.90e+03 / 1.87e+03 | 5.00e+04 / 2.01e+06 | 6.20e+01 / 7.06e+00 | 7.23e+01 / 2.12e+02
RNN-LM | 3.98e+01 / 2.91e+00 | 4.12e+01 / 3.37e+00 | 4.00e+03 / 4.81e+02 | 4.15e+03 / 9.26e+02 | 6.32e+01 / 3.24e+00 | 6.40e+01 / 6.58e+00
RNN-BFGS | 4.33e+01 / 5.89e+00 | 4.45e+01 / 6.73e+00 | 4.78e+03 / 1.03e+03 | 4.92e+03 / 1.51e+03 | 6.89e+01 / 5.78e+00 | 6.96e+01 / 8.95e+00
LSTM

Covid19_Shanghai:
TSFNN | 1.83e+01 / 1.51e+00 | 2.04e+01 / 2.34e+00 | 5.79e+02 / 8.75e+01 | 7.16e+02 / 1.62e+02 | 2.40e+01 / 1.76e+00 | 2.66e+01 / 2.93e+00
TT2-RVFL | 2.31e+01 / 1.26e+00 | 2.38e+01 / 1.67e+00 | 9.00e+02 / 8.93e+01 | 9.71e+02 / 1.72e+02 | 2.90e+01 / 1.61e+00 | 3.09e+01 / 2.72e+00
TT2-ELM | 2.22e+01 / 1.16e+00 | 2.29e+01 / 1.57e+00 | 8.24e+02 / 7.93e+01 | 8.93e+02 / 1.63e+02 | 2.87e+01 / 1.37e+00 | 2.98e+01 / 2.66e+00
OP-ELM | 4.62e+01 / 2.43e+01 | 4.68e+01 / 2.47e+01 | 4.44e+03 / 4.67e+03 | 4.57e+03 / 4.84e+03 | 6.00e+01 / 2.90e+01 | 6.09e+01 / 2.95e+01
TROP-ELM | 4.71e+01 / 2.52e+01 | 4.76e+01 / 2.55e+01 | 4.63e+03 / 4.94e+03 | 4.74e+03 / 5.05e+03 | 6.10e+01 / 3.00e+01 | 6.18e+01 / 3.04e+01
IT2-FNN | 2.91e+01 / 1.67e+00 | 2.96e+01 / 2.27e+00 | 1.54e+03 / 1.25e+02 | 1.59e+03 / 2.32e+02 | 3.92e+01 / 1.59e+00 | 3.98e+01 / 2.90e+00
eT2QFNN | 2.66e+01 / 3.04e+00 | 4.87e+01 / 1.91e+01 | 1.69e+03 / 4.57e+02 | 3.57e+03 / 3.77e+03 | 4.09e+01 / 4.84e+00 | 5.59e+01 / 2.10e+01
BD-ELM | 2.46e+01 / 1.12e+01 | 2.71e+01 / 4.49e+01 | 1.21e+03 / 5.95e+03 | 3.33e+03 / 8.52e+04 | 3.18e+01 / 1.39e+01 | 3.50e+01 / 4.59e+01
RNN-LM | 2.59e+01 / 2.36e+00 | 2.76e+01 / 2.83e+00 | 1.20e+03 / 2.30e+02 | 1.37e+03 / 3.16e+02 | 3.44e+01 / 3.40e+00 | 3.68e+01 / 4.28e+00
RNN-BFGS | 3.79e+01 / 1.05e+01 | 4.00e+01 / 1.10e+01 | 2.52e+03 / 1.98e+03 | 2.78e+03 / 2.20e+03 | 4.86e+01 / 1.25e+01 | 5.10e+01 / 1.33e+01
LSTM | 4.65e+01 / 9.63e+00 | 5.02e+01 / 1.09e+01 | 3.73e+03 / 1.44e+03 | 3.70e+03 / 1.50e+03 | 6.01e+01 / 1.17e+01 | 5.98e+01 / 1.19e+01

Covid19_Tianjin:
TSFNN | 5.88e+00 / 3.26e01 | 6.28e+00 / 6.66e−01 | 9.29e+01 / 1.11e+01 | 1.00e+02 / 2.32e+01 | 9.62e+00 / 5.92e−01 | 9.95e+00 / 1.17e+00
TT2-RVFL | 6.83e+00 / 3.61e−01 | 7.02e+00 / 6.03e−01 | 1.10e+02 / 1.01e+01 | 1.19e+02 / 2.79e+01 | 1.03e+01 / 4.81e−01 | 1.08e+01 / 1.26e+00
TT2-ELM | 6.54e+00 / 3.34e−01 | 6.73e+00 / 6.00e01 | 1.05e+02 / 9.45e+00 | 1.13e+02 / 2.66e+01 | 1.02e+01 / 4.64e01 | 1.05e+01 / 1.23e+00
OP-ELM | 1.05e+01 / 4.01e+00 | 1.07e+01 / 4.07e+00 | 2.45e+02 / 1.60e+02 | 2.65e+02 / 6.54e+02 | 1.51e+01 / 4.15e+00 | 1.54e+01 / 5.21e+00
TROP-ELM | 1.07e+01 / 4.08e+00 | 1.09e+01 / 4.20e+00 | 2.49e+02 / 1.63e+02 | 2.66e+02 / 2.64e+02 | 1.52e+01 / 4.22e+00 | 1.56e+01 / 4.90e+00
IT2-FNN | 8.84e+00 / 5.11e−01 | 8.97e+00 / 6.34e−01 | 1.88e+02 / 2.13e+01 | 1.96e+02 / 5.03e+01 | 1.37e+01 / 7.83e−01 | 1.39e+01 / 1.78e+00
eT2QFNN | 6.63e+00 / 8.16e−01 | 1.51e+01 / 1.31e+01 | 1.28e+02 / 5.28e+01 | 4.57e+02 / 1.56e+03 | 1.12e+01 / 1.65e+00 | 1.63e+01 / 1.38e+01
BD-ELM | 6.83e+00 / 2.92e+00 | 1.14e+01 / 5.02e+01 | 1.27e+02 / 3.20e+02 | 1.53e+04 / 3.00e+05 | 1.07e+01 / 3.58e+00 | 2.29e+01 / 1.22e+02
RNN-LM | 7.65e+00 / 9.43e−01 | 8.19e+00 / 1.10e+00 | 1.22e+02 / 2.01e+01 | 1.46e+02 / 5.65e+01 | 1.10e+01 / 9.16e−01 | 1.20e+01 / 1.71e+00
RNN-BFGS | 1.01e+01 / 3.15e+00 | 1.04e+01 / 3.71e+00 | 2.10e+02 / 9.38e+02 | 2.36e+02 / 1.19e+03 | 1.40e+01 / 3.95e+00 | 1.47e+01 / 4.60e+00
LSTM | 9.20e+00 / 1.30e+00 | 9.52e+00 / 6.94e−01 | 1.72e+02 / 4.70e+01 | 2.13e+02 / 5.73e+01 | 1.30e+01 / 1.87e+00 | 1.45e+01 / 1.96e+00

Covid19_Chongqing:
TSFNN | 1.29e+01 / 1.93e+00 | 1.28e+01 / 2.82e+00 | 7.86e+02 / 1.44e+02 | 8.28e+02 / 3.74e+02 | 2.79e+01 / 2.65e+00 | 2.80e+01 / 6.50e+00
TT2-RVFL | 1.34e+01 / 1.77e+00 | 1.39e+01 / 1.25e+00 | 9.13e+02 / 1.52e+02 | 9.83e+02 / 3.77e+02 | 2.96e+01 / 2.60e+00 | 3.07e+01 / 6.06e+00
TT2-ELM | 1.33e+01 / 1.79e+00 | 1.38e+01 / 1.18e+00 | 8.68e+02 / 1.49e+02 | 9.30e+02 / 3.62e+02 | 2.94e+01 / 2.61e+00 | 2.99e+01 / 5.97e+00
OP-ELM | 1.91e+01 / 4.56e+00 | 2.01e+01 / 4.89e+00 | 1.51e+03 / 4.10e+02 | 2.08e+03 / 7.20e+03 | 3.84e+01 / 5.45e+00 | 4.15e+01 / 1.88e+01
TROP-ELM | 1.92e+01 / 4.55e+00 | 2.02e+01 / 7.58e+00 | 1.51e+03 / 4.17e+02 | 4.31e+03 / 1.50e+05 | 3.85e+01 / 5.54e+00 | 4.23e+01 / 5.02e+01
IT2-FNN | 1.83e+01 / 2.09e+00 | 1.86e+01 / 2.30e+00 | 1.78e+03 / 2.73e+02 | 1.83e+03 / 6.25e+02 | 4.20e+01 / 3.28e+00 | 4.21e+01 / 7.45e+00
eT2QFNN | 1.18e+01 / 4.80e+00 | 3.67e+00 / 3.93e+00 | 8.91e+02 / 5.57e+02 | 2.96e+01 / 3.99e+02 | 2.92e+01 / 6.06e+00 | 3.76e+00 / 3.93e+00
BD-ELM | 1.37e+01 / 2.26e+00 | 2.64e+01 / 2.09e+02 | 8.62e+02 / 3.15e+02 | 4.63e+04 / 1.56e+06 | 2.91e+01 / 4.01e+00 | 4.74e+01 / 2.10e+02
RNN-LM | 1.52e+01 / 3.86e+00 | 1.94e+01 / 6.74e+00 | 1.06e+03 / 3.35e+02 | 2.38e+03 / 2.84e+03 | 3.22e+01 / 4.94e+00 | 4.44e+01 / 2.02e+01
RNN-BFGS | 1.82e+01 / 5.50e+00 | 1.99e+01 / 5.72e+00 | 1.55e+03 / 9.06e+02 | 2.12e+03 / 1.03e+03 | 3.90e+01 / 5.26e+00 | 4.51e+01 / 9.41e+00
LSTM
Table 12

Comparison results for TSFNN, TT2-RVFL, TT2-ELM, OP-ELM, TROP-ELM, IT2-FNN, eT2QFNN, BD-ELM, RNN-LM, RNN-BFGS and LSTM on Covid19_Arizona, Covid19_Washington, Covid19_California and Covid19_Illinois. The bold parts represent the best performance of the eleven algorithms on each dataset (a brief introduction is listed in Table 1)

| Dataset | Method | Train MAE Mean | Train MAE SD | Test MAE Mean | Test MAE SD | Train MSE Mean | Train MSE SD | Test MSE Mean | Test MSE SD | Train RMSE Mean | Train RMSE SD | Test RMSE Mean | Test RMSE SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Covid19_Arizona | TSFNN | 3.68e-02 | 7.53e-03 | 3.65e-02 | 1.28e-02 | 1.49e-02 | 3.00e-03 | 1.50e-02 | 7.36e-03 | 1.22e-01 | 1.28e-02 | 1.17e-01 | 3.52e-02 |
| | TT2-RVFL | 3.88e-02 | 8.15e-03 | 3.95e-02 | 5.16e-03 | 1.56e-02 | 3.22e-03 | 1.61e-02 | 7.83e-03 | 1.24e-01 | 1.34e-02 | 1.23e-01 | 3.20e-02 |
| | TT2-ELM | 3.86e-02 | 8.12e-03 | 3.93e-02 | 5.19e-03 | 1.55e-02 | 3.21e-03 | 1.61e-02 | 7.83e-03 | 1.24e-01 | 1.35e-02 | 1.23e-01 | 3.20e-02 |
| | OP-ELM | 4.06e-02 | 8.78e-03 | 4.11e-02 | 5.34e-03 | 1.59e-02 | 3.37e-03 | 1.64e-02 | 7.88e-03 | 1.25e-01 | 1.40e-02 | 1.24e-01 | 3.19e-02 |
| | TROP-ELM | 4.06e-02 | 8.78e-03 | 4.12e-02 | 5.34e-03 | 1.59e-02 | 3.37e-03 | 1.64e-02 | 7.87e-03 | 1.25e-01 | 1.40e-02 | 1.24e-01 | 3.18e-02 |
| | IT2-FNN | 3.73e-02 | 8.05e-03 | 3.77e-02 | 5.07e-03 | 1.63e-02 | 3.49e-03 | 1.67e-02 | 8.34e-03 | 1.27e-01 | 1.44e-02 | 1.25e-01 | 3.41e-02 |
| | eT2QFNN | 2.97e-02 | 9.11e-03 | 1.10e-01 | 1.49e-01 | 2.26e-02 | 4.95e-02 | 3.88e-02 | 1.76e-01 | 1.43e-01 | 4.64e-02 | 1.17e-01 | 1.59e-01 |
| | BD-ELM | 4.12e-02 | 8.65e-03 | 5.65e-02 | 2.01e-01 | 1.51e-02 | 3.08e-03 | 5.98e-02 | 1.01e+00 | 1.22e-01 | 1.32e-02 | 1.36e-01 | 2.03e-01 |
| | RNN-LM | 4.00e-02 | 8.62e-03 | 7.95e-02 | 1.47e-01 | 1.55e-02 | 3.34e-03 | 6.26e-02 | 2.19e-01 | 1.24e-01 | 1.39e-02 | 1.75e-01 | 1.79e-01 |
| | RNN-BFGS | 4.11e-02 | 9.49e-03 | 4.43e-02 | 3.31e-02 | 1.63e-02 | 3.45e-03 | 1.86e-02 | 4.00e-02 | 1.27e-01 | 1.41e-02 | 1.28e-01 | 4.67e-02 |
| | LSTM | 4.88e-02 | 1.49e-02 | 4.88e-02 | 1.49e-02 | 1.72e-02 | 7.37e-03 | 1.72e-02 | 7.37e-03 | 1.30e-01 | 2.02e-02 | 1.30e-01 | 2.02e-02 |
| Covid19_Washington | TSFNN | 3.26e-02 | 6.40e-03 | 3.27e-02 | 1.16e-02 | 1.38e-02 | 2.65e-03 | 1.35e-02 | 6.29e-03 | 1.17e-01 | 1.17e-02 | 1.12e-01 | 3.21e-02 |
| | TT2-RVFL | 3.47e-02 | 6.81e-03 | 3.56e-02 | 6.17e-03 | 1.41e-02 | 2.74e-03 | 1.48e-02 | 6.65e-03 | 1.18e-01 | 1.20e-02 | 1.19e-01 | 2.76e-02 |
| | TT2-ELM | 3.39e-02 | 6.65e-03 | 3.48e-02 | 6.30e-03 | 1.40e-02 | 2.72e-03 | 1.48e-02 | 6.62e-03 | 1.18e-01 | 1.20e-02 | 1.19e-01 | 2.75e-02 |
| | OP-ELM | 3.76e-02 | 8.53e-03 | 3.85e-02 | 7.14e-03 | 1.43e-02 | 2.87e-03 | 1.51e-02 | 6.75e-03 | 1.19e-01 | 1.24e-02 | 1.20e-01 | 2.79e-02 |
| | TROP-ELM | 3.76e-02 | 8.63e-03 | 3.85e-02 | 7.17e-03 | 1.43e-02 | 2.88e-03 | 1.50e-02 | 6.71e-03 | 1.19e-01 | 1.25e-02 | 1.19e-01 | 2.77e-02 |
| | IT2-FNN | 4.25e-02 | 8.77e-03 | 4.34e-02 | 4.69e-03 | 1.47e-02 | 3.00e-03 | 1.54e-02 | 7.29e-03 | 1.21e-01 | 1.30e-02 | 1.21e-01 | 2.97e-02 |
| | eT2QFNN | 3.05e-02 | 5.08e-03 | 1.66e-01 | 1.73e-01 | 2.40e-02 | 8.71e-03 | 7.27e-02 | 2.03e-01 | 1.53e-01 | 2.50e-02 | 1.88e-01 | 1.93e-01 |
| | BD-ELM | 3.49e-02 | 7.49e-03 | 4.70e-02 | 1.65e-01 | 1.38e-02 | 2.70e-03 | 4.38e-02 | 8.11e-01 | 1.17e-01 | 1.21e-02 | 1.29e-01 | 1.65e-01 |
| | RNN-LM | 3.86e-02 | 8.18e-03 | 9.24e-02 | 2.03e-01 | 1.38e-02 | 2.80e-03 | 9.49e-02 | 3.70e-01 | 1.17e-01 | 1.24e-02 | 1.96e-01 | 2.38e-01 |
| | RNN-BFGS | 4.41e-02 | 9.54e-03 | 4.68e-02 | 2.82e-02 | 1.46e-02 | 3.00e-03 | 1.74e-02 | 3.57e-02 | 1.20e-01 | 1.28e-02 | 1.25e-01 | 4.20e-02 |
| | LSTM | 4.93e-02 | 1.49e-02 | 5.02e-02 | 2.46e-02 | 1.56e-02 | 7.28e-03 | 1.71e-02 | 1.16e-02 | 1.23e-01 | 1.91e-02 | 1.26e-01 | 3.48e-02 |
| Covid19_California | TSFNN | 1.84e-01 | 3.99e-02 | 1.84e-01 | 6.83e-02 | 4.17e-01 | 9.64e-02 | 4.20e-01 | 2.32e-01 | 6.41e-01 | 8.00e-02 | 6.13e-01 | 2.10e-01 |
| | TT2-RVFL | 1.94e-01 | 4.33e-02 | 1.98e-01 | 2.79e-02 | 4.31e-01 | 1.02e-01 | 4.52e-01 | 2.49e-01 | 6.49e-01 | 8.25e-02 | 6.44e-01 | 1.92e-01 |
| | TT2-ELM | 1.93e-01 | 4.31e-02 | 1.97e-01 | 2.83e-02 | 4.29e-01 | 1.01e-01 | 4.50e-01 | 2.48e-01 | 6.50e-01 | 8.27e-02 | 6.43e-01 | 1.92e-01 |
| | OP-ELM | 2.11e-01 | 4.88e-02 | 2.15e-01 | 2.69e-02 | 4.44e-01 | 1.08e-01 | 4.63e-01 | 2.53e-01 | 6.60e-01 | 8.64e-02 | 6.52e-01 | 1.94e-01 |
| | TROP-ELM | 2.11e-01 | 4.85e-02 | 2.15e-01 | 2.68e-02 | 4.43e-01 | 1.08e-01 | 4.62e-01 | 2.52e-01 | 6.60e-01 | 8.67e-02 | 6.52e-01 | 1.93e-01 |
| | IT2-FNN | 1.91e-01 | 4.42e-02 | 1.93e-01 | 2.60e-02 | 4.58e-01 | 1.11e-01 | 4.67e-01 | 2.68e-01 | 6.71e-01 | 8.85e-02 | 6.50e-01 | 2.10e-01 |
| | eT2QFNN | 2.54e-01 | 8.36e-02 | 1.42e+00 | 1.81e+00 | 1.39e+00 | 9.94e-01 | 6.15e+00 | 2.21e+01 | 1.13e+00 | 3.41e-01 | 1.53e+00 | 1.95e+00 |
| | BD-ELM | 2.03e-01 | 4.51e-02 | 3.89e-01 | 2.95e+00 | 4.20e-01 | 9.75e-02 | 9.41e+00 | 3.00e+02 | 6.43e-01 | 8.10e-02 | 8.15e-01 | 2.96e+00 |
| | RNN-LM | 2.12e-01 | 4.79e-02 | 4.77e-01 | 1.06e+00 | 4.35e-01 | 1.05e-01 | 2.46e+00 | 9.98e+00 | 6.55e-01 | 8.26e-02 | 9.85e-01 | 1.22e+00 |
| | RNN-BFGS | 2.17e-01 | 5.45e-02 | 2.32e-01 | 1.70e-01 | 4.61e-01 | 1.10e-01 | 5.16e-01 | 1.08e+00 | 6.73e-01 | 8.44e-02 | 6.68e-01 | 2.65e-01 |
| | LSTM | 2.50e-01 | 5.82e-02 | 2.41e-01 | 9.68e-02 | 4.82e-01 | 1.49e-01 | 4.99e-01 | 2.81e-01 | 6.88e-01 | 9.41e-02 | 6.72e-01 | 2.18e-01 |
| Covid19_Illinois | TSFNN | 6.93e-02 | 1.34e-02 | 6.68e-02 | 2.24e-02 | 5.53e-02 | 1.05e-02 | 5.52e-02 | 2.55e-02 | 2.34e-01 | 2.34e-02 | 2.26e-01 | 6.29e-02 |
| | TT2-RVFL | 7.08e-02 | 1.39e-02 | 7.29e-02 | 1.23e-02 | 5.73e-02 | 1.11e-02 | 6.09e-02 | 2.77e-02 | 2.37e-01 | 2.42e-02 | 2.40e-01 | 5.62e-02 |
| | TT2-ELM | 6.97e-02 | 1.37e-02 | 7.18e-02 | 1.25e-02 | 5.71e-02 | 1.11e-02 | 6.08e-02 | 2.77e-02 | 2.38e-01 | 2.43e-02 | 2.40e-01 | 5.62e-02 |
| | OP-ELM | 7.75e-02 | 1.63e-02 | 7.95e-02 | 1.30e-02 | 5.83e-02 | 1.17e-02 | 6.20e-02 | 2.79e-02 | 2.40e-01 | 2.54e-02 | 2.43e-01 | 5.60e-02 |
| | TROP-ELM | 7.74e-02 | 1.61e-02 | 7.94e-02 | 1.29e-02 | 5.83e-02 | 1.18e-02 | 6.20e-02 | 2.79e-02 | 2.40e-01 | 2.54e-02 | 2.43e-01 | 5.59e-02 |
| | IT2-FNN | 7.97e-02 | 1.63e-02 | 8.12e-02 | 1.05e-02 | 6.08e-02 | 1.23e-02 | 6.36e-02 | 2.98e-02 | 2.45e-01 | 2.61e-02 | 2.45e-01 | 6.03e-02 |
| | eT2QFNN | 8.15e-02 | 4.95e-02 | 1.53e-01 | 1.97e-01 | 2.95e-01 | 9.15e-01 | 7.23e-02 | 2.52e-01 | 4.17e-01 | 3.48e-01 | 1.67e-01 | 2.11e-01 |
| | BD-ELM | 8.28e-02 | 1.66e-02 | 2.02e-01 | 1.77e+00 | 5.51e-02 | 1.07e-02 | 3.31e+00 | 9.39e+01 | 2.33e-01 | 2.38e-02 | 3.53e-01 | 1.78e+00 |
| | RNN-LM | 7.79e-02 | 1.60e-02 | 1.89e-01 | 3.78e-01 | 5.74e-02 | 1.20e-02 | 3.74e-01 | 1.27e+00 | 2.38e-01 | 2.59e-02 | 4.01e-01 | 4.62e-01 |
| | RNN-BFGS | 8.55e-02 | 1.87e-02 | 9.18e-02 | 5.08e-02 | 6.09e-02 | 1.31e-02 | 7.14e-02 | 1.07e-01 | 2.45e-01 | 2.64e-02 | 2.53e-01 | 8.62e-02 |
| | LSTM | 9.86e-02 | 2.45e-02 | 1.07e-01 | 8.17e-02 | 6.42e-02 | 2.33e-02 | 7.77e-02 | 8.39e-02 | 2.51e-01 | 3.70e-02 | 2.63e-01 | 9.13e-02 |
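For reference, the error measures reported in these tables can be computed as follows. This is an illustrative sketch (the function names are ours, not the paper's): each Mean/SD column pair summarizes one metric over repeated random trials, and RMSE is the square root of MSE.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MAE, MSE and RMSE for one trial (RMSE = sqrt(MSE))."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    return mae, mse, np.sqrt(mse)

def summarize_trials(metric_values):
    """Mean and sample standard deviation over repeated random trials,
    as reported in the Mean/SD column pairs of Tables 11 and 12."""
    v = np.asarray(metric_values, dtype=float)
    return v.mean(), v.std(ddof=1)
```

A quick consistency check against the tables: for every method, the Train RMSE Mean column is close to the square root of the Train MSE Mean column, which is how the garbled exponents above were verified.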
Fig. 8

The data for Novel Corona Virus 2019 Dataset in Beijing, Shanghai, Tianjin and Chongqing

Similarly, the same operation is performed on the four datasets of Arizona, Washington, California and Illinois, and the results are shown in Fig. 9.
Fig. 9

The data for Novel Corona Virus 2019 Dataset in Arizona, Washington, California and Illinois

As can be seen from Figs. 8 and 9, the overall trend of each COVID-19 dataset is upward, reflecting its time-series character, and the growth rate of each curve reflects the spread of the virus. The figures suggest that these eight datasets are well suited to regression problems, since the data are relatively stable; this is why the differences between TSFNN and the comparison methods in Tables 11 and 12 are small. Comparing the two figures, the curves in Fig. 9 are smoother and their overall trend is more pronounced; although the curves in Fig. 8 also trend upward, the data soon return to a stable state. The data in Fig. 9 are therefore better suited to forecasting-type regression than those in Fig. 8, which explains why, in Tables 11 and 12, the overall performance of the algorithms on the four Chinese city datasets is not as good as on the four US datasets. In short, the characteristics of the datasets themselves account for the differences between Tables 11 and 12, and this difference makes the four Chinese datasets and the four US datasets worth comparing. Given that all eight datasets are small-scale time series, the four Chinese datasets vary considerably in character and yield poorer regression performance, while the US data show a clearer trend and are better suited to regression analysis. The four Chinese datasets (Beijing, etc.) can be regarded as having a more complex data structure and more diverse features than the four US datasets (Arizona, etc.), not limited to reflecting an upward trend. Combined with the actual situation at the time, the four US datasets are therefore better suited to predicting the future course of the epidemic in the United States.
As far as the data themselves are concerned, the four US datasets (Arizona, etc.) are "purer" and better suited to regression problems, so the compared methods perform better on them, while performance on the four Chinese datasets (Beijing, etc.) is relatively poor. According to the overall results in Tables 11 and 12, TSFNN still performs excellently compared with the other ten methods. On the Beijing, Shanghai, Tianjin and Chongqing datasets, although the overall results are relatively poor owing to the data themselves, TSFNN remains the best performer. TSFNN also performs best on the Arizona, Washington, California and Illinois datasets, which are better behaved than the four Chinese datasets. This demonstrates the strong feature-extraction and generalization ability of TSFNN, and shows that the tensor-stacked neural network scheme can integrate the advantages of its members and enhance the ability to extract data features.

Discussion of the eleven algorithms

The effectiveness and performance of the proposed TSFNN are demonstrated by the simulation experiments in Sects. 4.1–4.3. Overall, the comparison results on a total of 30 datasets show that TSFNN performs excellently. The parameter settings of the comparison algorithms follow three principles: (1) common parameters, such as the number of hidden nodes, are set to the same values; (2) the remaining parameters follow the settings in the original literature; (3) the division into training and testing sets is the same. The performance evaluation of the models is carried out on this basis. Considering the comparison results in Tables 2, 3, 5, 6, 8, 9, 11 and 12 together, TSFNN performs slightly better than the four deep network models RNN-LM, RNN-BFGS, LSTM and eT2QFNN, whose performance rank in descending order is RNN-LM, eT2QFNN, RNN-BFGS and LSTM. TSFNN's constituent members TT2-RVFL, TT2-ELM and TROP-ELM can compete with the deep network models but are, overall, slightly worse, while TSFNN itself is comparable to the four deep networks. Although the four deep network models were not tuned to their extreme performance, they at least exhibit their typical performance. The suggested stacked neural network scheme therefore enhances the performance of TT2-RVFL, TT2-ELM and TROP-ELM: the performance of TSFNN is the combined performance of its constituent members. The results of the Friedman test on twenty-five datasets for the eleven methods (TT2-RVFL, TT2-ELM, OP-ELM, TROP-ELM, IT2-FNN, eT2QFNN, BD-ELM, RNN-LM, RNN-BFGS, LSTM and TSFNN) are listed in Table 13. TSFNN obtains the best performance, with the smallest mean ranks on both training and testing error; RNN-LM ranks second after TSFNN.
Table 13

Results of Friedman test on ten datasets. The bold parts represent the best performance of eleven algorithms on each dataset (The testing results are listed in bracket)

| Algorithm | Mean rank | χ² | p value |
|---|---|---|---|
| TSFNN | 3.22 (3.39) | 254698.983 (228535.999) | <0.05 (<0.05) |
| TT2-RVFL | 5.95 (5.87) | | |
| TT2-ELM | 5.02 (5.24) | | |
| OP-ELM | 7.06 (6.79) | | |
| TROP-ELM | 7.05 (6.78) | | |
| IT2-FNN | 7.46 (6.78) | | |
| eT2QFNN | 6.22 (7.96) | | |
| BD-ELM | 7.79 (7.41) | | |
| RNN-LM | 3.25 (3.43) | | |
| RNN-BFGS | 6.29 (5.88) | | |
| LSTM | 6.69 (6.46) | | |
Based on the performance analysis of all simulation experiments, the regression results on datasets of varying complexity suggest that the main reasons for the excellent performance of the proposed TSFNN are as follows:
(1) TSFNN stacks the hidden layers of its constituent members into a tensor structure, into which the type-2 fuzzy sets are fused; this stacked structure enhances the model's feature-extraction, uncertainty-modelling and generalization abilities.
(2) TSFNN inherits the advantages of its constituent members and amplifies them through the stacked structure.
(3) Although the shortcomings of the constituent members are also inherited, they are compensated by the model itself and by the other members through the suggested stacking strategy.
(4) TSFNN uses a tensor regression algorithm that unfolds the 3-D tensor into three matrices, so the learning problem is solved by matrix regression.
(5) The regression stage uses Tikhonov regularization to constrain the model.
In summary, the proposed TSFNN is an effective fuzzy-system modelling and neural network fusion method, which enriches fuzzy-system modelling, especially the construction of high-order fuzzy systems.
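Points (4) and (5), unfolding the stacked 3-D tensor into matrices and solving a Tikhonov-regularized regression, can be sketched as follows. This is an illustrative reconstruction under our own naming (dimensions, variable names and the regularization constant are assumptions), not the authors' implementation:

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest,
    turning a 3-D tensor into one of its three matrix unfoldings."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def tikhonov_regression(H, T, lam=1e-3):
    """Consequent-part parameters via Tikhonov-regularized least squares:
    beta = (H^T H + lam * I)^(-1) H^T T."""
    d = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(d), H.T @ T)

# Example: stack three members' hidden-layer outputs (n samples x m nodes)
# into an n x m x 3 tensor, unfold along the sample mode, then regress.
rng = np.random.default_rng(0)
hidden = np.stack([rng.standard_normal((100, 20)) for _ in range(3)], axis=2)
H = unfold(hidden, 0)               # shape (100, 60)
T = rng.standard_normal((100, 1))   # regression targets
beta = tikhonov_regression(H, T)    # shape (60, 1)
```

The ridge term `lam * np.eye(d)` keeps the normal equations well conditioned even when the stacked hidden-layer matrix is rank deficient, which is the practical reason Tikhonov regularization is used in the regression stage.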

Conclusions

In this paper, a tensor-based stacked fuzzy neural network (TSFNN) model was proposed: a stacked neural network built from several type-2 fuzzy neural network models. In the TSFNN, TT2-RVFL, TT2-ELM and TROP-ELM form the stacked structure, and TT2-RVFL and TT2-ELM are additionally optimized with a kernel-space method to enhance their performance. TSFNN fuses the hidden-layer outputs of its member networks through a tensor, so the tensor-based stacked system inherits the advantages of type-2 fuzzy sets and the fuzzy inference ability of the fuzzy system. The structure also inherits the merits of the MRSR and pruning methods in TROP-ELM, whose advantages are likewise extracted by the tensor structure when the proposed TSFNN is fused. Because the tensor structure can concentrate the advantages of the member sub-networks, TSFNN obtains strong generalization and anti-noise ability from its stacking strategy. TSFNN also inherits the defects of its member networks; in general, its ability to concentrate the advantages of the members compensates for the shortcomings of individual members, although underfitting can still occur when the data are too complex. On the whole, the proposed TSFNN algorithm has excellent generalization, anti-noise and feature-extraction ability, and, owing to the reasons and principles listed in Sect. 4.4, it can compete with standard deep networks such as RNN-LM, RNN-BFGS and LSTM. To obtain a fast model of the dataset, a tensor unfolding method is used, and the regression results are then obtained by matrix regression on the unfolded tensor. Tensor regression and tensor equations could also be applied here, which is a direction for future optimization.
These capabilities are demonstrated and validated on ten UCI standard datasets and three real-world datasets. The TSFNN algorithm supplements tensor-based model optimization and model combination methods, indicating that the tensor-structure stacked neural network is a feasible way to combine neural networks. Moreover, the proposed fuzzy network stacking strategy can be regarded as an effective method for constructing higher-order fuzzy systems.