Compressive Strength Prediction of Self-Compacting Concrete Incorporating Silica Fume Using Artificial Intelligence Methods

This paper investigates the capability of Multivariate Adaptive Regression Splines (MARS) and Gene Expression Programming (GEP) to estimate the compressive strength of self-compacting concrete (SCC) incorporating silica fume (SF) as a supplementary cementitious material. To this end, a large experimental database was assembled from several published studies and used to train and test the two models proposed in this paper. The data are arranged in a format of seven input parameters (water, cement, fine aggregate, specimen age, coarse aggregate, silica fume and superplasticizer) and one output. Statistical criteria are used to assess the usefulness of the proposed techniques. The predicted values are compared with experimental results, and the comparison demonstrates that both MARS (R = 0.98, RMSE = 3.659) and GEP (R = 0.83, RMSE = 10.362) can predict the compressive strength of SCC incorporating silica fume, with MARS being considerably more accurate. A sensitivity analysis performed to identify the parameters that most affect compressive strength indicates that specimen age is the most influential variable in the mixture.


Introduction
Concrete is one of the most important construction materials and is widely used around the world. Much of the available knowledge about concrete technology has been generated in different parts of the world, especially in developed countries. Recently, special concrete types such as self-compacting concrete and high-performance concrete have been widely applied. Among the various trends and developments in the building industry, the introduction of self-compacting concrete (SCC) has shown considerable potential and has attracted interest in exploiting alternative raw materials, wastes, by-products and secondary materials as mineral additives. SCC is commonly characterized as a special concrete with desirable fresh-state properties, such as high flowability, good segregation resistance and the ability to settle under its own weight, even in the presence of congested reinforcement and in deep, narrow sections of non-conventional geometry. Thus, SCC consolidates itself without external or internal vibration during placement, while resisting bleeding and segregation and remaining stable [1,2].
Owing to the complex composition SCC requires to achieve its favorable properties, a suitable mix design process is crucial, taking into account the available raw materials and the different chemical or mineral admixtures: the challenge is to find an optimal balance among fine materials, coarse aggregate and chemical admixtures that improves the grain size distribution and particle packing, thereby ensuring greater cohesiveness. According to [3], variations in mineral additives or cement caused by changes in the production process, as well as changes in aggregate type, may lead to remarkable variations in the properties of fresh SCC; it is therefore crucial to have a robust mixture that is minimally influenced by the variability of external sources. In this direction, the use of powdered industrial by-products and wastes as mineral additives in the production of lightweight SCC, for their environmental benefits, has attracted the attention of researchers as a possible route towards renewable resources [4][5][6][7]. A wide variety of secondary materials have been proposed for incorporation in the mix [8][9][10][11][12][13], including fly ash (FA), limestone powder (LP), ground-granulated blast-furnace slag (GGBFS), rice husk ash (RHA) and silica fume (SF) as mineral admixtures, and viscosity modifying admixtures (VMA) and new-generation superplasticizers (SP) as chemical admixtures.
Over the last decades, artificial intelligence techniques have been increasingly used to estimate and model a wide range of problems, especially in civil engineering, because of their advantages [14][15]. The use of artificial intelligence techniques, such as artificial neural networks (ANN) [16], adaptive neuro-fuzzy inference systems (ANFIS) [17][18], genetic programming (GP) [19][20][21][22] and support vector machines (SVMs) [23], to model the compressive behavior of concrete has received significant attention. Previous studies have indicated that, in addition to experimental research, various artificial intelligence approaches have become important for evaluating and forecasting the fresh and hardened properties of concrete [22][23][24]. Only a few studies, however, concern the modeling of self-compacting concrete containing silica fume. Pala et al. [25] used ANNs to investigate the influence of fly ash and silica fume replacement content on the strength of concrete cured over a long period. Their investigation included concrete mixes with various water-to-cementitious-materials ratios, containing low and high volumes of FA, with or without a small additional amount of SF. For this purpose, 24 different mixtures with 144 samples were gathered from the literature. The results showed that ANNs have remarkable potential as a tool for evaluating the effect of cementitious materials on the compressive strength of concrete. It was shown that FA content contributed little to the strength of concrete at early ages but considerably at later ages. Additionally, Sarıdemir [26] applied an artificial neural network to model 195 specimens produced with 33 different mixture proportions containing SF and metakaolin (MK).
The data used in the multilayer feed-forward neural network models were arranged in the form of eight input parameters: age of specimen, metakaolin (MK), silica fume (SF), cement, sand, water, aggregate and superplasticizer. He showed that ANN, as an artificial intelligence technique, has high potential for predicting the 1-, 3-, 7-, 28-, 56-, 90- and 180-day compressive strength of concretes containing SF and MK.
The main objective of this research is to build models (MARS and GEP) that provide explicit formulas for predicting the compressive strength of concrete containing SF. To the best of the authors' knowledge, this is the first study to predict the compressive strength of self-compacting concrete incorporating silica fume using the MARS and GEP soft computing approaches. To this end, a reliable, comprehensive database covering a wide range of mixture proportions and material components was collected from prior studies published in different journals. The model inputs, taken from the literature, are parameters that affect compressive strength (water, cement, fine aggregate, specimen age, coarse aggregate, silica fume and superplasticizer). The variables with a statistically significant contribution to the model estimates are then determined. Additionally, a sensitivity analysis (SA) was performed to determine the most effective parameters. A summary of the experimental database is provided in Section 2. Details of the predictive machine learning approaches adopted in this study are presented in Section 3. The development of the proposed models, a statistical comparison of their predictions, and the sensitivity analysis of the input variables are discussed in Section 4. Finally, Section 5 summarizes the study and presents its conclusions.

Compressive Strength Data Set
The core objective of this study is to develop MARS and GEP models to predict the hardened properties of SCC containing SF. In most previous research, each application predicts one property of concrete from a large number of components. The broader aim of this modeling approach is to predict as many outputs as possible from a limited number of inputs: the more properties of SCC that can be predicted from a limited number of its components, the more successful and applicable the model will be in the field. Sufficient data were collected to build a database of silica fume SCC mixtures. The data were obtained from different sources and used for training and testing the proposed models. In total, 142 experimental data records were assembled from the literature [27][28][29][30][31][32][33].
The data used in the proposed models are arranged in a format of seven input parameters: cement, water, fine aggregate, coarse aggregate, age of specimen, superplasticizer and silica fume. It should be noted that models derived using GEP, MARS or similar methods are, in most cases, reliable only within the range of the data used for their development. The amount of data used for training the GEP and MARS models therefore strongly affects the reliability of the final models. Most previous works construct a database from their own experimental results, so their results are limited to their own conditions; our database is built from many different sources, including studies from different countries, and can therefore be applied more widely. The boundary values of the input and output variables used in the GEP and MARS models are listed in Table 1. Moreover, as shown in Table 2, the input parameters are distributed homogeneously over different ranges for training the models.
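Because the fitted models are reliable only within the range of the training data, a simple screening step can check whether a new mix falls inside that range before a fitted formula is applied to it. The sketch below is a hypothetical helper, not part of the original study; it assumes one row per mix with the columns ordered as cement, water, fine aggregate, coarse aggregate, age, superplasticizer and silica fume, and the toy values are illustrative only, not the bounds reported in Table 1.

```python
import numpy as np


def training_bounds(X_train):
    """Column-wise minimum and maximum of the training inputs (one row per mix)."""
    X_train = np.asarray(X_train, dtype=float)
    return X_train.min(axis=0), X_train.max(axis=0)


def within_domain(x_new, lower, upper):
    """True if every input of a new mix lies inside the training range."""
    x_new = np.asarray(x_new, dtype=float)
    return bool(np.all((x_new >= lower) & (x_new <= upper)))


# Toy training set (hypothetical values, not taken from the database).
X_train = np.array([
    [400, 180, 850, 900, 7, 4.0, 20],
    [450, 170, 800, 850, 28, 6.0, 40],
    [500, 190, 900, 800, 90, 8.0, 60],
])
lower, upper = training_bounds(X_train)
print(within_domain([420, 175, 820, 870, 28, 5.0, 30], lower, upper))   # True
print(within_domain([420, 175, 820, 870, 180, 5.0, 30], lower, upper))  # False: age outside range
```

A mix flagged as out of range is an extrapolation, for which neither MARS nor GEP formulas should be trusted without further validation.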

Predictive Machine Learning Approaches
The two heuristic machine learning methods applied in the present study to predict the compressive strength of SCC, GEP and MARS, are briefly described below.

Gene Expression Programming
GEP is a relatively recent artificial intelligence technique developed as an extension of the genetic programming (GP) approach. GEP is a search technique that evolves computer programs in the form of decision trees, mathematical expressions and logical expressions [34][35][36]. It has attracted the attention of researchers for predicting material properties in civil engineering problems. In this study, a GEP-based formulation is applied to predict the compressive strength of self-compacting concrete (SCC) incorporating silica fume (SF). In GEP, programs are encoded as linear chromosomes that are expressed as expression trees (ETs).
ETs are computer programs that are evolved to solve a practical problem and are selected according to their fitness in solving that problem. The corresponding mathematical expressions can be extracted from these tree structures. Over successive generations, the population of ETs discovers new traits and thus adapts to the particular problem it is employed to solve [34, 35, 37].
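The step from a linear chromosome to an ET can be illustrated with the breadth-first (Karva) reading used in GEP. The sketch below is illustrative only: the function symbols, the use of `Q` for square root, the protected division, and the terminal names `a` to `d` are assumptions standing in for the study's actual mix inputs and settings.

```python
import math

# Hypothetical function set: symbol -> (arity, implementation).
OPS = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "/": (2, lambda a, b: a / b if b != 0 else 1.0),  # protected division (assumption)
    "Q": (1, lambda a: math.sqrt(abs(a))),            # protected square root (assumption)
}


def karva_to_tree(gene):
    """Decode a gene read level by level (breadth-first) into a nested tree."""
    root = {"sym": gene[0], "kids": []}
    level, pos = [root], 1
    while level:
        next_level = []
        for node in level:
            arity = OPS[node["sym"]][0] if node["sym"] in OPS else 0
            for _ in range(arity):
                child = {"sym": gene[pos], "kids": []}
                pos += 1
                node["kids"].append(child)
                next_level.append(child)
        level = next_level
    return root  # symbols after `pos` form the non-coding region and are ignored


def evaluate(node, env):
    """Evaluate the tree for one fitness case; env maps terminal names to values."""
    if node["sym"] in OPS:
        return OPS[node["sym"]][1](*(evaluate(k, env) for k in node["kids"]))
    return env[node["sym"]]


def to_infix(node):
    """Extract the mathematical expression encoded by the tree."""
    sym, kids = node["sym"], node["kids"]
    if sym == "Q":
        return f"sqrt({to_infix(kids[0])})"
    if sym in OPS:
        return f"({to_infix(kids[0])} {sym} {to_infix(kids[1])})"
    return sym


tree = karva_to_tree("+*Qabcd")
print(to_infix(tree))                             # ((a * b) + sqrt(c))
print(evaluate(tree, {"a": 2, "b": 3, "c": 16}))  # 10.0
```

The trailing `d` is never read: GEP genes may carry unused symbols, which is what keeps every mutated chromosome syntactically valid.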
GEP development involves five steps. First, the fitness function f_i of an individual program i is defined as:
$$ f_i = \sum_{j=1}^{C_t} \left( M - \left| C_{(i,j)} - T_j \right| \right) $$
where M is the selection range, C_(i,j) is the value returned by the individual chromosome i for fitness case j (out of C_t fitness cases), and T_j is the target value for fitness case j.
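The fitness expression above can be computed directly from the predictions of one chromosome over all fitness cases. The sketch below assumes the absolute-error form with selection range M as written; the target and predicted strengths are made-up numbers for illustration.

```python
import numpy as np


def gep_fitness(predicted, target, selection_range):
    """Absolute-error fitness with selection range M, summed over all C_t fitness cases."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sum(selection_range - np.abs(predicted - target)))


# Three hypothetical fitness cases (compressive strengths) with M = 100.
print(gep_fitness([42, 47, 60], [40, 50, 60], selection_range=100))  # 295.0
```

A perfect chromosome would score C_t times M (here 300), so higher fitness means smaller total absolute error.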
Next, the terminal set T and the function set F are determined in order to generate the chromosomes. In this study, the terminal set consists of the seven independent input parameters (cement, water, fine aggregate, coarse aggregate, age of specimen, superplasticizer and silica fume). To find a suitable function set, it is important to review previous evaluations of compressive strength. Accordingly, basic mathematical functions (√, power, exp) and the four basic arithmetic operators (+, −, ×, /) were used to predict compressive strength. In the third step, the chromosome architecture is configured. The linking function is selected in the fourth step. Finally, the genetic operators that cause variation, and their rates, are selected.
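The chromosome architecture and the variation operators interact through the head/tail rule of GEP: the head may hold functions or terminals, while the tail holds only terminals and has length t = h(n_max − 1) + 1, so any gene decodes to a valid tree. The sketch below shows that rule together with a point-mutation operator applied at a given rate. The symbol lists (`Q` for square root, and the terminal abbreviations) are hypothetical labels, not the study's configuration in Table 3.

```python
import random

FUNCTIONS = ["+", "-", "*", "/", "Q"]                  # hypothetical symbols; Q = sqrt
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}
TERMINALS = ["C", "W", "FA", "CA", "AS", "SP", "SF"]   # hypothetical input labels


def tail_length(head_len, max_arity):
    """Tail length that guarantees every gene decodes to a complete tree."""
    return head_len * (max_arity - 1) + 1


def random_gene(head_len, rng):
    """Head: functions or terminals; tail: terminals only."""
    t = tail_length(head_len, max(ARITY.values()))
    head = [rng.choice(FUNCTIONS + TERMINALS) for _ in range(head_len)]
    tail = [rng.choice(TERMINALS) for _ in range(t)]
    return head + tail


def mutate(gene, head_len, rate, rng):
    """Point mutation at the given rate that respects the head/tail constraint."""
    out = list(gene)
    for i in range(len(out)):
        if rng.random() < rate:
            pool = FUNCTIONS + TERMINALS if i < head_len else TERMINALS
            out[i] = rng.choice(pool)
    return out


rng = random.Random(0)
gene = random_gene(head_len=8, rng=rng)
print(len(gene))  # 17: head of 8 plus tail of 9
child = mutate(gene, head_len=8, rate=0.1, rng=rng)
```

Other GEP operators (transposition, recombination) follow the same constraint; only point mutation is shown here.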

Multivariate Adaptive Regression Splines
Multivariate adaptive regression splines (MARS) is a nonlinear, nonparametric regression method introduced by Friedman [38]. It models nonlinear relationships between the inputs and the output of a system using a set of splines (piecewise polynomials) with different slopes. No fixed assumption about the underlying functional relationship between input and output variables is required. The endpoints of the segments are called nodes (knots). A node marks the end of one region of the data and the beginning of another. The resulting splines (known as basis functions) give the model more flexibility and allow it to capture curvature, thresholds and other departures from linearity (Friedman 1991). MARS builds basis functions (BFs) by stepwise search, and an adaptive regression algorithm is used to select the node locations. MARS models are built in two phases. In the first (forward) phase, basis functions are added and candidate nodes are found to improve performance, leading to a model that fits the training data very closely. The second (backward) phase removes the least effective terms. In this study, the open-source code by Jekabsons [39] is used to conduct the analysis presented in this paper.
Suppose y is the output and X = (X1, ..., Xp) is the matrix of p input variables. It is assumed that the data are generated by an unknown "true" model. Consequently, the response is:
$$ y = f(X_1, \dots, X_p) + e = f(X) + e $$
where e is the error term. MARS approximates the function f using basis functions (BFs). Basis functions are splines made of piecewise-linear or piecewise-cubic functions. In this study, piecewise-linear functions are employed; they are described below.
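Piecewise-linear basis functions have the form max(0, x − t), where the node is located at t. The notation max(·) means that only the positive part of the argument is used; otherwise the function is zero. Each node t gives rise to a reflected pair of functions, max(0, x − t) and max(0, t − x).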
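The reflected pair of piecewise-linear basis functions can be written in a few lines of NumPy. This is a minimal illustration of the hinge functions themselves, not code from the MARS implementation used in the study.

```python
import numpy as np


def hinge_pair(x, t):
    """Reflected pair of piecewise-linear basis functions with a node at t."""
    x = np.asarray(x, dtype=float)
    right = np.maximum(0.0, x - t)  # max(0, x - t): nonzero only to the right of the node
    left = np.maximum(0.0, t - x)   # max(0, t - x): nonzero only to the left of the node
    return right, left


right, left = hinge_pair([1, 3, 5], t=3)
print(right)  # [0. 0. 2.]
print(left)   # [2. 0. 0.]
```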
The MARS model is a linear combination of BFs and their interactions, expressed as:
$$ f(X) = \beta_0 + \sum_{m=1}^{M} \beta_m \lambda_m(X) $$
where each λm(X) is a basis function and βm is its coefficient. A basis function may consist of a single spline function or a product of two or more spline functions (the data may call for higher degrees; here, interactions of at most second degree are considered). The constant coefficients β are estimated by the least squares method. MARS modeling is data-driven. First, the forward procedure is applied to the training data to fit the model above, starting from the constant β0 and adding the pair of basis functions that yields the largest reduction in training error. Given a current model with M basis functions, the next pair added to the model has the form:
$$ \hat{\beta}_{M+1}\,\lambda_l(X)\,\max(0, X_j - t) + \hat{\beta}_{M+2}\,\lambda_l(X)\,\max(0, t - X_j) $$
where λl is a basis function already in the model, Xj is an input variable, t is a node, and the coefficients are estimated by least squares. Because new terms are formed as products with BFs already present in the model, interactions between BFs are also considered. BFs are added until the maximum number of terms is reached, which yields a model that overfits the training data. A backward deletion procedure is then employed to reduce the number of terms: the BFs with the smallest contribution to the model are eliminated in turn, removing extraneous terms to find the best submodel. The final optimized model therefore consists of a subset of the BFs generated in the forward step. The generalized cross-validation (GCV) criterion is used to compare submodels because of its low computational cost. It penalizes model complexity, which reduces the likelihood of overfitting when approximating high-dimensional functions. For N observations, the GCV of the model on the training data is calculated as [40]:
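One forward step can be sketched as an exhaustive search over parent basis function, input variable and candidate node, keeping the reflected pair that most reduces the least-squares residual. This is a simplified illustration assuming NumPy's `lstsq` for the fit; unlike full MARS (and the ARESLab code used in the study), it does not limit interaction degree or prevent a variable from appearing twice in one product.

```python
import numpy as np


def forward_step(B, X, y):
    """One simplified MARS forward step.

    B: (N, M) current basis matrix; its first column is all ones (intercept).
    X: (N, p) inputs; y: (N,) response.
    Returns the expanded basis matrix and the chosen (parent, variable, node).
    """
    best = None
    for parent in range(B.shape[1]):
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):  # candidate nodes at observed values
                right = B[:, parent] * np.maximum(0.0, X[:, j] - t)
                left = B[:, parent] * np.maximum(0.0, t - X[:, j])
                B_new = np.column_stack([B, right, left])
                coef, *_ = np.linalg.lstsq(B_new, y, rcond=None)
                rss = float(np.sum((y - B_new @ coef) ** 2))
                if best is None or rss < best[0]:
                    best = (rss, B_new, (parent, j, float(t)))
    return best[1], best[2]


# Synthetic response with a kink at x = 4 (illustrative data only).
x = np.linspace(0, 10, 21)
y = 1.0 + 2.0 * np.maximum(0.0, x - 4.0)
B1, (parent, var, node) = forward_step(np.ones((21, 1)), x.reshape(-1, 1), y)
print(node)  # 4.0
```

Repeating this step grows the model; the backward phase then prunes it using GCV.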
$$ \mathrm{GCV} = \frac{\frac{1}{N}\sum_{i=1}^{N}\left[y_i - f(x_i)\right]^2}{\left[1 - \frac{M + d\,(M-1)/2}{N}\right]^2} $$
where M is the number of BFs, N is the number of observations, d is the penalty (cost) parameter for each node, and f(xi) is the value predicted by the MARS model. The numerator is the mean squared error of the model on the training data. The denominator accounts for the increase in variance that accompanies increasing model complexity. Note that (M − 1)/2 is the number of nodes of the basis functions, so GCV accounts for both the number of BFs and the number of nodes of a model [41,42]. In the backward step, one BF is eliminated at each stage, and the submodel with the lowest GCV is retained as the one that fits the data sufficiently without unnecessary terms. MARS is an adaptive technique, since the BFs and node locations are selected in a data-driven manner and are specific to each problem.
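The GCV criterion can be evaluated from the training residuals, the number of basis functions M (including the intercept) and the penalty d. The default d = 3 below is an assumption (Friedman suggested values roughly between 2 and 4); the study's actual setting is not stated here.

```python
import numpy as np


def gcv(y, y_hat, n_basis, d=3.0):
    """Generalized cross-validation score of a MARS model on the training data."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    mse = np.mean((y - y_hat) ** 2)            # numerator: training mean squared error
    c = n_basis + d * (n_basis - 1) / 2.0      # effective number of parameters
    return float(mse / (1.0 - c / n) ** 2)     # denominator penalizes complexity


# Same residuals, more basis functions -> larger (worse) GCV.
print(gcv([1, 2, 3, 4], [1, 2, 3, 5], n_basis=1))
print(gcv([1, 2, 3, 4], [1, 2, 3, 5], n_basis=2))
```

During backward pruning, the submodel with the smallest GCV is kept, so an extra basis function survives only if it reduces the training error enough to offset its penalty.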

Evaluation Metrics
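Models for predicting the compressive strength of SCC should be evaluated properly. In this study, the models constructed with GEP and MARS were assessed with the following statistical indices [43]:
(1) Coefficient of determination (R²):
$$ R^2 = \frac{\left[\sum_{i=1}^{N}(O_i - \bar{O})(P_i - \bar{P})\right]^2}{\sum_{i=1}^{N}(O_i - \bar{O})^2 \, \sum_{i=1}^{N}(P_i - \bar{P})^2} $$
(2) Root mean squared error (RMSE):
$$ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(O_i - P_i)^2} $$
(3) Mean absolute percentage deviation (MAPD):
$$ \mathrm{MAPD} = \frac{100}{N}\sum_{i=1}^{N}\frac{|O_i - P_i|}{O_i} $$
where O is the measured value of compressive strength, P is the predicted value of compressive strength, Ō and P̄ are their mean values, and N is the number of samples in the dataset.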
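The three indices can be computed together from paired measured and predicted strengths. In this sketch, R is taken as the Pearson correlation coefficient and R² as its square; that reading is an assumption, since some studies define R² as 1 − SS_res/SS_tot instead. MAPD is returned in percent.

```python
import numpy as np


def metrics(observed, predicted):
    """R, R^2 (squared Pearson correlation), RMSE and MAPD (%) for paired values."""
    O = np.asarray(observed, dtype=float)
    P = np.asarray(predicted, dtype=float)
    r = np.corrcoef(O, P)[0, 1]
    rmse = np.sqrt(np.mean((O - P) ** 2))
    mapd = 100.0 * np.mean(np.abs(O - P) / O)
    return {"R": float(r), "R2": float(r ** 2), "RMSE": float(rmse), "MAPD": float(mapd)}


# Illustrative values only (not from the database).
print(metrics([10, 20, 30, 40], [12, 18, 33, 40]))
```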

Results and Discussion
In this section, the reliability, effectiveness and robustness of the proposed models in finding the optimum solution are compared using statistical metrics.

GEP Development
In this study, compressive strength is estimated by applying the GEP model. The function set and operational parameters used in the GEP model are reported in Table 3. To predict the compressive strength, a population of 30 chromosomes was used in each generation. The best ET for compressive strength returned by the GEP model is shown in Figure 1.

Figure 1. GEP model in CS prediction
In addition, the relationship between the input and output variables returned by the ETs of the GEP model is expressed as:

MARS Development
The open-source MARS code ARESLab by Jekabsons [39], which implements the main functionality of the MARS regression model proposed by Friedman (1991), is applied to perform the analysis described in this study. Table 4 lists the details of the MARS analysis, including the number of interactions in the final model, the number of basis functions (BFs) and the GCV value. A 10-fold cross-validation technique was also used to avoid bias in assessing model performance. To identify the important variables and the interactions among them in high-dimensional models, an analysis of variance (ANOVA) decomposition, a well-known statistical method, was carried out on the model using the training dataset.
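The 10-fold cross-validation mentioned above splits the data so that every record is used for testing exactly once. The sketch below shows one way to generate such folds with NumPy; the random seed and the shuffling are assumptions, and ARESLab's own cross-validation routine may partition the data differently.

```python
import numpy as np


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index arrays; each sample is tested exactly once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)  # fold sizes differ by at most one
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test


# 142 records, as in the assembled database.
for train, test in k_fold_indices(142, k=10):
    pass  # fit the model on X[train], score it on X[test]
```

Averaging the ten test scores gives a performance estimate that does not depend on a single lucky or unlucky train/test split.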
The acceptance or rejection of the proposed techniques was judged by their ability to estimate CS. To test model accuracy, a comparative study was carried out in terms of R², RMSE and MAPD. As Figures 2 and 3 show, the CS values predicted by Equation 12 lie remarkably closer to the line of perfect agreement than those of the GEP model. Additionally, most CS values estimated by the MARS model have a relative error below 20%, whereas most of those estimated by the GEP model exceed 20%. This means that Equation 12 returned by MARS provides acceptable predictions. Observed versus predicted CS values for MARS and GEP are compared in Figure 4, which shows that the MARS model performs better than the GEP model at the local maxima and minima of the CS data. The differences between predicted and observed CS for the GEP model also indicate that this model is not a suitable tool for estimating CS. According to Table 6, …
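The 20% relative-error criterion used above can be expressed as the share of predictions whose error falls within the tolerance. The helper below is a hypothetical illustration; the threshold is a parameter and the numbers are made up.

```python
import numpy as np


def share_within(observed, predicted, tol=0.20):
    """Fraction of predictions whose relative error |P - O| / O is at most tol."""
    O = np.asarray(observed, dtype=float)
    P = np.asarray(predicted, dtype=float)
    rel_err = np.abs(P - O) / O
    return float(np.mean(rel_err <= tol))


# Relative errors 10%, 16%, 22% and 0%: three of four fall within 20%.
print(share_within([50, 50, 50, 50], [55, 58, 39, 50]))  # 0.75
```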

Sensitivity Analysis
To determine the input variables with the greatest influence on the output, the MARS model was used to perform a sensitivity analysis, in which the effect of each input variable on the predicted CS was examined in turn. The sensitivity (%) of the dependent variable to each independent variable is computed using Equations 13 and 14:
$$ N_i = f_{\max}(x_i) - f_{\min}(x_i) \tag{13} $$
$$ S_i = \frac{N_i}{\sum_{j=1}^{n} N_j} \times 100 \tag{14} $$
where fmax(xi) and fmin(xi) are the maximum and minimum of the predicted output over the ith input domain, with the other variables fixed at their mean values, and n is the number of input variables. The results of the sensitivity analysis of the proposed MARS model are shown in Figure 5. They demonstrate that age of specimen (AS) is the most influential parameter on the compressive strength of SCC containing silica fume, while water (W) has the least influence on CS. The ranking of the remaining parameters can be seen in Figure 5.
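Equations 13 and 14 can be applied to any fitted model that maps an input matrix to predictions. The sketch below sweeps each input over its observed range while holding the others at their means; the grid size and the toy linear model are assumptions for illustration, not the fitted MARS formula.

```python
import numpy as np


def sensitivity(model, X, n_grid=50):
    """Percentage sensitivity S_i of model output to each input (Equations 13 and 14).

    model: callable mapping an (n, p) array to an (n,) array of predictions.
    X: (N, p) input data defining each variable's domain and mean.
    """
    X = np.asarray(X, dtype=float)
    means = X.mean(axis=0)
    ranges = []
    for i in range(X.shape[1]):
        grid = np.linspace(X[:, i].min(), X[:, i].max(), n_grid)
        probe = np.tile(means, (n_grid, 1))  # all inputs at their mean values...
        probe[:, i] = grid                   # ...except input i, swept over its domain
        out = model(probe)
        ranges.append(out.max() - out.min())  # N_i = f_max(x_i) - f_min(x_i)
    ranges = np.array(ranges)
    return 100.0 * ranges / ranges.sum()      # S_i in percent


# Toy model: the first input is three times as influential as the second.
print(sensitivity(lambda Z: 3 * Z[:, 0] + Z[:, 1], [[0, 0], [1, 1]]))  # [75. 25.]
```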

Conclusion
This study evaluated the feasibility of using GEP and MARS models to estimate the compressive strength of SCC containing silica fume. The proposed approaches were developed using 142 data records specifying mixture proportions. Compressive strength was considered as the output, while cement, water, fine aggregate, coarse aggregate, age of specimen (AS), superplasticizer and silica fume were selected as inputs. The MARS method predicted CS using 12 basis functions, whereas the GEP model built 3 ETs to estimate it. The compressive strength values estimated by MARS at the training and testing stages (R = 0.989, RMSE = 3.659, MAPD = 4.657 and R = 0.939, RMSE = 6.327, MAPD = 7.437, respectively) indicate a higher ability than those given by the GEP approach at training (R = 0.911, RMSE = 10.362, MAPD = 13.981) and testing (R = 0.90, RMSE = 7.982, MAPD = 8.635). Finally, a sensitivity analysis was conducted to evaluate the effect of the input variables on CS forecasting; according to its results, AS is the most important input for predicting CS. The work presented in this paper demonstrates the ability of soft-computing methods to predict the behavior of SCC, providing designers and researchers with an alternative to conventional methods.