Particle Swarm Optimization Based Approach for Estimation of Costs and Duration of Construction Projects

Cost and duration estimation is essential for the success of construction projects. The importance of decision making in cost and duration estimation during the building design process points to the need for an estimation tool for both designers and project managers. Particle swarm optimization (PSO), a soft computing technique, offers significant potential in this field. This study proposes a PSO-based approach to estimating the cost and duration of construction projects, and examines the general applicability of PSO to the formulated cost and duration estimation problem. A set of 60 completed government projects was used to build the proposed models. Eight input parameters are used in building the proposed models: volume of bricks, volume of concrete, footing type, number of elevators, total floor area, area of the ground floor, number of floors, and security status. The results show that the PSO models can be an alternative approach for evaluating the cost and/or duration of construction projects. The developed models provide high prediction accuracy, with mean values close to 1.0 (0.97 and 0.99) and low CoV values (10.87% and 4.94%). A comparison of the models' results indicated that predicting with PSO was significantly more precise.


Introduction
Cost and duration prediction is an essential issue in construction projects. Underestimation or overestimation of costs may result in the failure of a construction project. The use of various approaches throughout the project lifetime should supply cost information to the project participants and support a complicated decision-making process [1,2]. Cost and duration evaluation is a vital task in costing and tender preparation for any construction project before it is built. Cost and duration estimation in the early stages of construction projects involves considerable uncertainty. Hence, there is a strong demand for an effective approach that minimizes uncertainty in cost and duration prediction. One traditional technique for predicting cost and duration is to consult several specialists. However, continuous contact with these specialists is not always easy, which motivates the development of alternative methods for predicting the cost and duration of construction projects. It is preferable to build the new method on datasets created from previous similar projects. Furthermore, the traditional technique is difficult and complicated to apply. Hence, a soft-computing method is a noticeably more effective way to address nonlinear problems. The best solutions for any system can be defined as the feasible solutions whose fitness values are as good as those of any other feasible solution; these solutions are obtained by selecting values for the set of parameters that satisfy all constraints [3]. Furthermore, optimization approaches are used widely in numerous areas, such as engineering and computer science. Research in the optimization area is very active, and new optimization approaches are developed frequently [4].
The main goal of optimization methods is to find values for a set of parameters that maximize or minimize objective functions subject to certain constraints [5]. Over the last decades, numerous studies have been conducted to develop optimization approaches that apply evolutionary programming methods.
A relationship between the completed construction cost and the time taken to complete a construction project was first established mathematically by Bromilow (1974) [6]. Carr (1979) [7] used regression analysis to organize the duration and cost preparation of industrial buildings. Wang et al. (2013) [8] proposed a cost estimation model based on a neural network whose learning steps were carried out using a particle swarm optimization (PSO) method. Hong et al. (2014) [9] proposed a hybrid PSO-BPNN model to assess the cost of construction projects, in which the PSO technique optimized the ANN weights. Zima (2015) [10] presented a CBR model to predict the unit price of construction elements; the CBR method provides a knowledge base that supports cost prediction at the initial stage of a construction project. Lee et al. (2016) [11] developed an ANN model and a hybrid ANN-ACO model, in which the ant colony optimization (ACO) algorithm optimizes the ANN weights, for determining the amount and cost of construction waste in the early stage of construction. Juszczyk et al. (2018) [12] presented a model for predicting the construction costs of sports areas. Hegazy et al. (1994) [13] developed a hybrid DES-PSO model that combines discrete event simulation (DES) and PSO algorithms for construction networks and works through a set of iterations, which significantly reduces the effort of searching optimization scenarios.
The main aim of the current research is to propose and investigate models for estimating the cost and duration of construction projects in the initial planning stage using the particle swarm optimization (PSO) technique. The proposed PSO models can assist engineers in making informed decisions in the initial stages of design. With these models, it is possible to obtain a precise estimate even when sufficient information is not available in the initial phases. These approaches encourage a feedback procedure that may help designers to reach the best solution. Moreover, the proposed models consider categorical parameters, such as the security status that has prevailed in Iraq over the last decade.

Research Methodology
Soft-computing methods are used to solve complicated numerical optimization problems such as non-linear systems. The primary purpose of this study is to propose new PSO models that accurately predict the cost and duration of construction projects. The proposed models were developed from several influential input parameters, whose definitions are listed in Table 1.

Table 1. Definition of the input parameters.
Parameter | Definition
B | The brick volume.
EN | The number of elevators in the building.
AGF | The area of the ground floor.
TFA | The total area of floors.
FN | The number of floors.
SS | The security status: 1 = safe, 2 = moderate, 3 = not safe.

Optimization is required to produce optimal cost and/or duration values for a construction project. Three key points must be addressed in the process:
(a) The objective function must be formulated.
(b) A clear approach is required to solve the optimization problem.
(c) The convergence criterion should be specified.
These items are discussed in the subsequent sections.

Objective Function of PSO Models
The primary objective of PSO is to optimize the cost and/or duration values by searching the solution space for an optimum set of unknown coefficients, as illustrated in the proposed model section. With the final form of the optimized model, the differences between the actual and forecast values of the duration and cost are minimal. The proposed models are simulated in MATLAB to optimize the duration and cost models for the construction projects. The objective function used in this study is the root mean square error (RMSE), which can be computed with the following expression [14][15][16][17]:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2}$$

where $y'$ refers to the forecast value, $y$ refers to the actual value, and $n$ denotes the number of dataset samples.
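The RMSE objective described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB code; the function name `rmse` and the sample values are assumptions made for the example.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and forecast values.

    `actual` and `predicted` are equal-length sequences of numbers
    (hypothetical names; the paper does not publish its code).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Illustrative values only, not data from the study.
actual_cost = [100.0, 200.0, 300.0]
predicted_cost = [110.0, 190.0, 300.0]
print(rmse(actual_cost, predicted_cost))
```

A perfect model gives an RMSE of zero, so the optimizer searches for the coefficient set that drives this value as low as possible.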

Optimization Method of PSO Models
As a result of its global convergence ability and its ease of implementation and adoption, PSO is considered one of the best optimization approaches. PSO is an evolutionary computation approach developed by Eberhart et al. (2001) [18], inspired by the social behavior of bird flocking, in which each bird is treated as a particle. The PSO algorithm is widely accepted and used to solve different optimization problems. During the entire search process, the velocity and position of each particle are updated according to Equations 3 and 4:

$$v_i^{t+1} = w\,v_i^{t} + c_1 r_1 \left(pbest_i - x_i^{t}\right) + c_2 r_2 \left(gbest - x_i^{t}\right) \qquad (3)$$

$$x_i^{t+1} = x_i^{t} + v_i^{t+1} \qquad (4)$$

where $v_i$ and $x_i$ are the velocity and position of particle $i$, respectively; $r_1$ and $r_2$ are random numbers uniformly distributed between 0 and 1; pbest denotes the best position found by each particle, and gbest represents the globally best position of all the particles. The acceleration coefficients $c_1$ and $c_2$ are 'trust' settings that express the degree of confidence in the best solution found by an individual particle ($c_1$, the cognitive parameter) and by the whole swarm ($c_2$, the social parameter). The term $w$ in Equation 3 is the inertia weight, which was introduced to improve the convergence of the iteration procedure. This weight is a scaling factor that controls the search capabilities of the swarm by scaling the contribution of the current velocity to the updated velocity vector. The inertia weight was not part of the original PSO algorithm; it was added later by Shi and Eberhart (1998) [19]. Figure 1 displays the update of a particle's position and velocity in a 2D parameter space. The first vector is the momentum component, that is, the velocity of the particle in the previous step. The second vector is the particle memory (cognitive) component, which attracts the particle to the best position it has found so far in the solution space. The last vector is the social, or swarm, component, which attracts the particle to the best position found by the swarm [20].
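As a worked illustration of the update described above, consider one particle in one dimension, using the standard inertia-weight form of the velocity and position updates. All numbers here are illustrative assumptions rather than values from the study: $w = 0.7$, $c_1 = c_2 = 1.494$, random draws $r_1 = 0.5$ and $r_2 = 0.25$, current velocity $v = 1$, current position $x = 2$, personal best $pbest = 3$, and global best $gbest = 5$.

```latex
\begin{aligned}
v^{t+1} &= w\,v^{t} + c_1 r_1\,(pbest - x^{t}) + c_2 r_2\,(gbest - x^{t}) \\
        &= 0.7(1) + 1.494(0.5)(3 - 2) + 1.494(0.25)(5 - 2) \\
        &= 0.7 + 0.747 + 1.1205 = 2.5675, \\
x^{t+1} &= x^{t} + v^{t+1} = 2 + 2.5675 = 4.5675 .
\end{aligned}
```

The particle moves past its own best position toward the swarm's best position; the three terms correspond to the momentum, cognitive, and social components, respectively.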

Convergence Criteria
Convergence criteria must be applied to end the optimization process during the repeated search [21,22]. The convergence criteria adopted in the PSO algorithm are the maximum number of iterations and the minimum error requirement. The complexity of the optimization problem determines the maximum number of iterations. The minimum error requires prior knowledge of the global optimum value, which is available when testing or tuning the algorithm on mathematical problems whose optimum is known a priori. Table 2 lists the main PSO parameters, and Table 3 presents the convergence parameters of the PSO used in the current study [23].

Table 2. Main PSO parameters.
Parameter | Description
Number of particles, NP | A typical range is 10-40. For some difficult or special problems, the number can be increased to 50-100.
Dimension of particles, n | Determined by the problem to be optimized.
Inertia weight, w | Usually set to a value less than 1; for faster convergence, w = 0.7 is considered.
Vectors containing the lower and upper bounds of the n design variables, x^L and x^U | Determined by the problem to be optimized. In general, different ranges can be applied to different dimensions of particles.
Cognitive and social parameters, c1 and c2 | Usually c1 = c2 = 1.494. Other values can also be used, provided that 0 < c1 + c2 < 4.

Table 3. Convergence parameters of the PSO.
Parameter | Description
Maximum number of iterations (t max) for the termination criterion | Determined by the complexity of the problem to be optimized, in conjunction with other PSO parameters (n, NP).
Number of iterations (kf) for which the relative improvement of the objective function satisfies the convergence check | If the relative improvement of the objective function over the last kf iterations (including the current iteration) is less than or equal to fm, convergence has been achieved.
Minimum relative improvement (fm) of the value of the objective function |
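The two stopping rules described above (an iteration cap and a minimum relative improvement over the last kf iterations) can be sketched as follows. This is a hedged illustration: the function names are hypothetical, and the exact window convention is an assumption, namely that the current best value is compared with the best value recorded kf iterations earlier.

```python
def has_converged(best_history, kf, fm):
    """Return True when the relative improvement over the last kf iterations
    is less than or equal to fm.

    best_history holds the best objective value after each iteration,
    most recent last (minimization is assumed).
    """
    if len(best_history) <= kf:
        return False  # not enough iterations recorded yet
    old = best_history[-kf - 1]
    new = best_history[-1]
    if old == 0:
        return True  # objective already at zero, no further improvement possible
    relative_improvement = (old - new) / abs(old)
    return relative_improvement <= fm


def should_stop(best_history, t_max, kf, fm):
    """Stop on either the iteration cap or the relative-improvement check."""
    return len(best_history) >= t_max or has_converged(best_history, kf, fm)


# Illustrative history: large early gains, then a plateau.
history = [10.0, 5.0, 5.0, 5.0]
print(has_converged(history, kf=2, fm=0.01))
```

Using a relative rather than an absolute improvement keeps the check independent of the units of the objective, which matters when cost and duration are modeled on different scales.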

Figure 2. Flowchart of PSO of the proposed model
The optimization procedure typically uses a gradient-based algorithm appropriate for local exploration. Consequently, to be successful, the optimization procedure needs an initial point obtained from a global exploration. A robust training process requires both the initialization and optimization procedures. The following steps show how the PSO algorithm can be implemented to search for the optimum duration and cost of the construction projects.
1. Initialize the swarm by assigning a random location to each particle in the problem hyperspace.
2. Evaluate the objective function of the proposed model for each particle.
3. Compare the objective function value of each particle with its pbest. If the current value is better than the pbest value, set pbest to this value and set the pbest position to the current particle position, Xi.
4. Identify the particle with the best objective function value. Its objective function value is set as gbest, and its position becomes the gbest position.
5. Update the velocity and the position of all particles based on Equations 3 and 4.
6. Repeat steps 2-5 until the convergence criteria are met (the maximum number of iterations is reached or a sufficient objective function value is obtained).
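The steps above can be sketched as a compact Python routine. This is a minimal sketch under stated assumptions, not the authors' MATLAB implementation: the function name `pso_minimize`, the zero initial velocities, the clamping of positions to the bounds, the fixed iteration budget, and the quadratic test objective are all illustrative choices; the default values of w, c1, and c2 are the commonly used ones listed among the PSO parameters.

```python
import random


def pso_minimize(objective, lower, upper, n_particles=30,
                 w=0.7, c1=1.494, c2=1.494, t_max=200, seed=0):
    """Minimize `objective` over the box [lower, upper] with a basic PSO."""
    rng = random.Random(seed)
    dim = len(lower)
    # Step 1: random initial positions; velocities start at zero (assumption).
    positions = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    pbest_pos = [p[:] for p in positions]
    pbest_val = [float("inf")] * n_particles
    gbest_pos, gbest_val = positions[0][:], float("inf")

    for _ in range(t_max):  # Step 6: repeat until the iteration cap
        for i in range(n_particles):
            value = objective(positions[i])       # Step 2: evaluate
            if value < pbest_val[i]:              # Step 3: update pbest
                pbest_val[i] = value
                pbest_pos[i] = positions[i][:]
            if value < gbest_val:                 # Step 4: update gbest
                gbest_val = value
                gbest_pos = positions[i][:]
        for i in range(n_particles):              # Step 5: move particles
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * r1 * (pbest_pos[i][d] - positions[i][d])
                                    + c2 * r2 * (gbest_pos[d] - positions[i][d]))
                new_x = positions[i][d] + velocities[i][d]
                # Keep the particle inside the search box (assumption).
                positions[i][d] = min(max(new_x, lower[d]), upper[d])
    return gbest_pos, gbest_val


# Hypothetical test objective with its minimum at (1, -2).
def shifted_sphere(x):
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best_pos, best_val = pso_minimize(shifted_sphere, [-5.0, -5.0], [5.0, 5.0])
print(best_pos, best_val)
```

In the setting of this study, the objective would be the RMSE between actual and predicted cost (or duration), and each particle position would be one candidate set of model coefficients.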
The proposed model was formulated in MATLAB to optimize the cost and duration models of construction projects. The proposed models to be optimized are given in Equations 5 and 6, where F1 to F9 and K1 to K9 are the unknown coefficients.
The main goal of using PSO to optimize the cost and duration models is to search for an optimum set of unknown coefficients, so that the difference between the actual values for the construction projects and those predicted by the final form of the optimized expressions is minimal.

Description of Dataset
A total of 60 construction projects built by government contractors between 2008 and 2016 in different places in Iraq were collected. The selected projects (samples) represent about 80% of the projects implemented in Iraq in terms of implementation method, materials used, and architectural style. Eight input variables and two output variables (cost and duration) were used, as displayed in Table 4.
Models inferred using optimization tools can estimate reliably only within the range of the available data, and they can be developed further as more data become available. Thus, the size of the dataset used for the modeling process is essential, as it affects the accuracy of the final models. The behavior of any model fitted to this data is influenced by the sample size and the distributions of its variables. Therefore, the data are illustrated graphically as histograms in Figure 3, which depicts the statistics of the samples used in constructing the proposed model.
For high accuracy, the ratio of the number of dataset records to the number of input parameters should not be less than three, as proposed by Frank and Todeschini (1994) [24], who recommended a ratio higher than five. For the present case study, this ratio was 60/8 = 7.5, which exceeds the recommended value. Of the 60 samples (projects), 48 samples (80%) were used for building the proposed models, while 12 samples (20%) were used to validate them. The descriptive statistics of the dataset used in this study are given in Table 5.
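The ratio check and the 80/20 partition described above can be sketched as follows. The paper does not state how the 12 validation projects were selected, so the seeded random shuffle, the function name `split_dataset`, and the use of integer project indices are assumptions made for illustration.

```python
import random


def split_dataset(samples, train_fraction=0.8, seed=42):
    """Shuffle a copy of `samples` and split it into training and validation parts."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


n_projects, n_inputs = 60, 8
records_per_input = n_projects / n_inputs  # 60 / 8 = 7.5

# Project indices stand in for the real project records.
train_ids, validation_ids = split_dataset(range(n_projects))
print(records_per_input, len(train_ids), len(validation_ids))
```

Fixing the seed makes the partition reproducible, so the same 12 projects are held out each time the models are refitted.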

Results and Discussion
The PSO technique was used to optimize the cost and/or duration models of the construction projects. Models were built to examine the influence of swarm size on the outcomes. The main role of the objective function in the PSO approach is to reduce the difference between the predicted and actual cost and/or duration values, so that PSO yields models whose results are as close as possible to the measured results. The PSO technique repeats its update process until either a satisfactory global best (gbest) or the maximum number of iterations (epochs) is reached, as presented in the methodology. Table 6 shows the parameters used in the PSO model. Statistical measures, namely the coefficient of variation (CoV), the correlation coefficient (R), and Bland-Altman (2007) [25] analysis, were used in this study to evaluate the ability of the proposed models. The root mean square error (RMSE) was used as the objective function to choose the unknown coefficients. Five swarm sizes (10, 20, 30, 40, and 50) were tested to determine which could minimize the error. The number of iterations was fixed at 1000 because the objective functions stabilize after 700 iterations, as shown in Figure 4 for both the cost and duration models. Figure 4 illustrates that a swarm size of 50 provided the best solution for the PSO because it achieved the minimum objective function values, with 95.83% and 97.92% accuracy for cost and duration, respectively.
Bland-Altman analysis assesses the level of difference between the actual and predicted values, and inspecting the scatter of these differences helps to judge their agreement. As shown in Figure 5, there is reasonable agreement between the actual and predicted values. The figure shows that the data are distributed within the limits of agreement, indicating appropriate accuracy of the proposed models.
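The Bland-Altman agreement check mentioned above can be sketched as follows. This uses the conventional limits of agreement (mean difference plus or minus 1.96 standard deviations of the differences); the function name, the sign convention (actual minus predicted), and the sample values are assumptions for illustration, not data from the study.

```python
import statistics


def bland_altman_limits(actual, predicted):
    """Return (bias, lower_limit, upper_limit) of the Bland-Altman analysis."""
    differences = [a - p for a, p in zip(actual, predicted)]
    bias = statistics.mean(differences)
    spread = statistics.stdev(differences)  # sample standard deviation
    return bias, bias - 1.96 * spread, bias + 1.96 * spread


def fraction_within_limits(actual, predicted):
    """Share of samples whose difference lies inside the limits of agreement."""
    _, low, high = bland_altman_limits(actual, predicted)
    inside = [low <= (a - p) <= high for a, p in zip(actual, predicted)]
    return sum(inside) / len(inside)


# Illustrative values only.
actual_values = [10.0, 20.0, 30.0, 40.0]
predicted_values = [9.0, 21.0, 29.0, 41.0]
print(bland_altman_limits(actual_values, predicted_values))
print(fraction_within_limits(actual_values, predicted_values))
```

A bias near zero with most points inside the limits is the pattern that indicates good agreement between actual and predicted values.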
Regarding the other swarm sizes, a swarm size of 10 produced significant errors. The results show that a swarm size of 50 gave the highest accuracy relative to the actual values for both the cost and duration models. According to the CoV and R values, the proposed models achieve minimum error: a swarm size of 50 gives the best solution for the PSO algorithm because it achieves the minimum coefficient of variation (CoV) and the maximum correlation coefficient (R), as presented in Table 7. Smith (1986) [26] recommended the following criteria for judging the performance of a model:
• If a model gives |R| > 0.8, a strong correlation exists between the forecast and actual values;
• If a model gives 0.2 < |R| < 0.8, a good correlation exists between the forecast and actual values;
• If a model gives |R| < 0.2, a weak correlation exists between the forecast and actual values.
Figure 6 shows that the proposed PSO models had adequate R-values (0.9441 and 0.9940 for cost and duration) and estimated the target values with adequate accuracy. Moreover, the coefficients (F1 to F9) and (K1 to K9) obtained from the optimization results are substituted into Eqs. 5 and 6 to give the final expressions of the proposed models.

The Validity of the Proposed Models
A dataset comprising 12 construction projects (20% of the total dataset) was used to examine and validate the proposed models. These samples were not used in the construction stage of the proposed models. Table 8 shows that the cost and duration estimated by the proposed models are reliable and consistent. The mean values are close to 1.0 (0.97 and 0.99 for cost and duration), which reflects the accuracy of the proposed models.
Pimentel-Gomes (2000) [27] specified that the value of the CoV reflects the accuracy of the relationship between the inputs and the output, where CoV values of less than 10%, 20-30%, and above 30% mean high accuracy, low accuracy, and low precision, respectively. For the proposed models, the CoV values for the cost and duration models were 10.86% and 4.93%, representing high accuracy. Moreover, the R-values of 0.9914 and 0.9940 (as presented in Table 8) reflect good agreement between the actual and forecast cost and duration values. Based on these results, it can be stated that the proposed models efficiently assess the cost and duration of the construction projects. The criteria recommended in [28] were checked for the external verification of the proposed models on the testing datasets. It is suggested that at least one slope of the regression lines through the origin ($k$ or $k'$) should be close to 1.0. Roy and Roy (2008) [29] introduced a confirmative indicator of the external predictability of models ($R_m$). For $R_m$ > 0.5, the condition is satisfied. The squared correlation coefficient (through the origin) between predicted and experimental values ($R_o^2$) should be close to 1.
The external validation criteria considered and the corresponding results obtained by the models are presented in Table 9. As can be seen, the proposed models satisfy the required conditions.
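The summary statistics used in this validation can be sketched as follows. This is a hedged illustration rather than the authors' procedure: it assumes that the mean and CoV are computed on the ratio of actual to estimated values, that R is the Pearson correlation coefficient, and that the slopes through the origin are the least-squares slopes of actual on predicted and of predicted on actual. The function and key names are hypothetical.

```python
import math
import statistics


def validation_statistics(actual, predicted):
    """Mean ratio, CoV (%), Pearson R, and regression slopes through the origin."""
    ratios = [a / p for a, p in zip(actual, predicted)]
    mean_ratio = statistics.mean(ratios)
    cov_percent = 100.0 * statistics.stdev(ratios) / mean_ratio

    mean_a = statistics.mean(actual)
    mean_p = statistics.mean(predicted)
    cross = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cross / math.sqrt(var_a * var_p)

    sum_ap = sum(a * p for a, p in zip(actual, predicted))
    slope_actual_on_predicted = sum_ap / sum(p * p for p in predicted)
    slope_predicted_on_actual = sum_ap / sum(a * a for a in actual)

    return {
        "mean_ratio": mean_ratio,
        "cov_percent": cov_percent,
        "r": r,
        "slope_actual_on_predicted": slope_actual_on_predicted,
        "slope_predicted_on_actual": slope_predicted_on_actual,
    }


# Illustrative values only, not the validation projects of the study.
stats = validation_statistics([100.0, 200.0, 300.0], [105.0, 190.0, 310.0])
print(stats)
```

A mean ratio near 1.0, a low CoV, an R close to 1, and at least one slope close to 1.0 together correspond to the pattern reported for the proposed models.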
The cost and duration estimates produced by the proposed models are illustrated in Figures 7 and 8. The models have acceptable estimation accuracy when the ratio of the actual to estimated values is close to one. As can be seen from Figure 7, the distribution of the ratio of actual to estimated values shows that the proposed PSO model for duration has better estimation accuracy than the PSO model for cost.
As a further statistical analysis of these models, a comparison between the actual and estimated cost and duration values is illustrated in Figure 8. This figure shows that the estimates of the proposed PSO model for duration are closer to the actual project values than those of the proposed PSO model for cost.

Screening and Parametric Analyses
After constructing the proposed model, three phases were considered: (i) deriving the final models based on the collected datasets; (ii) computing several external validation criteria to verify the models; and (iii) conducting a parametric study based on engineering principles and the physics of the problem. The first two steps are purely statistical; however, the third step is based on engineering principles and should be performed by an engineer who understands the problem being modeled. The first and second steps were completed above. Therefore, for further verification of the developed cost and duration models, a parametric analysis was performed.
This analysis primarily seeks to assess the effect of individual parameters on the cost and duration values. Figure 9 shows the forecast values of the cost and duration obtained by the proposed models as a function of each parameter. Figures 9(a) and 9(b) show that increases in the values of the eight input parameters up to a certain level lead to increases in the cost and duration values, indicating that the proposed models can be used as a guide for choosing suitable parameters. Moreover, Figure 9 shows that the parameters SS, C, and GFA have the greatest effect on the cost and duration values.

Conclusions and Recommendations
The main objective of this study was to develop mathematical models for forecasting the cost and duration of construction projects. Sixty construction projects were used to build the proposed models for early-stage design. The main conclusions drawn from the models' outcomes are as follows:
• Contractors can use the proposed model to assess the construction cost and/or duration and compare them with those specified by the client at the bid phase, to judge whether the cost and/or duration are reasonable for the given project and its budget. This modeling technique is based on historical datasets collected from existing projects. Thus, it is more practical, consistent, and reliable than the currently used subjective methods based on intuitive assessments by designers.
• The statistical analysis demonstrates that the CoV, mean, and R indicate good accuracy and reliability of the predicted values. With mean values close to 1.0 (0.97 and 0.99) and low CoV values (10.87% and 4.94%), the proposed PSO models (for both cost and duration) provide a sound assessment of the construction projects. Hence, these models can be used as a design indicator for cost and duration estimation at the early design stage.
• The outcomes show that the PSO technique is suitable for evaluating project management problems and can be used as a useful tool to search for optimal solutions with different parameters.
• The proposed model provides a guide for choosing the suitable parameters that influence the cost and duration, such as security status, total area, area of the ground floor, number of floors, brick and concrete volume, and number of elevators.
In this study, a dataset of only sixty construction projects was used to build the model. More case studies with similar kinds of projects would provide more consistent results.
• Further construction projects should be studied to examine and modify the proposed model and to investigate a wider range of parameters.
• Future research could build a model for cost and/or duration estimation for green buildings.

Conflicts of Interest
The authors declare no conflict of interest.