Predicting Project Success in Residential Building Projects (RBPs) using Artificial Neural Networks (ANNs)

Due to urban population growth and increasing demand for the renewal of old houses, the successful completion of Residential Building Projects (RBPs) is of great socioeconomic importance. This study proposes a framework to predict the success of RBPs during the construction phase. To this end, a 3-step method was applied: (1) identifying and ranking the Critical Success Factors (CSFs) involved in RBPs using the Delphi method, (2) identifying and selecting success criteria and defining the Project Success Index (PSI), and (3) developing an ANN model to predict the success of RBPs from the status of the CSFs during the construction phase. The model was trained and tested using data extracted from 121 RBPs in Tehran. The main findings of this study are a prioritized list of the most influential success criteria and an efficient ANN model that serves as a Decision Support System (DSS) in RBPs. The DSS allows projects to be monitored in advance so that the necessary corrective actions can be taken. Compared with previous studies on the success assessment of projects, this study focuses more on providing an applicable method for predicting the success of RBPs.


Introduction
The growing trend in urbanization has caused a considerable increase in residential building construction in urbanized districts. In addition, the pressing demand for the renewal of old and dilapidated housing has given residential building projects (RBPs) a significant share of the construction market, particularly in developing countries. The successful completion of RBPs is therefore essential to meeting this increasing demand for new housing in terms of both quantity and quality.
Several studies have been conducted on project success. Most of them deal with defining the term success or with identifying critical success factors (CSFs) or success criteria. These surveys have addressed a variety of industrial and construction projects, so most of the assessed factors have been generic. Studies focusing on prediction in projects have mostly involved one or two parameters rather than the overall success of the projects based on selected success criteria.
This study first aimed to narrow the wide scope of construction projects down to building projects and, more specifically, to residential buildings. Second, unlike previous studies, it considers success factors and criteria comprehensively. Finally, an artificial neural network (ANN), a reliable method for solving nonlinear regression problems, has been implemented to predict the success of RBPs.
[…] private partnership (PPP) housing projects, and so on [40,43]. Therefore, to overcome housing deficits in terms of quantity and quality, RBPs need to be successfully completed.

Artificial Neural Network (ANN)
An ANN computes and stores information in a structure of interconnected neurons. It simulates the behavior of the human brain and nervous system and can learn, store, and generalize patterns. ANNs are therefore useful for solving multivariate and pattern-recognition problems [44,45].
In statistical regression models, the dependent variable is calculated from the input features of the samples through an explicit mathematical equation, and the number of input features typically does not exceed 2 or 3. In contrast, an ANN model can learn from a database containing hundreds of input features and their corresponding dependent variables, or targets. After being trained on a set of samples, this machine learning model can predict the target of a new sample from its input features [46,47].
ANNs have been increasingly applied in prediction models because of several capabilities. They can model complicated functions of numerous factors, learn patterns from samples, process in parallel, respond rapidly, and handle errors and noisy data. They also achieve better classification and prediction performance than traditional statistical methods [48,49].
Several researchers have implemented ANNs to predict parameters of construction projects, such as budget performance [50], cash flow diagrams [51,52], cost overrun [53], engineering performance [54], cost per square meter [55], and cost and time performance [56].
The rest of the paper is structured as follows. Section 2 describes the three-step methodology for developing the model. Section 3 presents the findings of each step and discusses the interpretation of the results. Section 4 provides the conclusions of this study, its limitations, and possible future work.

Materials and Methods
A model is proposed in this study to predict the success index of residential building projects through a 3-step framework. The steps are as follows: Step 1-Determining critical success factors (CSFs) in residential building projects (RBPs); Step 2-Determining success criteria (SC) in RBPs and defining the project success index (PSI); Step 3-Structuring, training, and testing the proposed ANN model.

Step 1: Determining Critical Success Factors (CSFs) in Residential Building Projects (RBPs)
In the first step, studies on critical success factors published over the last three decades were reviewed. All factors were extracted, including generic factors relevant to all types of industries and specific factors in construction or building projects. Factors unrelated to RBPs were then excluded. The extracted factors were evaluated using the Delphi method, and the essential CSFs were selected (Figure 1). In the Delphi method, a panel of experts reaches consensus on a particular issue through a systematic, repeatable process of distributing questionnaires and analyzing the collected data. The experts can be geographically dispersed, and the method structures communication among participants so that even complex or exploratory problems can be solved [57,58].
In this study, 12 experts were selected to judge the CSFs. They included project managers and site managers in executive organizations as well as academics, all with more than 15 years of experience in construction and building projects. The initial list of CSFs extracted from the literature was included in the questionnaire. In the first round of the Delphi method, respondents were asked to judge whether each extracted factor should be ignored, eliminated, or modified. They were also asked to add significant factors that, in their opinion, had been overlooked in the literature. The CSFs were then categorized and reorganized, and the responses were synthesized.
In the second round of the Delphi method, the respondents were asked to evaluate the synthesized responses and to reconsider their own answers in light of the other experts' answers. They rated the importance of each factor on a 5-point Likert scale, from "1," indicating "not important at all," to "5," indicating "extremely important." The responses converged by the end of the second round. The Delphi method flowchart used in this study is depicted in Figure 2.
Finally, the categorized list of the most effective CSFs was derived from expert judgment. The factors with the highest average scores were later used as nodes in the input layer of the ANN model.

Step 2: Determining Success Criteria (SC) in RBPs and Defining the Project Success Index (PSI)
As in step 1, step 2 began with a review of the literature on success criteria. An initial list of criteria covering all types of industries was developed, and criteria unrelated to RBPs were removed from the list.
As in the previous step, a questionnaire was prepared to collect the experts' judgments on the criteria. Using the Delphi method, the criteria were evaluated and scored on the 5-point Likert scale. The most important criteria, those with average scores higher than 3.5, were selected as nodes in the output layer of the ANN model. Figure 3 shows the inputs, processes, and outputs of step 2.
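The selection rule applied to both factors and criteria is to keep items whose mean Likert score exceeds 3.5. It can be sketched in Python as follows. This is a minimal illustration, not the authors' tooling. The function name `select_items` and the panel data are hypothetical, and the round-to-round consensus check is not modeled.

```python
# Minimal sketch of Delphi-style item selection by mean Likert score.
# Assumption: each item (factor or criterion) has one 1-5 rating per expert.
THRESHOLD = 3.5  # cut-off on the mean score used in the text


def select_items(scores: dict[str, list[int]],
                 threshold: float = THRESHOLD) -> list[tuple[str, float]]:
    """Return (item, mean score) pairs above the threshold, highest first."""
    kept = []
    for item, ratings in scores.items():
        if not ratings:
            raise ValueError(f"{item}: no ratings supplied")
        if any(r < 1 or r > 5 for r in ratings):
            raise ValueError(f"{item}: Likert ratings must lie in 1..5")
        avg = sum(ratings) / len(ratings)
        if avg > threshold:
            kept.append((item, avg))
    # Rank the retained items by their mean score.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical ratings from four experts (the study's panel had 12).
    panel = {
        "time": [5, 4, 5, 4],
        "cost": [4, 4, 5, 3],
        "aesthetics": [3, 2, 4, 3],
    }
    print(select_items(panel))  # [('time', 4.5), ('cost', 4.0)]
```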
After the criteria were selected through the Delphi method, an index was required to denote the level of project success. This index quantifies the abstract term success, which makes it possible to compare project outcomes or performance. Since the perception of success is subjective, the weights of the criteria vary among projects and organizations. A general scoring equation can be proposed as Equation 1:

$$\mathrm{PSI}_i = \sum_{j=1}^{n} W_j \, c_j, \qquad \sum_{j=1}^{n} W_j = 1 \qquad (1)$$

The symbols in Equation 1 are defined as follows:
- $\mathrm{PSI}_i$ is the project success index of project $i$.
- $W_j$ is the normalized weight of criterion $C_j$.
- $c_j$ is the value of the output-layer node of the ANN model related to criterion $C_j$. It denotes the level of success of project $i$ with respect to $C_j$.
- $n$ is the number of criteria that the experts selected as determinative in evaluating the success of RBPs.

The weight of each criterion shows its degree of significance and therefore varies with the decision-makers' opinions or the project circumstances. For instance, "quality" may be much more critical than "cost" for an organization, in which case its normalized weight in Equation 1 would be higher. Several approaches can be applied to determine the weights: (1) a simple decision-making meeting in small RBPs; (2) distributing questionnaires among decision-makers, scoring the criteria on a Likert scale, and calculating the average score of each criterion; and (3) multi-criteria decision-making approaches, such as AHP or ANP, to collect and synthesize the experts' judgments.
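Equation 1 can be illustrated with a short Python sketch. The sketch assumes weighting approach (2) above, in which each criterion's raw weight is its average Likert score from the decision-makers. It then normalizes the weights so that they sum to 1. The function names and all numbers are hypothetical. Because the weights sum to 1 and each $c_j$ lies in [1, 5], the resulting PSI also lies in [1, 5].

```python
# Sketch of the project success index (PSI) as a weighted sum of criterion scores.
# Assumption: raw weights are average Likert scores given by decision-makers.


def normalize_weights(raw: dict[str, float]) -> dict[str, float]:
    """Scale raw weights so that they sum to 1."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("raw weights must have a positive sum")
    return {name: value / total for name, value in raw.items()}


def project_success_index(raw_weights: dict[str, float],
                          scores: dict[str, float]) -> float:
    """PSI_i = sum_j W_j * c_j, with every c_j on the 1-5 scale."""
    if set(raw_weights) != set(scores):
        raise ValueError("weights and scores must cover the same criteria")
    for name, c in scores.items():
        if not 1 <= c <= 5:
            raise ValueError(f"{name}: score {c} is outside 1..5")
    weights = normalize_weights(raw_weights)
    return sum(weights[name] * scores[name] for name in scores)


if __name__ == "__main__":
    raw = {"time": 4.5, "cost": 4.5, "quality": 4.0,
           "safety": 3.5, "satisfaction": 3.5}      # hypothetical weights
    c = {"time": 4, "cost": 3, "quality": 5,
         "safety": 4, "satisfaction": 3}            # hypothetical scores
    print(round(project_success_index(raw, c), 3))  # 3.8
```

When two RBPs are compared, reusing the same raw weights for both keeps their PSI values comparable.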
Once the criteria weights are determined, they should not be changed or recalculated when one RBP is compared with another. The value of PSI ranges from 1, indicating not successful at all, to 5, indicating absolutely successful. Based on the PSI predicted by the proposed framework, project decision-makers can gain an overall foresight of the project's success and decide whether to take corrective action to improve it.

Step 3: Structuring, Training, and Testing the Proposed ANN Model
In this study, ANN, a machine learning method, was used to predict the PSI of RBPs. The implemented network is a multi-layer perceptron (MLP), a class of feed-forward ANN suited to prediction and function-approximation problems. A schematic diagram of the MLP network with one hidden layer is presented in Figure 4.
As illustrated in Figure 4, the $K$-dimensional input vector $X = [X_1, X_2, \ldots, X_K]$ is taken from the input dataset and introduces $K$ features to the network as influencing parameters. The dataset was collected from 121 completed RBPs and therefore contained 121 sample vectors. Each vector recorded two things for a particular project: the status of the selected CSFs during the construction phase, and the project's outcomes with respect to the selected criteria at the end of the project. These data were used to train, validate, and test the network. In step 1, the factors influencing project success were assessed and 16 CSFs were selected; thus, $K = 16$. For each sample vector, the input values $X_k$, $k = 1, 2, \ldots, K$, were normalized using Equation 2:

$$\bar{X}_k = \frac{X_k - X_{\min}}{X_{\max} - X_{\min}} \qquad (2)$$

where $\bar{X}_k$ is the normalized value of the raw input feature $X_k$ in the sample, and $X_{\min}$ and $X_{\max}$ are the lowest and highest values of $X_k$ across all samples, respectively. This equation maps all values of each feature to the range [0, 1].

One of the most significant parameters of an ANN is the number of neurons in the hidden layer, denoted $M$ in Figure 4. To find the optimum network, the number of hidden neurons was increased from 5 to 20, and the network's performance was assessed using the network error. The normalized inputs $\bar{X}_k$ to neuron $m$ of the hidden layer were first multiplied by the corresponding weights $w_{km}$. These products were then summed together with the constant bias $\theta_m$ using Equation 3:

$$n_m = \sum_{k=1}^{K} w_{km} \bar{X}_k + \theta_m \qquad (3)$$

where $n_m$ is the argument of the activation function $g$. The sigmoid function, which is commonly used in prediction problems, was used as the activation function in this study (Equation 4):

$$g(n_m) = \frac{1}{1 + e^{-n_m}} \qquad (4)$$
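The min-max scaling of Equation 2 can be sketched with NumPy as below. The text does not say how a new project's raw CSF ratings are scaled, so the sketch assumes the training-set minima and maxima are stored and reused. It also guards against a feature that is constant across all samples, where $X_{\max} = X_{\min}$ would otherwise cause division by zero. The function names are illustrative.

```python
import numpy as np


def fit_min_max(X: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-feature minima and maxima over all training samples (rows)."""
    X = np.asarray(X, dtype=float)
    return X.min(axis=0), X.max(axis=0)


def min_max_normalize(X, x_min, x_max) -> np.ndarray:
    """Map each raw feature X_k to (X_k - X_min) / (X_max - X_min)."""
    X = np.asarray(X, dtype=float)
    span = x_max - x_min
    # A constant feature would divide by zero; such features map to 0 instead.
    safe_span = np.where(span > 0, span, 1.0)
    return (X - x_min) / safe_span


if __name__ == "__main__":
    raw = np.array([[1, 5], [3, 5], [5, 5]])  # 3 samples, 2 features (hypothetical)
    lo, hi = fit_min_max(raw)
    print(min_max_normalize(raw, lo, hi))
```

A new project whose values fall outside the training range will map outside [0, 1]. Whether to clip such values is a design choice the text does not address.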

The above operations were carried out in all $M$ neurons of the hidden layer, and their outputs were then used as inputs to the output layer. The same procedure was applied to each of the $P$ neurons of the output layer, finally yielding $P$ values $Y_p$ as the predicted outputs of the model.
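A forward pass through the K-M-P network of Equations 3 and 4 can be sketched in NumPy. Following the text, the sigmoid is applied in both the hidden and the output layers. The outputs therefore fall in (0, 1) and would have to be mapped back to the 1-5 scale. That inverse scaling, the weight-array shapes, and the function names are assumptions of this sketch.

```python
import numpy as np


def sigmoid(n: np.ndarray) -> np.ndarray:
    """Logistic activation g(n) = 1 / (1 + exp(-n))."""
    return 1.0 / (1.0 + np.exp(-n))


def mlp_forward(x_bar, w_hidden, theta_hidden, w_out, theta_out):
    """One forward pass of a K-M-P multilayer perceptron.

    x_bar:        normalized inputs, shape (K,)
    w_hidden:     input-to-hidden weights w_km, shape (K, M)
    theta_hidden: hidden biases theta_m, shape (M,)
    w_out:        hidden-to-output weights, shape (M, P)
    theta_out:    output biases, shape (P,)
    """
    n_hidden = x_bar @ w_hidden + theta_hidden  # Eq. (3) for every hidden neuron m
    h = sigmoid(n_hidden)                       # Eq. (4)
    n_out = h @ w_out + theta_out               # same weighted sum in the output layer
    return sigmoid(n_out)                       # predicted Y_p, p = 1..P


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, M, P = 16, 16, 1  # e.g. a 16-16-1 network with random (untrained) weights
    y = mlp_forward(rng.random(K), rng.normal(size=(K, M)), rng.normal(size=M),
                    rng.normal(size=(M, P)), rng.normal(size=P))
    print(y.shape)  # (1,)
```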
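The back-propagation method was used for supervised learning of the ANN, with the Levenberg-Marquardt algorithm used to minimize the error. The model's performance was assessed with two measures, the mean square error (MSE) and the coefficient of determination ($R^2$), using Equations 5 and 6, respectively:

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 \qquad (5)$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2} \qquad (6)$$

where $N$ is the number of samples, $y_i$ and $\hat{y}_i$ are the observed and predicted values for sample $i$, and $\bar{y}$ is the mean of the observed values. The lower the MSE and the higher the $R^2$, the better the performance of the machine learning model.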
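The two performance measures can be computed as in the sketch below. It assumes the standard definitions of both measures, with $R^2$ taken as one minus the ratio of the residual sum of squares to the total sum of squares. The function names are illustrative.

```python
import numpy as np


def mse(y_true, y_pred) -> float:
    """Mean square error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0]
    predicted = [2.0, 2.0, 2.0]  # always predicting the mean of the observations
    print(mse(observed, predicted), r_squared(observed, predicted))
```

A model that always predicts the mean of the observations scores $R^2 = 0$. Higher values therefore indicate that the model explains more of the variation than that baseline.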
In this study, five success criteria were selected as the essential criteria in RBPs. Accordingly, five models were developed, each predicting one specific criterion in its output layer. The models were generated, trained, and tested on the data collected from 121 completed RBPs, using the MATLAB R2018b Machine Learning Toolbox. This yielded five scores denoting the level to which each criterion was satisfied, and the PSI of each project was derived from these output values.
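The study trained its networks in MATLAB. Purely to illustrate the Levenberg-Marquardt idea for one single-output model, the NumPy sketch below fits a K-M-1 sigmoid network. At each step it solves $(J^\top J + \mu I)\,\delta = -J^\top r$, where $r$ is the residual vector and $J$ its Jacobian. The following are assumptions and do not reproduce the toolbox's implementation: the Jacobian is approximated by forward finite differences, the damping schedule (divide or multiply $\mu$ by 10) is a common textbook choice, and the demo data are synthetic.

```python
import numpy as np


def sigmoid(n):
    return 1.0 / (1.0 + np.exp(-n))


def predict(params, X, M):
    """Forward pass of a K-M-1 network whose weights are packed in one vector."""
    K = X.shape[1]
    W1 = params[:K * M].reshape(K, M)
    b1 = params[K * M:K * M + M]
    w2 = params[K * M + M:K * M + 2 * M]
    b2 = params[-1]
    return sigmoid(sigmoid(X @ W1 + b1) @ w2 + b2)


def jacobian(params, X, M, eps=1e-6):
    """Forward-difference Jacobian of the predictions w.r.t. every parameter."""
    base = predict(params, X, M)
    J = np.empty((X.shape[0], params.size))
    for j in range(params.size):
        shifted = params.copy()
        shifted[j] += eps
        J[:, j] = (predict(shifted, X, M) - base) / eps
    return J


def train_lm(X, y, M, epochs=30, mu=1e-2, seed=0):
    """Levenberg-Marquardt fit; returns the parameters and the MSE history."""
    rng = np.random.default_rng(seed)
    K = X.shape[1]
    params = rng.normal(scale=0.5, size=K * M + 2 * M + 1)
    r = predict(params, X, M) - y                 # residual vector
    history = [float(np.mean(r ** 2))]
    for _ in range(epochs):
        J = jacobian(params, X, M)
        A, g = J.T @ J, J.T @ r
        while mu < 1e10:
            step = np.linalg.solve(A + mu * np.eye(params.size), -g)
            r_new = predict(params + step, X, M) - y
            if r_new @ r_new < r @ r:             # accept: move toward Gauss-Newton
                params, r = params + step, r_new
                mu = max(mu / 10, 1e-12)
                break
            mu *= 10                              # reject: move toward gradient descent
        else:
            break                                 # no improving step could be found
        history.append(float(np.mean(r ** 2)))
    return params, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((40, 4))                            # hypothetical normalized inputs
    y = sigmoid(X @ np.array([1.0, -2.0, 0.5, 1.5]))   # hypothetical normalized targets
    _, hist = train_lm(X, y, M=5)
    print(f"MSE {hist[0]:.4f} -> {hist[-1]:.4f}")
```

In the study's setting there would be five such models, one per criterion. In practice, the loop would also stop on the validation error rather than after a fixed number of epochs.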
For a new project in progress, the status of the CSFs during the construction phase can be used as the model's raw inputs to obtain the predicted PSI, which shows the overall success of the project. The weights of the criteria in the PSI equation are subjective values that can be assigned by the stakeholders of the project being evaluated. The procedure of step 3 is summarized in Figure 5.

Step 1: Determining the Selected CSFs in RBPs (Input of ANN models)
Based on the literature review, 54 CSFs were extracted and then assessed, modified, and ranked by the Delphi panelists. Finally, the 16 factors with scores greater than 3.5 were selected as the most critical success factors in RBPs and set as the input features of the ANN models. As listed in Table 1, these are the factors most likely to influence the success of RBPs. Based on the experts' opinions, the selected CSFs and their relationships to project success can be discussed as follows.

Factors 1 to 4 in Table 1 belong to project specifications. The project design includes the details, specifications, sketches, plans, and sections of the different building parts. The optimality and practicability of the design and the level of detail of the drawings may reduce time and cost expenditure and improve the quality of the final product. Machinery, equipment, and a skilled workforce relate to the technologies applied in the construction phase. In RBPs with high-tech procedures, state-of-the-art architecture, and smart systems, the cost may exceed the estimated value. The number of stories, the total area, and the volume of activities vary with the project's size, and large projects are more exposed to unpredictable delays. Finally, project complexity may cause more clashes among disciplines, and more complex projects require more careful activity-sequence planning to prevent project suspension and rework.
Factors 5 and 6 are organization-related CSFs. Top management support can be material, such as financial support, a competitive salary, and fringe benefits. It can also be non-material, such as promotions, respectful treatment, and increased authority. A procedure is more likely to be completed successfully if it is well supported by the organization's top management. According to the Delphi experts, adequate and timely resource allocation was one of the most significant CSFs, with a score of 4.5 out of 5.
Factors 7 to 12 relate to project team features. Recruiting qualified members when forming the project team leads to a competent team capable of solving problems that arise in the construction phase. Based on the experts' comments, the judgment, leadership skills, and competency of the project manager were recognized as one of the most critical factors, with a score of 4.58 out of 5. The team's estimates of schedule and budget serve as a benchmark: the less accurate the estimates, the greater the reported deviation from the time and cost criteria. The level of supervision is directly related to the quality of the housing units delivered at the end of the project, although it may have an adverse effect on the cost or time index. Contractor and subcontractor selection and procurement (the provision of goods and services for the project) are two main CSFs, with scores of 4.58 and 3.83, respectively. Selecting contractors who are technically more qualified and who submit more competitive bids seems to improve the success index of RBPs in terms of cost, time, and quality.
Factors 13 to 16 relate to the external environment of RBPs and are not under the control of the project team or executive organization. Governmental policies and municipal rules and regulations may range from strict rules or limiting instructions in some districts to lenient recommendations or supportive facilities in others. A rise in construction material prices or wages may directly affect the total cost and cost deviation. As an economic indicator, the annual inflation rate affects the market price of the project's final product and customers' affordability. Finally, end-user participation in the design phase may enhance the customer satisfaction index.

[Figure 5. Step 3: Structuring, training and testing the proposed ANN model.
Inputs: (1) a short categorized list of CSFs in RBPs as neurons in the input layer of the ANN models; (2) the selected SC in RBPs as neurons in the output layer of the ANN models; (3) PSI as a scoring index to assess RBP success; (4) expert judgments.
Processes: (1) preparing questionnaires; (2) gathering data from completed projects using questionnaires; (3) finding the optimal number of neurons in the hidden layer; (4) developing the ANN models; (5) training, validating, and testing the ANN models.
Outputs: (1) an ANN model to predict the success of RBPs.]
After the CSFs were selected, the corresponding section of the questionnaire was designed. In this section, experts involved in completed RBPs were asked to rate the status of each CSF during the project on a scale from 1, denoting extremely poor, to 5, denoting absolutely ideal.
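One way to hold a questionnaire response as a training sample is sketched below. The inputs are the 16 CSF ratings on the 1-5 scale described above. The targets are five criterion scores on the 1-5 accomplishment scale used in the step 2 questionnaire section. The class, field names, and criterion labels are hypothetical; only the counts and scales come from the text. The `training_pair` helper reflects the design in which each criterion gets its own single-output model.

```python
from dataclasses import dataclass

N_CSFS = 16  # input features selected in step 1
CRITERIA = ("time", "cost", "quality", "safety", "stakeholders_satisfaction")


def _check_scale(values, label):
    for v in values:
        if not 1 <= v <= 5:
            raise ValueError(f"{label} score {v} is outside the 1-5 scale")


@dataclass(frozen=True)
class ProjectRecord:
    """One completed RBP: CSF ratings (inputs) and criterion scores (targets)."""
    csf_ratings: tuple[float, ...]       # 1 = extremely poor ... 5 = absolutely ideal
    criterion_scores: tuple[float, ...]  # 1 = not accomplished ... 5 = absolutely accomplished

    def __post_init__(self):
        if len(self.csf_ratings) != N_CSFS:
            raise ValueError(f"expected {N_CSFS} CSF ratings")
        if len(self.criterion_scores) != len(CRITERIA):
            raise ValueError(f"expected {len(CRITERIA)} criterion scores")
        _check_scale(self.csf_ratings, "CSF")
        _check_scale(self.criterion_scores, "criterion")

    def training_pair(self, criterion: str) -> tuple[list[float], float]:
        """Inputs and the single target for the model of one criterion."""
        return list(self.csf_ratings), self.criterion_scores[CRITERIA.index(criterion)]


if __name__ == "__main__":
    # Hypothetical CSF ratings; criterion scores reuse the example project in the text.
    record = ProjectRecord((3,) * 16, (3, 2, 4.4, 4.25, 3))
    x, y = record.training_pair("cost")
    print(len(x), y)  # 16 2
```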

Step 2: Determining the Selected Success Criteria in RBPs (Output of ANN Models)
In this step, 21 success criteria for construction projects were first extracted from a review of the literature of the last three decades. These criteria were assessed and scored through the Delphi method on a 5-point Likert scale, using questionnaires distributed among 12 experts with more than 15 years of experience in RBPs. The results are presented in Table 2.
As shown in Table 2, five criteria, including time, cost, quality, safety, and stakeholders' satisfaction, have been indicated as the most significant criteria in RBPs with mean scores greater than 3.5 out of 5. Similar results have been reported in the literature. Therefore, these criteria were set as the neurons in the output layer of the ANN models.
The weights of these criteria may vary among projects and organizations. To evaluate the success of RBPs based on these criteria, three main actions were taken. First, the level $c_j$ to which each criterion was met was determined after the project's completion. Second, the weight of each criterion was assigned based on the decision-makers' opinions. Finally, the project success index, PSI, was determined as the weighted sum of the values $c_j$ (see Equation 1).
After the success criteria were selected, the corresponding section of the questionnaire was adjusted. Interviewees were asked to rate the level of accomplishment of the targets for each of the five selected criteria on a scale from 1, denoting not accomplished at all, to 5, denoting absolutely accomplished.

Based on the results of the first two steps, the CSFs and success criteria were identified, categorized, and ranked. Comparison with other studies on CSFs yields the following observations.

In the identification phase, Gunathilaka et al. [16] used empirical and conceptual papers. In our study, by contrast, experts' judgments were collected using the systematic Delphi method in addition to extracting CSFs from the literature.

Most of the CSFs identified by Gudiene et al. [19] were also recognized in this study, but the ranking methods differed. The AHP approach used by Gudiene et al. [19] does not consider the interactions among the CSFs or among the criteria involved in project success. Williams [23] considered the causal interactions among the CSFs, which seems a better approach than AHP, but did not quantify or rank the CSF weights. Mavi and Standing [25] proposed an FANP approach that addresses both of these shortcomings by considering the interdependencies among CSFs and quantifying their significance. As in our study, they identified and categorized the CSFs based on the literature and expert judgments. However, the main purpose of our study was to propose a framework to assess and predict the success of projects, specifically RBPs, so a different ranking method was used to select the most important CSFs. In addition, our study extracted the most important criteria involved in the PSI.

The current study is in line with the study of Mukhtar et al. [40], which extracted CSFs in housing projects from the literature and categorized and ranked them based on questionnaires distributed among experts. However, they used structural equation modeling (SEM) to consider the interdependencies among CSFs, whereas our study proposes an index of project success based on the main criteria. Moreover, project success is assessed with an ANN model that includes the main CSFs and success criteria and captures their interdependencies from historical datasets obtained from real completed RBPs.
As in Olawumi and Chan [17], the Delphi method was applied to assess the CSFs in construction projects. Since our study focused on RBPs, the CSFs identified here differed considerably from those in [17], which focused on project success with respect to sustainability and the applicability of BIM in construction projects. Silva et al. [18] ranked the CSFs by their frequency of occurrence in the literature. In this study, by contrast, the CSFs were ranked based on expert judgments using the Delphi method, and the study specifically targeted RBPs. Ghanbaripour et al. [22] used a statistical approach and questionnaire distribution to rank the CSFs. Although most of the factors were similarly identified, their survey does not appear to have categorized the factors or assessed their influence on each success criterion.
The findings of this study are in line with Rashid and Sudong [20], who introduced a relationship for determining project success and classified the CSFs into five categories. However, in their study, the CSFs entered the project success relationship as independent variables. In our study, by contrast, project success is treated as a multicriteria term that the CSFs affect with respect to each criterion.

Step 3: Developing the Proposed ANN Models as a Decision Support System
With the 16 CSFs as the input features of the models and the five main success criteria as their outputs, questionnaires were set up to collect data from completed projects. In this study, 121 completed residential building projects distributed across 22 districts of Tehran were assessed. To find the optimum network architecture for predicting the outputs, the network was trained on a training set selected randomly from the dataset. Validation data were used to check the quality of the proposed ANN model, while stopping criteria and weight resets were used to cope with under-fitting and over-fitting. Figure 6 depicts the MSE values of the ANN in predicting time, cost, quality, safety, and stakeholders' satisfaction from the values of the CSFs. According to the figure, all MSE values for the training, validation, and test stages lie in the range 0 to 0.03. This indicates that the proposed ANN architecture is capable of predicting the network outputs.

One of the essential parameters of ANNs, and one that significantly affects model performance, is the network architecture, i.e., the size of the hidden layer. It was varied from 5 to 20 to obtain the optimum network. Figure 6 indicates that there is no simple monotonic relationship between accuracy and the number of hidden nodes. This was expected. With too few hidden nodes, the weights and biases of the nodes' sigmoid functions must vary considerably to reduce the training error, which significantly degrades the network's performance on test data. Conversely, with too many hidden neurons, the sigmoid functions of the nodes are trained with relatively few samples. Predictions of test targets, particularly for out-of-range samples, are then accompanied by significant error [59].
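The search over hidden-layer sizes can be sketched as a random train/validation/test split followed by picking the size with the lowest validation MSE. The 70/15/15 split ratios are an assumption for illustration; the text does not state them. The callback `val_mse_for` is hypothetical and stands in for training a network with a given number of hidden neurons and returning its validation MSE.

```python
import numpy as np


def random_split(n_samples: int, train: float = 0.70, val: float = 0.15, seed: int = 0):
    """Shuffle sample indices and cut them into train/validation/test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def select_hidden_size(val_mse_for, sizes=range(5, 21)):
    """Return the hidden-layer size with the lowest validation MSE, plus all scores."""
    scores = {m: val_mse_for(m) for m in sizes}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    tr, va, te = random_split(121)  # 121 completed RBPs
    print(len(tr), len(va), len(te))  # 85 18 18
    # Hypothetical stand-in for "train with m hidden neurons, return validation MSE".
    best, _ = select_hidden_size(lambda m: 0.01 + 0.0005 * (m - 16) ** 2)
    print(best)  # 16
```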
The optimum numbers of hidden neurons for predicting time, cost, quality, safety, and stakeholders' satisfaction from the studied CSFs were 16, 16, 16, 15, and 15, respectively. The network architectures were therefore 16-16-1 for predicting time, cost, and quality, and 16-15-1 for predicting safety and stakeholders' satisfaction.
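Reading "16-16-1" as the sizes of the input, hidden, and output layers, the number of trainable weights and biases in each network follows directly. The count includes one bias per hidden or output neuron, as in Equation 3. The helper below is illustrative.

```python
def n_parameters(layers: tuple[int, ...]) -> int:
    """Weights plus biases of a fully connected feed-forward network."""
    return sum(n_in * n_out + n_out  # weight matrix plus one bias per receiving neuron
               for n_in, n_out in zip(layers, layers[1:]))


if __name__ == "__main__":
    print(n_parameters((16, 16, 1)))  # 289: time, cost, and quality models
    print(n_parameters((16, 15, 1)))  # 271: safety and satisfaction models
```

Comparing these counts with the 121 available samples is one way to read the over-fitting concern discussed above.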
Machine learning is required for tasks that are too complex for humans to implement directly [60]. The model used in this study had 16 input variables, which made it impractical to formulate an explicit statistical model. Some tasks are so complex that it is impractical, if not impossible, for humans to work out all of their nuances and code for them explicitly [61]. Instead, a large amount of data is provided to a machine learning algorithm, which works out the task by exploring the data and searching for a model. For example, the reported values of time, cost, quality, safety, and stakeholders' satisfaction for one project in this study were 3, 2, 4.4, 4.25, and 3, respectively. The ANN's predicted values for this project were 3.1, 2.15, 4.22, 4.37, and 2.95, respectively.
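The single-project example above can be worked through numerically. The sketch computes the prediction errors and their mean square on the raw 1-5 scale. It then computes the observed and predicted PSI from Equation 1, assuming equal criterion weights of 0.2 purely for illustration, since the study leaves the weights to decision-makers.

```python
# Reported and predicted values for one project, taken from the text.
observed = {"time": 3.0, "cost": 2.0, "quality": 4.4, "safety": 4.25, "satisfaction": 3.0}
predicted = {"time": 3.1, "cost": 2.15, "quality": 4.22, "safety": 4.37, "satisfaction": 2.95}

# Prediction error per criterion and the mean square error on the 1-5 scale.
errors = {k: predicted[k] - observed[k] for k in observed}
mse_single = sum(e ** 2 for e in errors.values()) / len(errors)

# Assumption: equal weights W_j = 1/5, so PSI is the plain average of the scores.
weights = {k: 1 / len(observed) for k in observed}
psi_observed = sum(weights[k] * observed[k] for k in observed)
psi_predicted = sum(weights[k] * predicted[k] for k in predicted)

print({k: round(e, 2) for k, e in errors.items()})
print(round(mse_single, 5), round(psi_observed, 3), round(psi_predicted, 3))
```

Under these assumptions, the errors are 0.10, 0.15, -0.18, 0.12, and -0.05, the MSE is about 0.016, and the PSI is 3.33 observed versus 3.358 predicted. This raw-scale MSE covers a single project, so it is not directly comparable with the MSE values reported for the trained networks.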
Figures 7 and 8 show a comparison of observed and predicted values and the error values, respectively, indicating how close the predictions are to the measured values. The dashed line in each panel of Figure 7 represents a perfect result (predicted values = observed values). The solid line represents the best-fit linear regression line between the outputs and the targets. Overall, the figure indicates a good fit between the predicted and observed values. As shown in Figure 8, a normal distribution can be fitted to the prediction errors. Its location at zero error suggests that the ANN model used in this study has not been trapped in a poor local minimum. Therefore, as in many construction management studies that have successfully used ANNs for modeling, ANNs can be used to predict the opinions of experts in the field. ANNs are a specific set of algorithms that have revolutionized the field of machine learning. They are inspired by biological neural networks, and current feed-forward ANNs have proven to work very well. ANNs are general function approximators, which is why they can be applied to almost any machine learning problem that involves learning a complex mapping from the input space to the output space.
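Fitting a normal distribution to the prediction errors, as in Figure 8, amounts to estimating its mean and standard deviation from the residuals. A mean close to zero indicates no systematic over- or under-prediction. Below is a minimal NumPy sketch using the maximum-likelihood estimates (the population standard deviation). The residuals in the demo are made up.

```python
import numpy as np


def fit_normal(errors) -> tuple[float, float]:
    """Maximum-likelihood mean and standard deviation of the errors."""
    e = np.asarray(errors, dtype=float)
    return float(e.mean()), float(e.std())  # std() uses ddof=0, the ML estimate


def normal_pdf(x, mu: float, sigma: float):
    """Density of N(mu, sigma^2), e.g. to overlay on an error histogram."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


if __name__ == "__main__":
    residuals = [-0.1, -0.05, 0.0, 0.02, 0.05, 0.08]  # hypothetical prediction errors
    mu, sigma = fit_normal(residuals)
    print(f"mean={mu:.4f}, std={sigma:.4f}")
```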

Conclusions
The findings of this study are useful for predicting the success of RBPs. The data-analysis capabilities of artificial intelligence have been combined with expert judgment, which determined the model's inputs and outputs, to provide a decision support system for RBPs. Because RBPs play an important socioeconomic role in developing countries, the findings of this study can be useful to policymakers in the housing sector. This study aimed to achieve the following goals:
- Expanding previous studies on critical success factors (CSFs) and success criteria in residential building projects (RBPs), particularly in developing countries. A wide range of exploratory research has been carried out on success factors and criteria. However, studies on the success of residential building projects are rare in the literature, and most previous studies have not addressed predicting or estimating project success.
- Extracting and developing a list of influential CSFs and criteria in RBPs based on previous studies and a systematic method of collecting experts' judgments. These lists can be valuable for project managers in RBPs, and this study provides them with a framework for monitoring the status of CSFs and their consequences.
- Proposing a framework to evaluate the overall success of RBPs in advance through an index that quantifies project success based on the selected criteria. Success is subjective, and its value varies with the decision-makers' opinions in an individual project. The PSI therefore includes variable coefficients as criteria weights assigned by the project team.
- Implementing an artificial intelligence approach instead of statistical regression by developing ANN models that predict the project outputs with respect to the selected criteria from the status of the selected CSFs during the construction phase. Studies on predicting project success using ANNs are very scarce in the literature. This study, which deals specifically with RBPs, appears to make a substantial contribution to this issue.
- Providing a DSS that enables policymakers and decision-makers in RBPs to take timely corrective actions. The framework presented in this study can estimate the project's outcome with respect to each criterion in advance. Top managers and project managers can thus foresee project outcomes based on the status of CSFs in the construction phase and the assigned criteria weights. If the outcomes are unfavorable, corrective actions can be decided upon.
This study has some limitations. It was conducted under the particular circumstances of a case study in Iran, a developing country, and specifically in Tehran, its most urbanized city. The trained ANN models may therefore be applicable only to RBPs under similar conditions. Nevertheless, the other findings can be applied to RBPs in other regions, including the presented framework, the method of collecting and synthesizing experts' judgments, the list of influential CSFs, the selected criteria, and the introduced PSI.
For future investigations, implementing other types of ANNs or other machine learning methods, such as SVM, is suggested. Other methods of comparing and ranking CSFs or success criteria, such as FANP, may also lead to more comprehensive predictive models. Furthermore, more complex relationships, such as higher-order or nonlinear equations, could be considered for the PSI to obtain a more accurate index.

Conflicts of Interest
The authors declare no conflict of interest.