Prediction of Sediment Accumulation Model for Trunk Sewer Using Multiple Linear Regression and Neural Network Techniques

Sewer sediment deposition is an important issue because it is linked to several operational and environmental problems. It concerns municipalities because it affects the sewer system and contributes to sewer failure, which can be catastrophic when it occurs in trunks or interceptors. Sewer rehabilitation is costly and complex, both in choosing the rehabilitation method and in selecting the individual sewers to be rehabilitated. Inspection techniques assist this decision-making process; however, they add to the total expenditure of the project because they require special tools and trained personnel. For developing countries, the cost of inspection can prevent rehabilitation from proceeding. In this study, the researchers propose an alternative method for estimating sewer sediment accumulation using predictive models based on a multiple linear regression model (MLRM) and an artificial neural network (ANN). The AL-Thawra trunk sewer in Baghdad city is selected as the case study area; data from a survey of this trunk are used in the modeling process. Results showed that the MLRM is acceptable, with an adjusted coefficient of determination (adj. R²) on the order of 89.55%. The ANN model was found to be practical, with an R² of 82.3%, and it fits the data better throughout its range. Sensitivity analysis showed that the flow is the most influential parameter on the depth of sediment deposition.


Introduction
In-sewer sediment has received growing scientific and operational interest in the last decade [1]. Sediment deposition in combined sewers is a critical aspect because it is a primary cause of several hydraulic and environmental problems, such as blockage, reduction in hydraulic capacity, increased flooding frequency, sewer wall corrosion, shock loads to wastewater treatment plants, and erosion and resuspension of the deposited solids during wet weather flow (WWF) [2]. The reduction in hydraulic capacity in sewers containing sediment is significant, reaching 10-20% for relatively small sediment/diameter values of 2-10% [3]. Hannouche et al. [4] assert that sewer sediment resuspension contributes on the order of 20-80% of the total suspended solids (TSS) of the wastewater. Depending on their characteristics, Crabtree [5] classified sewer sediment into five classes: Class-A, coarse, loose, granular, mainly inorganic material found in the inverts of pipes; Class-B, the same as Class-A but with the grains mixed with a cementing agent; Class-C, mobile, finer-grained deposits found in low-velocity regions overlying Class-A solids; Class-D, highly organic biofilms found on sewer walls in the vicinity of the mean flow level; Class-E, fine-grained sediment found in tanks.
Maintenance of sewerage systems is a labor- and resource-intensive process; therefore, many operators prefer reactive to proactive maintenance [6]. For a particular sewer system, information about its structural, operational, hydraulic and environmental condition is required to improve the operation and maintenance (O&M) of the system [7]. Inspection is one of the techniques used to acquire information about the sewer system, and closed-circuit television (CCTV) is the most common method. Unfortunately, CCTV has proven to be inaccurate and subjective: the false-negative error is on the order of 25% (i.e., a defect is present but not reported by the inspector in one-fourth of the images) [8]. Multi-sensor inspection (MSI) is instead one of the leading technologies for inspecting sewers, in particular man-entry sewers. MSI measures critical problems such as corrosion and sediment levels that are missed by CCTV, giving reliable data for better decisions and economical rehabilitation [9].
Methods for predicting the in-sewer location and quantity of sediment deposition are still embryonic. Present semi-theoretical models for sediment accumulation might be helpful in limited cases (i.e., for sewers operating in similar conditions) [10]. Sewer sediment field studies are still rare. For example, research conducted in the Dundee collection system in the United Kingdom [11,12], based on field data and using a multi-linear regression approach, proposed that dimensionless variables (including hydraulics, properties of the deposits, transported particle characteristics, etc.) are significant in the transport/erosion phenomena. Recently, Ebtehaj et al. [13] used artificial neural networks (ANN) to predict sediment transport in clean pipes; the ANN models were shown to be superior to regression models. Despite the high uncertainties, field sediment deposition studies widen the knowledge about such a complex underlying process. However, the equations obtained from these studies are local and time-dependent [14].
This study aims to assess the current condition of the case study (AL-Thawra trunk sewer) with regard to sediment accumulation; afterward, a multiple linear regression model and an ANN model are calibrated and validated. The study will benefit decision-makers in municipalities regarding the O&M of trunks subject to similar conditions.

Study Case Description
Baghdad lies on the banks of the river Tigris, which separates the city into eastern and western parts called Rusafa and Karkh, respectively. An estimated 5.4 million inhabitants of Baghdad are served by a combined sewer system, while 0.25 million use private septic tanks. Thirty percent (30%) of the sewers are flooded during WWF periods. The system is old and deteriorated (40% of the sewers need immediate repair). Likewise, the treatment plant is deficient, and 20% of the sewage is disposed of without treatment [15].

Figure 1. Layout for TH-trunk sewer
The Rusafa district in Baghdad city has suffered for many years from frequent floods after relatively small rainfall events. The Al-Thawra trunk (TH-trunk) sewer in the Rusafa district is selected as the study area, as shown in Figure 1. The TH-trunk was constructed in 1983 as a combined sewer, receiving residential sewage and rainwater and flowing by gravity over a total length of 10.5 km. It starts with a diameter of 1800 mm at Al-Shaab district, expands to 2400 mm, and enters Al-Saddar district with a diameter of 3000 mm. Serving a densely populated area (approximately 1.43 million inhabitants), the TH-trunk ends at the Habibya pumping station [16].

Data Collection and Analysis
In 2014, Baghdad Mayoralty (BM) conducted an inspection survey of the TH-trunk. An MSI boat was employed; this boat comprises three measuring devices: a CCTV camera to capture the above-water-level condition of the sewer; a laser profiler to quantify the corrosion in the above-water-level part of the pipes; and a sonar mounted on the bottom of the boat to measure the sediment deposited in the sewers. The data derived from this inspection are used throughout this research.

Sediment Accumulation Modeling
Modeling is the act of building a simplified representation of the system (or phenomenon) of concern. It is one of the most important aspects of any sewage network because it supports the design, O&M, and rehabilitation of different parts of the sewerage system. The trunk sewer under study is deteriorated, poorly maintained, and has undergone multiple shutdowns, which resulted in a serious sedimentation problem. These unusual conditions, together with the lack of sufficient information, mean that theoretical and semi-empirical techniques for predicting sewer sediments cannot be applied to such a sewer.
The methodology implemented to propose validated regression and ANN models is illustrated in Figure 2 and Figure 3. Note: sediment depth (ds) is the dependent variable; all other variables are independent and used as predictors.

Multiple linear regression
Regression analysis is a statistical technique that is widely used to develop empirical equations. The multiple linear regression model (MLRM) is a generalized version of the simple linear regression model; MLRM fits more than one independent variable (Xs) to a single dependent variable (Y). The general model for n observations (datasets) is given by Konishi [17]:

$$ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, 2, \dots, n \tag{1} $$

where β0 is the intercept, β1, β2, ..., βp are the coefficients of the independent variables X1, X2, ..., Xp, respectively, and ε is the error term. The above set of equations is written in matrix notation as:

$$ Y = X\beta + \varepsilon \tag{2} $$

where Y is the n-dimensional vector of observed values, X is the n × (p + 1) matrix containing the data obtained for the p independent variables, β is the (p + 1)-dimensional vector of regression coefficients, and ε is the n-dimensional error vector. The model can be calibrated and then used to obtain predicted values (Ŷ) of the dependent variable (Y). The regression coefficient vector β is estimated by least squares as:

$$ \hat{\beta} = (X^{T}X)^{-1}X^{T}Y \tag{3} $$

The stepwise procedure produces more than one candidate model by adding one significant predictor at a time. As the number of predictors increases, the model always fits the calibration data better and the coefficient of determination (R²) increases, which raises the risk of fitting the noise (overfitting); such a model then fails in future prediction. The Akaike and Bayes information criteria (AIC and BIC) are estimators of the relative quality of a model; both penalize the number of parameters used in the model. Therefore, AIC and BIC, given in Eq. (4) and Eq. (5), can be used as a benchmark to select the best model among several candidate models [18,19].

$$ AIC = n \ln\left(\frac{RSS}{n}\right) + 2k \tag{4} $$

$$ BIC = n \ln\left(\frac{RSS}{n}\right) + k \ln(n) \tag{5} $$

where $RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ is the residual sum of squares, n is the number of observations, and k is the number of parameters (including the intercept) + 1. The best model is the one with the smaller k and RSS values; therefore, the model with the better fit has lower AIC and BIC values. Afterward, the selected model (the one with the lowest AIC and BIC values) is validated [20]; in this study, the root mean square error (RMSEval) is used as the criterion. RMSEval is given in Eq. (6), where nval is the number of points in the validation dataset.

$$ RMSE_{val} = \sqrt{\frac{1}{n_{val}} \sum_{i=1}^{n_{val}} (y_i - \hat{y}_i)^2} \tag{6} $$
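The calibration, selection and validation steps described above can be illustrated with a minimal sketch. It assumes the common least-squares forms AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln(n); the data are synthetic, and the function names (`fit_ols`, `aic_bic`, `rmse`) are illustrative rather than taken from the study.

```python
import math
import random

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'Y."""
    X = [[1.0] + list(r) for r in rows]  # leading 1 carries the intercept
    p1 = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p1)] for i in range(p1)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p1)]
    return solve(xtx, xty)

def predict(beta, rows):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in rows]

def rss(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

def aic_bic(rss_value, n, k):
    base = n * math.log(rss_value / n)
    return base + 2 * k, base + k * math.log(n)

def rmse(y, y_hat):
    return math.sqrt(rss(y, y_hat) / len(y))

# Synthetic data (NOT survey data): y depends on x1 and x2 only; x3 is pure noise.
rng = random.Random(0)
rows = [[rng.uniform(0, 10), rng.uniform(0, 5), rng.uniform(0, 1)] for _ in range(63)]
y = [3.0 + 2.0 * r[0] - 1.5 * r[1] + rng.gauss(0, 0.5) for r in rows]
cal_rows, cal_y = rows[:57], y[:57]  # calibration sample
val_rows, val_y = rows[57:], y[57:]  # validation sample

for cols in ([0], [0, 1], [0, 1, 2]):  # three nested candidate models
    sub = [[r[c] for c in cols] for r in cal_rows]
    beta = fit_ols(sub, cal_y)
    k = len(beta) + 1  # coefficients including the intercept, plus 1
    aic, bic = aic_bic(rss(cal_y, predict(beta, sub)), len(cal_y), k)
    val_sub = [[r[c] for c in cols] for r in val_rows]
    print(cols, round(aic, 2), round(bic, 2),
          round(rmse(val_y, predict(beta, val_sub)), 3))
```

The candidate with the lowest AIC and BIC would be carried forward, and its validation RMSE reported; the 57/6 split mirrors the sample sizes used later in the paper.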

Artificial Neural Networks
An artificial neural network (ANN) is a promising modeling technique that mimics the learning process of the human nervous system. It is extensively used for regression, classification, pattern recognition, etc. Although it shows superiority in prediction, its usefulness is limited because it gives little indication of the features of the underlying process that relates the inputs to the output [21]. ANN is a very flexible and powerful tool, usually used when conventional statistical and mathematical methods fail because their boundary conditions are not fulfilled. ANNs are not theoretically supported; nevertheless, they still show practical significance [22]. An ANN comprises a number of layers; each layer consists of nodes (neurons) arranged in one level, and each node has a simple task defined by an activation function. The network architecture is characterized by the way the nodes are connected, the total number of layers, and the number of neurons per layer. The multilayer perceptron (MLP), shown in Figure 4, is one of the most commonly used ANN architectures; it applies a feed-forward architecture (information moves forward from the input nodes to the output nodes). MLP uses a learning algorithm called back-propagation [23]. Pre-processing of the data is necessary to improve the training of the network; various methods are used in practice, such as standardization and normalization. Another important aspect is the division of the dataset, which is usually split into three samples. First, the training sample is used to optimize the network weights (supervised learning process). Second, the testing sample is used to check the error during the learning process. Finally, the holdout sample is used for validation: after the network has been trained, the holdout sample assesses the predictive accuracy of the network [24].
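The pre-processing, dataset division and feed-forward steps described above can be sketched as follows. This is only an illustration under stated assumptions: a tanh hidden layer with a linear output node is assumed, and all function names, weights and sample sizes are hypothetical.

```python
import math
import random

def standardize(column):
    """Rescale a list of values to zero mean and unit (sample) standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]

def split_indices(n, n_train, n_test, seed=0):
    """Shuffle 0..n-1 and cut it into training, testing and holdout index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Feed-forward pass: inputs -> tanh hidden layer -> linear output node."""
    hidden = [math.tanh(b + sum(w * xi for w, xi in zip(ws, x)))
              for ws, b in zip(w_hidden, b_hidden)]
    return b_out + sum(w * h for w, h in zip(w_out, hidden))

z = standardize([2.0, 4.0, 6.0, 8.0])
train, test, holdout = split_indices(10, 7, 2)
y_hat = mlp_forward([0.5, -1.0], w_hidden=[[0.2, -0.1], [0.4, 0.3]],
                    b_hidden=[0.0, 0.1], w_out=[1.0, -2.0], b_out=0.5)
print(z, train, test, holdout, y_hat)
```

Keeping the holdout indices separate from the start ensures that no holdout case influences the weights or the stopping decision.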

Sediment Accumulation Assessment
As mentioned earlier, BM carried out a sediment deposition survey. The average sediment depths for the diameters of 1800, 2400 and 3000 mm are estimated to be 380, 1220 and 1052 mm, respectively. Figure 5 shows the average sediment depth in each individual sewer. The sediment depths are clearly very high; for instance, between stations 4000 and 6000, the sediment covers more than 50% of the sewer cross-sectional area. Several operational factors contribute to this problem: first, shutdowns of pumping stations, which may last for several weeks in times of war; these interruptions initiate stagnation, build-up, and consolidation of sediment; second, poor maintenance (e.g., no periodic jetting). Human-behavioral factors also contribute, e.g., littering and dumping of construction/demolition solid waste in the streets; these materials find their way into the sewer system.
A related point is that, after each change in diameter, there is a jump in the sediment level. This can be attributed to the fact that the change in diameter is sudden, with no equivalent increase in the flow rate, which leads to a reduced flow velocity.
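The velocity drop at each diameter change can be illustrated with the continuity relation V = Q/A. The sketch below assumes a clean circular pipe flowing full at a constant, hypothetical discharge, which is a simplification of the partially filled, sediment-laden trunk; the function name is illustrative.

```python
import math

def full_pipe_velocity(flow_m3s, diameter_mm):
    """Mean velocity V = Q / A for a circular pipe flowing full."""
    area = math.pi * (diameter_mm / 1000.0) ** 2 / 4.0
    return flow_m3s / area

q = 1.0  # hypothetical discharge in m^3/s, held constant across the transitions
for d_up, d_down in [(1800, 2400), (2400, 3000)]:
    ratio = full_pipe_velocity(q, d_down) / full_pipe_velocity(q, d_up)
    print(f"{d_up} -> {d_down} mm: velocity falls to {ratio:.0%} of its upstream value")
```

Under these assumptions the velocity scales with the inverse square of the diameter, so it falls to about 56% at the 1800 to 2400 mm transition and to 64% at the 2400 to 3000 mm transition.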

Multiple Linear Regression Model (MLRM)
The dataset covers 63 sewers; the data for 6 sewers are set aside as a validation sample, and the remaining 57 are used in model calibration. The SPSS regression tool is used with the stepwise procedure for variable selection; this resulted in seven candidate models, summarized in Table 2. Model #7 has the highest coefficient of determination (R²) of 89.5% (i.e., approximately 90 percent of the variance of the response variable (ds) is explained by the model). Nevertheless, the coefficient of determination is not a suitable criterion for selecting between candidate models. Instead, a model selection approach based on the AIC/BIC criteria is adopted, as shown in Table 3. Model #7 is selected because it showed the minimum AIC and BIC values. The ANOVA test shown in Table 4 is used to compare the goodness of fit of model #7 with that of the intercept-only model (i.e., the mean value of the response variable). Null hypothesis H0: the fit of the intercept-only model and the tested model are equal. Alternative hypothesis Ha: the fit of the intercept-only model is significantly worse than that of the tested model. The results showed a p-value of less than 0.01; thus, the null hypothesis is rejected at the 99% confidence level.
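As a rough cross-check of the reported fit statistics, the adjusted R² and the overall F statistic of the ANOVA test can both be computed from R², the sample size n and the number of predictors p. The sketch below uses the standard textbook relations; n = 57 and R² = 0.895 come from the text, while p = 5 is assumed from the five-term equation reported for the selected model. The values it prints are illustrative and are not taken from Table 4.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted in p)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def overall_f(r2, n, p):
    """F statistic for H0: the intercept-only model fits as well as the p-predictor model."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

n, p, r2 = 57, 5, 0.895  # p = 5 is an assumption, see the lead-in
print(round(adjusted_r2(r2, n, p), 3), round(overall_f(r2, n, p), 1))
```

The F statistic would be compared with the F distribution on p and n − p − 1 degrees of freedom to obtain the p-value of the test.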
The t-test is performed to check the significance of the predictors; the results in Table 5 showed that all p-values are less than 0.05, which implies that all the predictors are significant. The model indicates that the sediment level in the sewer is inversely related to the flow rate: as the flow rises, the velocity in the sewer increases, scouring more of the sediment and thus reducing its depth. Moreover, the sediment level has a direct relationship with the number of connections to the upstream manhole; these connections convey sediment from high-velocity mains to the lower-velocity trunk, where solids tend to settle as near as possible to the connection, resulting in higher sediment levels for sewers having connections at the upstream manhole. The model also shows a negative relationship with the distance from a change in direction, which causes higher turbulence in the downstream sewers and consequently increases the erosion of the sediment. The cumulative length has a direct correlation with sediment depth, as sediment moves in the downstream direction. Finally, pipe length shows a positive slope with sediment depth, probably because the longer the sewer is, the more susceptible it is to stagnation and blockage. The predicted vs. observed plot is used to assess the predictive accuracy of the model. As shown in Figure 6, the points are distributed almost equally on both sides of the diagonal line (which represents the zero-error line), which implies that the selected model performs well and is not biased. The validation points are also shown in the predicted vs. observed plot; they are distributed about the diagonal line, with an estimated RMSE of 14.14. However, the model tends to overestimate the depth of sediment when its actual value is below 50 cm and to underestimate it when its actual value is between 50 and 100 cm. Fortunately, the model is robust in the upper range of sediment depth.
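The over- and underestimation pattern read from the predicted vs. observed plot can also be quantified as the mean error within each depth band. The following sketch shows one way to do this; the depths are hypothetical values invented for the example, and the function name is illustrative.

```python
def mean_error_by_band(observed, predicted, bands):
    """Mean of (predicted - observed) for observations falling in each [low, high) band."""
    result = {}
    for low, high in bands:
        errs = [p - o for o, p in zip(observed, predicted) if low <= o < high]
        result[(low, high)] = sum(errs) / len(errs) if errs else None
    return result

# Hypothetical depths in cm, invented only to show the calculation.
obs = [20, 40, 60, 90, 120, 150]
pred = [30, 45, 55, 80, 121, 149]
print(mean_error_by_band(obs, pred, [(0, 50), (50, 100), (100, 200)]))
```

A positive mean error in a band indicates overestimation there, a negative value indicates underestimation, and a value near zero indicates little bias.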

Artificial Neural Networks
The MLP-NN sediment model is generated using the SPSS neural network tool. The optimal network is attained by trial and error; in this model, 69.8% of the 63 datasets are used for training, 19% for testing, and 11% are kept as a holdout sample. These data have been standardized. Automatic architecture selection resulted in one neuron in a single hidden layer. Hornik et al. [25] assert that single-hidden-layer neural networks are universal approximators. The optimization algorithm used to estimate the network weights is gradient descent, and the training type is online training.
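Online training with gradient descent means that the weights are updated after every individual case rather than once per pass over the data. The sketch below trains a network with one hidden neuron in this way on synthetic standardized data. It is not the SPSS implementation: the tanh hidden activation, linear output, learning rate and number of epochs are assumptions, and the 44/12/7 partition is what the stated percentages of 63 cases round to.

```python
import math
import random

def train_one_neuron_mlp(xs, ys, lr=0.05, epochs=200, seed=0):
    """Online (per-sample) gradient descent for inputs -> 1 tanh neuron -> linear output."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in xs[0]]  # input-to-hidden weights
    b, v, c = 0.0, rng.uniform(-0.5, 0.5), 0.0   # hidden bias, output weight, output bias
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:  # weights change after every sample
            h = math.tanh(b + sum(wj * xj for wj, xj in zip(w, xs[i])))
            err = (c + v * h) - ys[i]         # gradient of 0.5 * err^2 w.r.t. the output
            grad_h = err * v * (1.0 - h * h)  # error back-propagated through tanh
            v -= lr * err * h
            c -= lr * err
            w = [wj - lr * grad_h * xj for wj, xj in zip(w, xs[i])]
            b -= lr * grad_h
    return w, b, v, c

def predict(params, x):
    w, b, v, c = params
    return c + v * math.tanh(b + sum(wj * xj for wj, xj in zip(w, x)))

def sse(params, xs, ys):
    return sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys))

# Synthetic standardized data (NOT survey data), split 44 / 12 / 7.
rng = random.Random(1)
xs = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(63)]
ys = [math.tanh(0.8 * x[0] - 0.5 * x[1]) + rng.gauss(0, 0.05) for x in xs]
params = train_one_neuron_mlp(xs[:44], ys[:44])
print("testing SSE:", sse(params, xs[44:56], ys[44:56]))
print("holdout SSE:", sse(params, xs[56:], ys[56:]))
```

In practice the testing sample would be monitored during training to decide when to stop, and the holdout sample would be scored only once at the end.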

The relative error is used to check the model accuracy by comparison with an intercept-only model (i.e., sum of squared errors of the NN model / sum of squared errors of the intercept-only model). The model summary in Table 6 gives a brief idea of the model accuracy, showing that the relative error of the validation sample is 0.087. Accordingly, for future prediction of sewer sediment levels, the NN model is expected to give robust predictions, with 8.7% of the error of the intercept-only model. The predicted vs. observed plot, shown in Figure 8, gave a good indication of the quality of the model. The coefficient of determination (R²) is 82.3%, which indicates a good predictive capacity for this NN model.
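The relative error defined above is straightforward to compute. The sketch below assumes that the intercept-only model predicts the mean of the same sample being scored; the depths are hypothetical and the function name is illustrative.

```python
def relative_error(observed, predicted):
    """SSE of the model divided by SSE of the intercept-only (mean) model."""
    mean_obs = sum(observed) / len(observed)
    sse_model = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sse_mean = sum((o - mean_obs) ** 2 for o in observed)
    return sse_model / sse_mean

# Hypothetical holdout depths, invented only to show the calculation.
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 31.0, 39.0]
print(relative_error(obs, pred))
```

A value of 0 corresponds to a perfect fit, and a value of 1 means that the model does no better than predicting the mean.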

Figure 8. Predicted-Observed comparison plot for NN-model
The NN connection weights can be translated into a measure of the sensitivity of the independent variables and their significance for the NN output [26]. The results indicated that the peak flow and peak velocity have a considerable effect on the predicted output, with relative importances of 15.4% and 14.8%, respectively. Figure 9 shows the relative importance of each independent variable. qpeak is shown to be the most important, as it directly affects other independent variables (e.g., Vpeak, qpeak/Q).
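One common way to translate connection weights into relative importance is a Garson-style weight-partitioning scheme, sketched below. Whether this is the procedure of [26] or the one used by the SPSS tool is not confirmed here, so the block should be read as an illustration of the idea only; the weights are hypothetical.

```python
def garson_importance(w_hidden, w_out):
    """Relative importance of each input from |input-hidden| x |hidden-output| weights.

    w_hidden[j][i] is the weight from input i to hidden neuron j;
    w_out[j] is the weight from hidden neuron j to the output.
    """
    n_inputs = len(w_hidden[0])
    share = [0.0] * n_inputs
    for weights, v in zip(w_hidden, w_out):
        contrib = [abs(w) * abs(v) for w in weights]
        total = sum(contrib)
        for i, c in enumerate(contrib):
            share[i] += c / total  # input i's share of this hidden neuron
    grand = sum(share)
    return [s / grand for s in share]

# Hypothetical weights for three inputs and one hidden neuron.
print(garson_importance([[0.6, -0.3, 0.1]], [1.2]))
```

With a single hidden neuron, as in the network selected here, the scheme reduces to the absolute input-to-hidden weights normalized to sum to one.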

Conclusions
Sediment accumulation in a sewer is linked to several operational and environmental problems. Sewer inspection is needed to ensure the reliability of the system. However, it is an expensive process because it requires specialized measuring devices and trained personnel. A cost-effective alternative is to use previous inspection data to generate a predictive model of sediment accumulation/erosion. The main conclusions of this research are:
• An MSI survey of the TH-trunk was carried out; the sediment depth was found to be quite high, which may be attributed to several operational problems and individual malpractices. In this paper, for trunks subject to conditions similar to those of the TH-trunk, two predictive models relating multiple independent variables to ds are obtained using the MLRM and MLP-ANN methods.
• The multiple linear regression modeling approach produced a model with five significant predictors (qpeak³, NOC, L, Ldir., Lcum.). The model has good predictive accuracy (adjusted R² of 88.6%) and is validated using the data-splitting approach.
• An MLP-ANN model with good predictive accuracy is achieved; it fits the data better, as its overestimation/underestimation is lower than that of the MLRM.
• The ANN model is defined in relatively simple mathematical notation, which is preferred in practice. However, the interpretation of ANN models is quite difficult.
• The sensitivity analysis of the MLP-ANN parameters indicated that peak flow has the most significant effect on the prediction of ds.

Figure 4. The multilayer perceptron architecture

Figure 9. The relative importance of the independent variables

Table 5 shows the coefficient results for the selected model; the model can be written as:

$$ d_s = 0.036\,L_{cum.} + 0.203\,L + 6.685\,NOC - 1.142\,Z - 0.022\,L_{dir.} - 48.493 $$
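The fitted equation above can be evaluated directly for a given sewer. The sketch below only transcribes the reported coefficients; the input values are hypothetical, the units follow the paper's tables, and the variable Z is used exactly as it appears in the equation because it is not defined in this section.

```python
def predict_sediment_depth(l_cum, length, noc, z, l_dir):
    """Evaluate the fitted MLRM equation for sediment depth ds."""
    return (0.036 * l_cum + 0.203 * length + 6.685 * noc
            - 1.142 * z - 0.022 * l_dir - 48.493)

# Hypothetical inputs chosen only to show the call; they are not survey values.
print(predict_sediment_depth(l_cum=5000, length=150, noc=2, z=10, l_dir=300))
```

Because the equation is local to the TH-trunk survey, predictions for inputs outside the surveyed range should be treated with caution.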