Short-, Medium-, and Long-Term Prediction of Carbon Dioxide Emissions using Wavelet-Enhanced Extreme Learning Machine

Carbon dioxide (CO2) is the main greenhouse gas responsible for global warming. Early prediction of CO2 is critical for developing strategies to mitigate the effects of climate change. An enhanced version of the extreme learning machine (ELM), the wavelet-enhanced extreme learning machine (W-EELM), is used to predict CO2 on different time scales (weekly, monthly, and yearly). Data were collected from the Mauna Loa Observatory station in Hawaii, which is ideal for global air sampling. Instead of the traditional method (singular value decomposition), a complete orthogonal decomposition (COD) was used to calculate the weights of the ELM output layer accurately. Another contribution of this study is the removal of noise from the input signal using the wavelet transform technique. The results of the W-EELM model are compared with those of the classical ELM. Various statistical metrics are used to evaluate the models, and the comparative figures confirm the superiority of the proposed models over the ELM model. The proposed W-EELM model proves to be a robust and applicable computer-based technology for modeling CO2 concentrations, which contributes to the fundamental knowledge of environmental engineering.


Introduction

Background of the Study
Due to increasing global warming, the threat of climate change has been considered a serious environmental problem for the last twenty years. The intensive expansion of human and industrial activities in many countries has put enormous pressure on the environment by releasing large amounts of greenhouse gases into the atmosphere. The continual, inefficient release of gases increases the concentration of pollutants in the atmosphere and eventually leads to 'climate extremism', with serious consequences that threaten human health, the economy, and human development. For example, the global average surface temperature of the Earth has increased by 0.4 to 0.8°C since the end of the 19th century [1]. Carbon dioxide is the main cause of global warming on our planet [2][3][4][5] and thus plays an important role in the stability of the climate system [6,7]. Recently, global warming has become the biggest environmental problem in human history [8,9]. Although measures have been taken to reduce carbon emissions into the atmosphere, the concentration continues to increase. In this context, international organizations are making great efforts to reduce the negative effects of global warming by focusing on policies to reduce carbon dioxide emissions [10].
Based on the 1997 climate report, the Kyoto Protocol called for an 85% reduction in greenhouse gas emissions. In 2009, the United Nations Framework Convention on Climate Change (UNFCCC) conference took place. As a result of this convention, China promised to reduce CO2 emissions by 40-45% by the end of 2020 [10]. The CO2 concentration nonetheless set a record in May 2019, reaching 415 ppm at Mauna Loa Observatory Station. At the same time, the International Energy Agency (IEA) reports that global CO2 emissions from energy generation have decreased to 33 megatons, owing to the reduction of CO2 emissions from power plants in developed countries. The main reasons for this decrease in the energy sector are the expanded use of renewable energy, the reduced use of fossil fuels in power plants, and the switch to natural gas. It is worth noting that the energy sector accounts for 41% of all global CO2 emissions, while the other emission sources can be divided as follows: 20% from industry, 16% from road transport, 6% from other transport, and about 16% from various sectors and households [11]. Figure 1 shows the global CO2 emissions of developed and developing countries based on IEA data [12].

Figure 1. Records of CO2 emissions over the last 30 years
Consequently, the main goal to preserve our planet is to keep the temperature increase on Earth below 2°C, according to the recommendations of the Paris Agreement [13]. Under these recommendations, developed countries must reduce CO2 emissions from the energy industry. It is important to note that some of these countries have responded very well to the demands. For example, CO2 emissions from the energy sector have decreased by 2.9% in the United States, 8% in Germany, 2% in the United Kingdom, and 4.3% in Japan. On the other hand, energy-related CO2 emissions have increased in Asia due to the increased demand for coal for energy facilities [12].
Recently, the concentration of CO2 has increased dramatically, which puts enormous pressure on the entire ecosystem. The CO2 concentration has already exceeded the normal and safe level of about 350 ppm. Catastrophic events such as sea level rise and hurricanes can be avoided if the CO2 concentration in the atmosphere stays at 350 ppm. Moreover, the irreversible and dangerous consequences of climate change would be limited if the CO2 level were at this safe level. It should be noted that the highest value in the data used in this study is 415.39 ppm.

Related Works and Research Gap
The prediction of CO2 is an important issue in the environmental field because CO2 significantly affects the temperature of the Earth's surface. Moreover, CO2 concentration has increased exponentially in recent years and poses a significant threat to human life and the ecosystem. Accurate prediction of CO2 concentration not only provides important information to policy makers but can also improve the quality of CO2 emission management [14]. CO2 prediction is crucial to monitor the changes of this gas over time and to establish a reliable warning system; however, few studies have been conducted to predict this gas. Some studies have used artificial intelligence (AI) and related techniques to predict CO2 emissions, including genetic algorithms [15,16], artificial neural networks (ANN) [17], Gaussian process regression [18], and the logarithmic mean Divisia index method [19,20]. Mardani et al. [21] conducted a study using ANN and an adaptive neuro-fuzzy inference system (ANFIS) for CO2 prediction. These models were validated against multilinear regression (MLR). The study found that the AI models had excellent prediction accuracy compared to the MLR model. In addition, Saleh et al. [22] used support vector regression to predict CO2 emissions from energy consumption. The hyperparameters of the applied model were selected by trial and error, and the result of the study was satisfactory.
However, the standard ANN model has some problems, such as long training time and low generalization [22], while other models, such as Gaussian process regression, require accurate tuning of their hyperparameters and appropriate selection of the kernel function. It is important to mention that the above studies did not consider the time series analysis of CO2. Moreover, the mentioned researchers obtained the required data from specific stations. In other words, these models can provide information on emissions at a regional scale, but not on global emissions. Detection of gas emissions at a geographically and climatically delineated location can provide an important indication of the increase in carbon dioxide concentration around the globe, which has not been studied before.

Research Significance
Accurate prediction of CO2 is very important for establishing an advanced early warning system and can play a crucial role in evaluating the global strategies adopted to reduce the concentration of this gas in the air. The main challenge is that the atmospheric CO2 concentration increases very rapidly over short time periods. In this study, the ability of a novel model called the wavelet-enhanced extreme learning machine (W-EELM) to predict CO2 emissions is investigated. In a recent study [23], the classical ELM was improved by using a robust algorithm, the complete orthogonal decomposition (COD), instead of the classical singular value decomposition (SVD) to compute the output weights of ELM optimally. Several studies have shown that the COD algorithm outperforms SVD during the calibration process.
Moreover, COD is more reliable, efficient, and faster than SVD in solving technical problems [23,24]. We integrate an enhanced extreme learning machine (EELM) with a discrete wavelet transform approach. The discrete wavelet transform removes noise from the data set to obtain more accurate predictions. The proposed W-EELM was validated against the standard ELM in predicting CO2 emissions at different time scales (i.e., weekly, monthly, and yearly).

Environmental Data and Site Description
This study uses CO2 emission data collected from the Mauna Loa Observatory Station in the state of Hawaii, United States. This station measures environmental factors and emission gases that contribute to global climate change. The observatory's location is ideal for collecting air samples because it lies on the side of the largest active volcano on Earth. Geographically, the station is about 3400 m above mean sea level, far enough from pollution sources to make it easier for scientists and researchers to study and analyze air properties. The other geographic and hydrologic features of the observatory are listed in Table 1. Scientists began monitoring the atmosphere at Mauna Loa Observatory in the 1950s. This monitoring station can detect global climate changes by measuring the concentrations of various gases in the air. Many gas emissions are measured, including carbon dioxide, methane, sulfur dioxide, and nitrous oxide. Among these, CO2 is the most critical pollutant and the main cause of global warming. The CO2 concentration data used in this study were collected by the National Oceanic and Atmospheric Administration of the U.S. Department of Commerce [26]. In addition, the data span multiple time scales (i.e., weekly, monthly, and yearly). The statistical characteristics of the data set at the different time scales are shown in Table 2. The data set contains some missing values, which we fill by linear interpolation.
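The gap-filling step mentioned above can be sketched as follows. This is only an illustration: the function name `fill_missing_linear` is hypothetical, and it assumes missing readings are encoded as NaN, which the text does not specify.

```python
import numpy as np

def fill_missing_linear(values):
    """Fill NaN gaps by linear interpolation between the nearest known points.

    Assumption: missing readings are encoded as NaN. Gaps at the very start
    or end are held at the nearest known value (np.interp behaviour).
    """
    v = np.asarray(values, dtype=float)
    idx = np.arange(v.size)
    known = ~np.isnan(v)
    return np.interp(idx, idx[known], v[known])

# Hypothetical weekly readings in ppm with three gaps
weekly = [400.0, np.nan, 402.0, np.nan, np.nan, 405.0]
filled = fill_missing_linear(weekly)
```

Interpolating before the monthly and yearly series are aggregated keeps every time scale based on the same gap-free weekly record.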

Extreme Learning Machine
The extreme learning machine (ELM) is a learning algorithm presented by Huang for training single hidden layer feedforward neural networks (SLFN) [27]. A classical SLFN can be trained using a backpropagation algorithm, which has several shortcomings, including time consumption, computational cost, and overfitting problems. Therefore, the ELM algorithm was proposed for training SLFN to achieve faster and more accurate modeling with better generalization [28][29][30]. The hidden layer is the most important element in the structure of an SLFN and significantly affects the efficiency of the model. According to Huang [27], if the transfer function of the hidden layer is infinitely differentiable in each interval and there are enough hidden nodes, it is not necessary to adjust all the weights of the network. Accordingly, the ELM algorithm initializes the weight and bias values of the hidden layer randomly. For this reason, the ELM model requires less learning time than a classical neural network. It is important to mention that the weights of the output layer can be calculated using the least squares method based on the Moore-Penrose generalized inverse. Figure 2 shows the general structure of the ELM modeling approach.

Figure 2. General structure of ELM modeling approach
The ELM model can be expressed mathematically by Equation 1:

$$\sum_{i=1}^{L} \beta_i \, g_i(a_i \cdot x_j + b_i) = t_j, \qquad j = 1, \ldots, N \tag{1}$$

where $L$ is the number of hidden nodes, $g_i(a_i \cdot x_j + b_i)$ is the output function of the hidden layer for the input sample $x_j$, $a_i$ and $b_i$ are the randomly determined parameters (i.e., weights and biases) of the hidden nodes, $\beta_i$ refers to the weight that maps the $i$-th hidden node to the output node, $t_j$ is the output target of the ELM model, and $N$ is the number of input samples. To achieve good generalization and more stable modeling, the number of hidden nodes in the hidden layer should not exceed the number of input samples.
In the ELM modeling approach, the parameters of the hidden nodes are determined randomly, without the iterative tuning and adjustment required in various other types of artificial intelligence modeling approaches [31]. The core of ELM is the hidden-layer matrix built from these randomly generated weights. This can lead to zero error and provides an opportunity to configure the output-layer weights ($\beta$) analytically for the training samples. Moreover, the parameter values of the internal activation function ($a_i$, $b_i$) are assigned according to a probability distribution. Finally, the general matrices describing the ELM model can be expressed as Equations 2 and 3 [32,33]:

$$H = \begin{bmatrix} g(a_1 \cdot x_1 + b_1) & \cdots & g(a_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(a_1 \cdot x_N + b_1) & \cdots & g(a_L \cdot x_N + b_L) \end{bmatrix}_{N \times L} \tag{2}$$

$$\beta = \begin{bmatrix} \beta_1 & \cdots & \beta_L \end{bmatrix}^{\top}, \qquad T = \begin{bmatrix} t_1 & \cdots & t_N \end{bmatrix}^{\top} \tag{3}$$

where $H$ is the matrix of the hidden layer and the superscript $\top$ denotes the matrix transpose. The ELM model can then be written compactly as Equation 4:

$$H \beta = T \tag{4}$$

The least squares solution of Equation 4 can be expressed as Equation 5:

$$\hat{\beta} = H^{\dagger} T \tag{5}$$

Here, $H^{\dagger}$ is the Moore-Penrose generalized inverse of the hidden-layer matrix $H$. Conventionally, the singular value decomposition (SVD) method has been used primarily for ELM learning.
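The training procedure described in this section can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' MATLAB implementation: the class name `SimpleELM`, the uniform range of the random weights, and the toy data are all assumptions. The output weights follow the least squares solution with the Moore-Penrose inverse, which `numpy.linalg.pinv` computes through an SVD.

```python
import numpy as np

class SimpleELM:
    """Single-hidden-layer ELM with a tanh activation (illustrative sketch)."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # H[j, i] = g(a_i . x_j + b_i): the hidden-layer output matrix
        return np.tanh(X @ self.A + self.b)

    def fit(self, X, T):
        n_features = X.shape[1]
        # Hidden weights and biases are drawn once and never tuned.
        # The uniform range [-1, 1] is an assumption of this sketch.
        self.A = self.rng.uniform(-1.0, 1.0, size=(n_features, self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: beta = pinv(H) @ T (least squares via SVD)
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# Toy usage on a smooth synthetic target (not CO2 data)
X_demo = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
T_demo = np.sin(X_demo[:, 0])
elm = SimpleELM(n_hidden=20).fit(X_demo, T_demo)
train_rmse = float(np.sqrt(np.mean((elm.predict(X_demo) - T_demo) ** 2)))
```

Because only the output weights are solved for, training reduces to one matrix factorization, which is the source of the speed advantage over backpropagation noted above.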

Enhanced Extreme Learning Machine (EELM) Model
The SVD approach is often used to solve linear problems using the least squares error method. However, the complete orthogonal decomposition (COD) algorithm can be used as an alternative to the traditional SVD method because it gives excellent results in solving linear problems with a simple and reliable computation [34]. In the present study, the COD algorithm is used to calculate the output weight values in order to improve the learning method and the efficiency of the ELM model. It should be noted that the COD method has special features that distinguish it from the usual SVD method. One of these advantages is that the COD algorithm can provide much more accurate results with a simpler computational process.
It is important to point out that the COD approach generally provides very efficient computational results regardless of the size of the H matrix. Moreover, the results of this algorithm are stable, and the required weights can be calculated in the least squares sense in a shorter time. More details about the COD algorithm and its applications in tuning the ELM modeling technique can be found in Guo et al. [35].
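As a rough illustration of how a COD-based least squares solve differs from the SVD route, the sketch below builds a complete orthogonal decomposition from two QR factorizations using SciPy. It is a generic textbook construction, not necessarily the variant used in [23] or [35]; the function name `cod_lstsq` and the rank tolerance are assumptions.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def cod_lstsq(H, T, tol=1e-10):
    """Minimum-norm least squares solution of H @ beta = T via a COD.

    Step 1: column-pivoted QR, H[:, piv] = Q @ R, reveals the numerical rank.
    Step 2: a QR factorization of the leading rows of R (transposed) removes
            the trailing block, leaving one small triangular system.
    """
    Q, R, piv = qr(H, mode="economic", pivoting=True)
    diag = np.abs(np.diag(R))
    rank = int(np.sum(diag > tol * diag[0]))   # tolerance is an assumption
    R1 = R[:rank, :]                           # rank x L, upper trapezoidal
    Z, U = qr(R1.T, mode="economic")           # R1 = U.T @ Z.T
    c = Q[:, :rank].T @ T
    y = solve_triangular(U.T, c, lower=True)   # solve U.T @ y = c
    x_perm = Z @ y                             # minimum-norm solution
    beta = np.empty_like(x_perm)
    beta[piv] = x_perm                         # undo the column pivoting
    return beta

# Toy check against the SVD-based pseudo-inverse on random data
rng_demo = np.random.default_rng(0)
H_demo = rng_demo.normal(size=(40, 8))
T_demo = rng_demo.normal(size=40)
gap = float(np.max(np.abs(cod_lstsq(H_demo, T_demo) - np.linalg.pinv(H_demo) @ T_demo)))
```

Both routes return the same minimum-norm solution up to rounding; the COD route replaces the iterative SVD with two direct QR factorizations and a triangular solve. SciPy also exposes LAPACK's complete orthogonal factorization driver through `scipy.linalg.lstsq(..., lapack_driver="gelsy")`.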

Wavelet Transform
The wavelet transform (WT) is a mathematical approach used to remove noise and decompose data series. In WT analysis, a time series is split into two main components: low-frequency and high-frequency components. The general characteristics of the time series, including seasonal and cyclical trends, are carried by the low-frequency components, while the chaotic and detailed elements are preserved in the high-frequency components. This strategy is similar to the variable separation approach in time series analysis, which can be used to identify inherent patterns in raw time series data.
The continuous wavelet transform of a continuous time series $x(t)$ can be defined as shown in Equation 6 [36,37]:

$$W(\tau, s) = \frac{1}{\sqrt{|s|}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t - \tau}{s}\right) dt \tag{6}$$

Here, $\tau$ describes the time shift of the function, which supports a careful study of the signal; $s$ represents the dilation; the symbol * denotes the complex conjugate; and $\psi(t)$ represents the mother wavelet. This transformation approach is primarily concerned with determining the time scale of the process. In practice, ecologists usually do not prefer the continuous wavelet transform because it generates redundant information that can affect the effectiveness of simulation models. Therefore, the discrete wavelet transform is often used instead of the continuous-time transform. The discrete mother wavelet can be expressed as Equation 7 [36]:

$$\psi_{m,n}(t) = \frac{1}{\sqrt{s_0^{m}}}\, \psi\!\left(\frac{t - n\,\tau_0\, s_0^{m}}{s_0^{m}}\right) \tag{7}$$

In this equation, $m$ and $n$ are integers. Wavelet dilation and wavelet translation are controlled by $m$ and $n$, respectively. Normally, the parameters $\tau_0$ and $s_0$ are equal to 1 and 2, respectively. $\tau_0$ describes the location parameter and its value should be greater than zero, while $s_0$ indicates the step of refined dilation, which should be greater than one. This power-of-two logarithmic scaling of dilation and translation is called the dyadic grid arrangement. The dyadic wavelet function is presented in Equation 8 [36]:

$$\psi_{m,n}(t) = 2^{-m/2}\, \psi\!\left(2^{-m} t - n\right) \tag{8}$$

For a discrete time series $x_i$, the dyadic wavelet transform is represented in Equation 9 [38,39]:

$$W_{m,n} = 2^{-m/2} \sum_{i=0}^{N-1} x_i\, \psi\!\left(2^{-m} i - n\right) \tag{9}$$

Thus, $W_{m,n}$ is the wavelet coefficient for the discrete wavelet with scale $s = 2^{m}$ and location $\tau = 2^{m} n$ (where $i = 0, 1, 2, \ldots, N - 1$). In addition, the smoothed signal component, which represents the overall trend of the time series, is denoted $\bar{T}$. The discrete inverse transform then reconstructs the signal as shown in Equation 10 [38,39]:

$$x_t = \bar{T}(t) + \sum_{m=1}^{A} D_{m,t} \tag{10}$$

where $\bar{T}(t)$ is the approximation sub-signal at level $A$, while $D_{m,t}$ is the details sub-signal, reconstructed from the coefficients $W_{m,n}$, at level $m = 1, 2, \ldots, A$ and time $t$ ($t = 1, 2, \ldots, N$).
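The additive structure of the inverse transform (one approximation sub-signal plus one detail sub-signal per level) can be illustrated with the simplest Daubechies wavelet, db1, which is the Haar wavelet. The sketch below is an assumption-laden illustration: the paper does not state which Daubechies order was used, the function name `haar_mra` is hypothetical, and the series length must be divisible by 2 raised to the number of levels. For Haar, the level-m approximation is the mean over consecutive blocks of 2^m samples, and each detail is the difference between two successive approximations.

```python
import numpy as np

def haar_mra(x, levels):
    """Split x into a Haar approximation and `levels` detail sub-signals.

    Returns (approx, details) with x == approx + sum(details), where
    details[m - 1] is the detail sub-signal at level m.
    """
    x = np.asarray(x, dtype=float)
    if x.size % (2 ** levels) != 0:
        raise ValueError("series length must be divisible by 2**levels")
    approx = x
    details = []
    for m in range(1, levels + 1):
        block = 2 ** m
        # Projection onto the level-m approximation space: block means
        coarser = np.repeat(x.reshape(-1, block).mean(axis=1), block)
        details.append(approx - coarser)   # what is lost going one level up
        approx = coarser
    return approx, details

# Synthetic trend plus seasonal cycle (not CO2 data)
t_demo = np.arange(64)
series_demo = 400.0 + 0.05 * t_demo + np.sin(2 * np.pi * t_demo / 8)
approx_demo, details_demo = haar_mra(series_demo, levels=3)
```

Each lagged input therefore yields `levels + 1` sub-series that can be fed to the network. In practice a library such as PyWavelets would be used for higher-order Daubechies wavelets; this NumPy version only shows the decomposition and reconstruction principle, not a thresholding-based denoiser.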

Modeling Development
In this study, the ability of the wavelet-enhanced extreme learning machine (W-EELM) model with a single hidden layer to predict CO2 emissions is investigated. The modeling experiments were conducted in the MATLAB 2018b software environment. For this purpose, the experimental data set is divided into two subsets; the first is used for training and constructing the proposed technique, while the second is used for testing. The assumed input combinations used to develop the predictive models are expressed in Equations 11, 12, and 13:

$$m1: \quad CO_2^{*}(t) = f\big(CO_2^{*}(t-1)\big) \tag{11}$$

$$m2: \quad CO_2^{*}(t) = f\big(CO_2^{*}(t-1),\, CO_2^{*}(t-2)\big) \tag{12}$$

$$m3: \quad CO_2^{*}(t) = f\big(CO_2^{*}(t-1),\, CO_2^{*}(t-2),\, CO_2^{*}(t-3)\big) \tag{13}$$

where the symbol * refers to the time scale of CO2 concentration (i.e., weekly, monthly, and yearly). The input layer of W-EELM contains wavelet neurons (nodes) fed with the subseries of the CO2 time series obtained by the discrete wavelet transform (Equation 9).
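A lagged input matrix for one-step-ahead prediction can be assembled as sketched below. The helper name `make_lagged` is hypothetical, and the reading that m1, m2, and m3 use one, two, and three previous values is an assumption drawn from the input counts reported in the results.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Build (X, y) for one-step-ahead prediction.

    Row j of X holds [s(t-1), ..., s(t-n_lags)] and y[j] is s(t).
    """
    s = np.asarray(series, dtype=float)
    if n_lags < 1 or n_lags >= s.size:
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    # Column k-1 is the series shifted back by k steps
    X = np.column_stack([s[n_lags - k : s.size - k] for k in range(1, n_lags + 1)])
    y = s[n_lags:]
    return X, y

# Hypothetical ppm values; n_lags = 2 corresponds to the two-input case
X_m2, y_m2 = make_lagged([410.0, 411.0, 412.5, 413.0, 414.2], n_lags=2)
```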
Determining the number of hidden nodes in the hidden layer of a neural network is critical to the development of the proposed model and has significant implications for model accuracy. For this reason, hidden layers with 1 to 25 nodes are tested. Figure 3 shows the methodology used to develop the proposed and comparable models to achieve the objective of this study.

Figure 3. The methodology of the W-EELM models
The type of mother wavelet and the level of decomposition are crucial to the efficiency of the predictive model. In this study, the Daubechies wavelet was selected to decompose the raw time series data because it can extract useful information features from the data. Moreover, it has been widely used to deal with problems related to carbon dioxide emissions [40][41][42][43]. The minimum decomposition level [44,45] was calculated using log(N). Here, N denotes the number of original data used for the analysis of the five types of disappearance used in this study. The main steps of the study can be summarized as follows:
• Prepare the CO2 time series data (weekly scale) and subsequently calculate the monthly and yearly time scales.
• Select the possible input vectors for each time scale for the classical ELM model (Equations 11 to 13).
• Apply the WT approach to the selected inputs for the W-EELM model.
• Determine the number of hidden nodes and assign the ELM and W-EELM hidden layer weights and biases.
• Normalize the input and output variables (training data and testing data).
• Calculate the H matrix for both models according to Equation 2.
• Activate the H matrix using the hyperbolic tangent sigmoid transfer function [46].
• Compute the output weights using SVD for the classical ELM and the COD algorithm for W-EELM, and then perform the prediction.
• Denormalize the predicted and actual CO2.
• Select the best models based on various statistical criteria, as shown in the following section, and comparable plots.
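The data-handling steps in the list above (chronological split, normalization, denormalization) can be sketched as follows. The names `chrono_split` and `MinMaxScaler1D` are hypothetical, and min-max scaling to [0, 1] fitted on the training portion is an assumption, since the text does not state which normalization was used. The 60% training share is the one reported in the results.

```python
import numpy as np

def chrono_split(values, train_fraction=0.6):
    """Split a time series in order: first part for training, rest for testing."""
    n_train = int(round(train_fraction * len(values)))
    return values[:n_train], values[n_train:]

class MinMaxScaler1D:
    """Min-max scaling to [0, 1] (assumed scheme), fitted on training data only."""

    def fit(self, x):
        self.lo = float(np.min(x))
        self.hi = float(np.max(x))
        return self

    def transform(self, x):
        return (np.asarray(x, dtype=float) - self.lo) / (self.hi - self.lo)

    def inverse(self, z):
        # Denormalize predictions back to ppm
        return np.asarray(z, dtype=float) * (self.hi - self.lo) + self.lo

# Hypothetical ppm series
co2_demo = np.linspace(390.0, 415.0, 20)
train_demo, test_demo = chrono_split(co2_demo)
scaler_demo = MinMaxScaler1D().fit(train_demo)
test_scaled_demo = scaler_demo.transform(test_demo)
```

Fitting the scaler on the training portion only keeps information from the testing period out of model construction; because CO2 trends upward, scaled test values can exceed 1 under this scheme.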

Performance Measures
Each predictive model developed in this study was evaluated using several performance measures (see Equations 14 to 21). The statistical measures are the correlation coefficient (R), Nash-Sutcliffe efficiency (NSE), mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), relative error (RE), mean absolute relative error (MARE), and residual. The mathematical expressions of these metrics can be written as follows [47][48][49][50]:
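For reference, the measures listed here can be computed as in the sketch below. These are the common textbook definitions, which may differ in detail from the paper's Equations 14 to 21; the function name `evaluate` is hypothetical, and expressing RE in percent and MARE as a fraction are assumptions of this sketch.

```python
import numpy as np

def evaluate(obs, pred):
    """Return common goodness-of-fit measures for observed vs predicted values."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    residual = obs - pred
    re = 100.0 * residual / obs                      # relative error in percent
    return {
        "R": float(np.corrcoef(obs, pred)[0, 1]),
        "NSE": float(1.0 - np.sum(residual ** 2) / np.sum((obs - obs.mean()) ** 2)),
        "MAE": float(np.mean(np.abs(residual))),
        "RMSE": float(np.sqrt(np.mean(residual ** 2))),
        "MAPE": float(np.mean(np.abs(re))),          # percent
        "MARE": float(np.mean(np.abs(residual / obs))),  # fraction
        "RE": re,
        "residual": residual,
    }

# Hypothetical observed and predicted concentrations in ppm
scores_demo = evaluate([400.0, 410.0, 420.0], [401.0, 409.0, 420.0])
```

Under these conventions MAPE is simply MARE multiplied by 100, so the two carry the same information in different units.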

Result and Discussion
This section presents the ability of the wavelet-enhanced extreme learning machine (W-EELM) and the classical extreme learning machine (ELM) modeling approaches to predict CO2 emissions at Mauna Loa Observatory Station in the United States. Three different time scales (weekly, monthly, and yearly) were selected to predict CO2 one step ahead based on previous data points. The short-term (weekly) prediction is important for developing advanced early warning systems, while the medium-term (monthly) prediction is critical for evaluating overall mitigation strategies and reducing CO2 concentrations. In addition, the long-term (yearly) prediction can help with global management, policies, and strategies for countries emitting more CO2, as well as with monitoring the improvements achieved.
The reason for modeling short- and long-term predictions is to study intensively how well the proposed models capture the patterns of CO2 over different time scales. In general, the analysis of time series data becomes more complex at longer time scales, as some of the key data features are lost due to averaging. Therefore, a reliable model must be assessed on both short- and long-term predictions. In this study, the three input combinations (m1, m2, and m3) introduced in the previous section (see Equations 11 to 13) were used to develop the models for one-step-ahead prediction of CO2. For the W-EELM models, the input data were denoised using the wavelet transform (WT) to obtain clean and stationary data. These data were fed into the input nodes of the EELM approach. It is worth noting that Daubechies (db) was used as the mother wavelet and that two different decomposition levels were used for each time scale (weekly, monthly, and yearly).
In this study, 60% of the data set was used to build the model, and the rest was used for testing purposes. Various statistical metrics and graphical representations were used to evaluate the predictive quality of each model. Table 3 lists the statistical evaluation metrics for each predictive model in the training and testing phases. According to the quantitative analysis, several prediction models show good performance in predicting the weekly CO2 concentrations in the training phase. RMSE, MAE, MAPE, and NE ranged from 0.4575 to 0.9893 ppm, 0.3699 to 0.8074 ppm, 0.1058% to 0.2325%, and 0.9936 to 0.9986, respectively. The Wm2-WEELM 3 model, with only two input variables (m2) and three decomposition levels, showed the highest prediction accuracy compared to other similar models. However, the training phase cannot give a meaningful impression of the most reliable modeling approaches because the trained models are fitted to labelled input data. Therefore, the testing phase is crucial for the selection of a reliable predictive model, because unlabelled input data are fed to the model and its generalizability can be determined accurately at this stage. In the testing phase, the Wm2-WEELM 3 model again outperformed the comparable models, with the lowest values for RMSE (0.5899 ppm), MAE (0.4745 ppm), and MAPE (0.1206%) and the highest value for NE (0.9975). In addition, the accuracy of the predictive models decreases when the number of input variables is increased, because unnecessary information is added, which can hinder the training process of the model and decrease the generalization efficiency. This is clearly demonstrated by the Wm3-WEELM 4 model, where 15 input variables are introduced into the system, which complicates the mathematical operations of the model and reduces the prediction accuracy to the lowest level.
Regarding the monthly time scale of CO2, the accuracy of the predictions varies from one model to another. The statistical metrics are tabulated in Table 3, and the values of RMSE, MAE, MAPE, and NE ranged from 1.0967 to 2.587 ppm, 0.8681 to 2.1455 ppm, 0.2601% to 0.6428%, and 0.95658 to 0.9938, respectively. In this phase, the Mm1-WEELM 2 model had the highest accuracy, while the Mm2-WEELM 3 model (with eight input variables) had the lowest prediction accuracy compared with the other eight similar models. In the testing phase, the Mm1-WEELM 2 model maintained its excellent prediction performance and provided the lowest RMSE (1.501 ppm), MAE (0.929 ppm), and MAPE (0.2421%) and the highest value of NE (0.9943).
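The input counts quoted above are consistent with one simple reading, offered here as an assumption rather than something the text states outright: each lagged input is replaced by its wavelet sub-series, that is, one approximation series plus one detail series per decomposition level. With $p$ lagged inputs and decomposition level $A$, this gives:

```latex
n_{\text{inputs}} = p\,(A + 1), \qquad
\text{Wm3-WEELM 4: } 3 \times (4 + 1) = 15, \qquad
\text{Mm2-WEELM 3: } 2 \times (3 + 1) = 8
```

Under this reading, raising either the number of lags or the decomposition level quickly enlarges the input layer, which matches the observation that the largest models generalize worst.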
From Tables 3 and 4, it can be seen that both models are sensitive to the number of hidden nodes and to the input combinations. For the weekly scale, the best model is Wm2-WEELM 3, which requires two raw input variables to achieve the best accuracy. For the monthly and yearly time scales, however, the model needs only one raw input variable. Another important observation is that the models for predicting weekly CO2 generally require a higher number of hidden nodes than those for the monthly and yearly time scales. This could be related to the nature of the CO2 records, where data are fluctuating and scattered on a short time scale, while fluctuating less on the medium- and long-term scales.

Table 4 (fragment; hidden nodes per model). Wm1-WEELM 3: 12; Wm1-WEELM 4: 4; Wm2-ELM: 7; Wm2-WEELM 3: 25; Wm2-WEELM 4: 8; Wm3-ELM: 4; Wm3-WEELM 3
Finally, Table 3 lists the quantitative assessments of the nine predictive models used to predict one year ahead. The statistical parameters of all predictive models for the training phase ranged from 0.2747 to 0.5039 ppm, 0.1975 to 0.3936 ppm, 0.0585% to 0.1192%, and 0.9969 to 0.9986 for RMSE, MAE, MAPE, and NE, respectively. The superiority of the Ym1-WEELM 1 model in predicting yearly CO2 over the other eight predictive models was clearly demonstrated in both the training and testing sets. Moreover, with only two input variables, this model achieved the lowest values for RMSE (0.4674 ppm), MAE (0.3783 ppm), and MAPE (0.0983%) and the highest value for NE (0.999) on the testing set. The superiority of the proposed modeling approach was demonstrated by the reduction of RMSE during the training and testing phases (see Figure 4). The proposed model improved on the classical model by 13.65% at the yearly time scale during the testing phase, as shown in Figure 4. The corresponding improvements for the weekly and monthly time scales are 3.25% and 11.67%, respectively. The main observation is that the highest improvement occurs when predicting CO2 yearly, followed by the monthly and weekly time scales. The explanation for this phenomenon is that the accuracy of long-term predictions using conventional models (e.g., ELM) decreases significantly because these approaches cannot effectively capture the dynamics of the environmental system. In addition, the dynamic trend of the time series data is lost when larger time scales are used and the observed data points are averaged.
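The percentage improvement referred to here is presumably the relative reduction in RMSE with respect to the classical model. The text does not give the formula, so the definition below is an assumption, and both the function name and the numbers in the usage line are hypothetical.

```python
def rmse_improvement(rmse_baseline, rmse_proposed):
    """Relative RMSE reduction of the proposed model over the baseline, in percent."""
    return 100.0 * (rmse_baseline - rmse_proposed) / rmse_baseline

# Hypothetical values: a baseline RMSE of 0.50 ppm reduced to 0.45 ppm
improvement_demo = rmse_improvement(0.50, 0.45)
```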
To further investigate the adequacy of each predictive model, various graphical representations were created, including scatter plots, relative error distributions, and boxplots. The scatter plots for the training and testing phases, shown in Figures 5 through 10, serve as a better means for comparative evaluation of the classical and hybrid modeling approaches. In addition to the correlation coefficient (R), these figures give a more meaningful visualization of the variance between the observed and predicted values of CO2. Regarding the weekly time scale, the graphical representations (Figures 5 and 6) showed that the hybrid models with two decomposition levels achieved the best prediction accuracy compared to the classical models. Moreover, these figures show how closely each model's predictions scatter around the ideal line. For example, the Wm2-WEELM 3 model exhibited lower scatter and achieved the highest prediction efficiency, with R of 0.994 and 0.991 for the training and testing phases, respectively.
Similarly, for the monthly time scale (Figures 7 and 8), the standard models had lower prediction accuracy and relatively more scattered points than the hybrid models. The Mm1-WEELM 2 model is superior to all other predictive models and achieves very high accuracy, with R of 0.9969 and 0.9972 for the training and testing phases, respectively. Finally, the general situation regarding the yearly predictions of CO2 has not changed significantly, as the hybrid models show high prediction accuracy, with the highest R values for both phases compared to the classical models (see Figures 9 and 10). The Ym1-WEELM 1 model exhibited lower scatter and deviation from the ideal line, resulting in a higher R value (0.9998 and 0.9996 for the training and testing series, respectively) than the other predictive models. It is worth noting that the complex models, especially the hybrid models that require many inputs, did not predict the weekly, monthly, and yearly CO2 very well.
The RE value is calculated for each observation during the testing phase to provide a more meaningful graphical assessment of the prediction error. In addition, the RE value can provide a clear explanation of the ability of the predictive models to predict CO2 on different time scales. The results presented in Figures 11 to 13 show that for each time scale (i.e., weekly, monthly, and yearly), the hybrid models provide lower RE and MARE than the classical prediction approaches, except for the models that require the highest decomposition level. For example, the best weekly CO2 prediction model provided the lowest RE values, with only one observation just above 0.5%. The classical models, on the other hand, provided larger values of RE, with 3 to 10 observations above the benchmark of 0.5%. In addition, the Mm1-WEELM 2 and Ym1-WEELM 1 models have much lower relative error in the predictions at monthly and yearly time scales compared to the traditional models. Finally, the best MARE of all proposed models was found to be 0.1206%, 0.2421%, and 0.0983% for weekly, monthly, and yearly time series forecasts, respectively.
Boxplots were created for all the models used in this study to give a more informative and clear overview of the outliers and robustness of the models during the testing phase [51]. Figures 14 to 16 show the boxplots of the residuals between the actual and predicted CO2 levels for all models. As Figure 14 shows, all predictive models for the weekly scale had higher outliers than the Wm1-WEELM 3, Wm2-WEELM 3, and Wm3-WEELM 3 models. However, the Wm3-WEELM 3 model shows a tendency toward the third quartile, while the other models show smaller interquartile range (IQR) errors. As for the monthly time scale, only three models (Mm1-WEELM 2, Mm2-WEELM 2, and Mm3-WEELM 2) had smaller residual errors among all nine models, as shown in Figure 15. In addition, the Mm2-WEELM 2 model showed a slight trend toward the third quartile. Finally, for the yearly estimates, the Ym1-WEELM 1, Ym2-WEELM 1, and Ym3-WEELM 1 models performed best among all nine models, as shown in Figure 16. The Ym2-WEELM 1 and Ym3-WEELM 1 models had the highest range of suspected outliers (1.5 IQR).
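The 1.5 IQR rule behind the suspected outliers in a boxplot can be written out as below. This is the conventional rule, shown as a sketch; the function name `iqr_outliers` and the residual values are hypothetical, and NumPy's default linear-interpolation quartiles may differ slightly from those of the plotting software used in the study.

```python
import numpy as np

def iqr_outliers(residuals, k=1.5):
    """Flag residuals lying more than k * IQR outside the quartiles."""
    r = np.asarray(residuals, dtype=float)
    q1, q3 = np.percentile(r, [25, 75])
    iqr = q3 - q1
    return (r < q1 - k * iqr) | (r > q3 + k * iqr)

# Hypothetical residuals in ppm; only the last one lies outside the fences
mask_demo = iqr_outliers([-0.2, -0.1, 0.0, 0.1, 0.2, 3.0])
```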

Conclusion
Reliable predictions can help decision makers to take effective actions to reduce CO2 concentration. In this work, the efficiency of W-EELM and the classical ELM model in predicting CO2 emissions at different time scales (i.e., short-, medium-, and long-term) is investigated. The main contribution of this study is to improve the predictive capacity and stability of the classical ELM model by denoising the input data using the WT technique and by using the COD algorithm instead of SVD to calculate the output weights of ELM more accurately. The CO2 data were collected from Mauna Loa Observatory Station; 60% of the data set was used for training and calibration, and the rest (the end of the time series) was used for testing model accuracy. The predictive capacity of the two models was evaluated using various statistical metrics. The superiority of the proposed model over the classical models was clearly shown in the reduction of RMSE in the testing phases. The greatest improvement was obtained in the prediction on the yearly time scale (16.65%), followed by the monthly (11.67%) and weekly (3.25%) time scales.

Figure 4. Superiority of W-EELM models over classical ones in reducing RMSE values

Figure 11. Distribution of relative error for each predictive model during the testing phase: weekly time scale

Figure 14. Box plots of the residuals over the testing phase for all applied predictive models: weekly time scale

Table 2. Statistical criteria
*N and CV are the number of observations and the coefficient of variation, respectively.

Table 3. Performance prediction capabilities for predictive models in training and testing phases
Note: The bold fonts represent the best model accuracy; the symbol * represents the wavelet decomposition level; m1, m2, and m3 are the input variables as shown in the methodology section; and W, M, and Y refer to the time scale (i.e., weekly, monthly, and yearly).