Indoor Climate Prediction Using Attention-Based Sequence-to-Sequence Neural Network

The Solar Dryer Dome (SDD), a solar-powered agronomic facility for drying, preserving, and processing comestible commodities, needs smart systems to optimize its energy consumption. Indoor condition variables such as temperature and relative humidity therefore need to be forecasted so that actuators can be scheduled, as the largest energy usage originates from actuator activities such as heaters for increasing indoor temperature and dehumidifiers for maintaining optimal indoor humidity. To build such forecasting systems, this research developed deep learning prediction models for sequence-to-sequence cases, which may bring future benefits in assisting SDDs and greenhouses to reduce energy consumption. This research experimented with a complex, publicly available indoor climate dataset, the Room Climate dataset, which can serve as a stand-in for the environmental conditions inside an SDD. The main contribution of this research is the implementation of the Luong attention mechanism, commonly applied in Natural Language Processing (NLP) research, in time series prediction, by proposing two models with a Luong attention-based sequence-to-sequence (seq2seq) architecture using GRU and LSTM as encoder and decoder layers. The proposed models outperformed the adapted LSTM and GRU baseline models. The implementation of Luong attention proved capable of increasing the accuracy of the seq2seq LSTM model, reducing its test MAE by 0.00847 and RMSE by 0.00962 on average for predicting indoor temperature, and by 0.068046 MAE and 0.095535 RMSE for predicting indoor humidity. The implementation of Luong attention also improved the accuracy of the seq2seq GRU model, reducing the error by 0.01163 in MAE and 0.021996 in RMSE for indoor humidity.
However, the implementation of Luong attention in the seq2seq GRU for predicting indoor temperature showed inconsistent results, reducing MAE by approximately 0.003193 while increasing RMSE by roughly 0.01049.


Introduction
In 2018, the Food and Agriculture Organization of the United Nations (FAO) attempted to avert a future world hunger crisis with the motto "No Food Loss and Food Waste" [1]. The development of technologies such as Artificial Intelligence (AI), cloud computing, and the Internet of Things (IoT) changed the agriculture industry in Indonesia, a shift supported by the Indonesian Agriculture Ministry through its promotion of Agriculture 4.0 [2]. The Solar Dryer Dome (SDD) is part of the Agriculture 4.0 programs and can support farmers in drying their agricultural commodities for preservation or food processing, compared to the traditional way of drying. Although the SDD is built using sophisticated technologies, it still has some problems pertinent to electricity consumption. The first problem is that differences in solar light intensity across geographical locations, described by latitude, can affect solar power absorption [4]. The second problem is bad weather, which prevents solar panels from absorbing solar energy throughout the day. The use of heaters for raising the indoor temperature and dehumidifiers for regulating indoor humidity is the largest power consumer in an SDD [5]. To reduce energy consumption and attain high drying efficiency, excellent product quality, optimal temperature, and appropriate air circulation, an SDD needs to optimize the activities of its actuators, such as heaters, dehumidifiers, and fans, through actuator scheduling, which relies on the results of indoor climate prediction [6]. This research discusses indoor climate prediction methods that can be implemented for the SDD.
Another insight from the agriculture sector is that current research on IoT-based smart greenhouses indicates that more than 50% of greenhouse expenditures come from energy consumption and labor costs [7]. The smart greenhouse also has the same problems as the SDD, especially in reducing energy consumption. In a previous study, researchers managed to implement IoT scheme-based automation in a smart greenhouse, which significantly impacted its energy efficiency [7]. This research may bring new considerations for implementing AI-based indoor climate prediction systems to further optimize the automation of actuator scheduling, which can reduce energy consumption.
Different construction sites for SDDs or greenhouses can have significantly different needs in terms of local energy sources, environmental climatic conditions, and agricultural demands [4]. Some SDDs may need to be constructed in extremely remote areas where environmental conditions can change quickly and dramatically. This condition became a particular concern in this research, which is why datasets with complex patterns may be beneficial for training environmental climate prediction models. To represent the SDD and greenhouse, this research used a complex dataset called the Room Climate Dataset, which is composed of various Pearson Correlation Coefficient (PCC) values [8]. This challenging dataset motivated this research to investigate deep learning approaches for sequence-to-sequence prediction by implementing the Luong attention mechanism.
Deep learning, an approach that enables multi-level representations of data and their distribution to be learned, can be adopted for indoor environmental forecasting purposes [9]. The Recurrent Neural Network (RNN) can be described as a neural network specialized for modeling time series data [10]. That is why this research implemented RNNs for modeling multivariate time series data containing indoor climate variables, especially LSTM, the prominent type of RNN, and GRU, the abridged adaptation of LSTM [11]. The use of deep learning approaches was deemed appropriate for analyzing the complex patterns in the Room Climate dataset. This research implemented sequence-to-sequence (seq2seq) RNN-based prediction models for indoor climate prediction in sequence-to-sequence cases. Seq2seq, an architecture prominent in machine translation, is a deep learning approach that can process output and input in the form of sequence data [12,13]. This research proposed Luong attention-based seq2seq models with LSTM and GRU.
Attention mechanisms have been successfully applied in a wide variety of deep learning application domains, although further research is required [14]. Originally implemented in NLP, attention methods such as self-attention can also be applied to other tasks, such as computer vision, to attain performance improvements. This research aimed to investigate the impact of attention mechanisms, especially Luong attention, on predicting time-series data. To the best of our knowledge, this is the first comparative study of the Luong attention application in seq2seq models for indoor climate prediction. The contribution of this research is an evaluation of the Luong attention mechanism, comparing our two proposed models to two simple seq2seq models and two adapted baseline models. Our motivation for experimenting with the Luong attention mechanism in our time-series forecasting research was the excellence of the attention mechanism in NLP research, where it focuses only on the pertinent information in a word sequence, which intuitively may have a positive impact on time-series research in memorizing long sequences [15][16][17].

Literature Review
The most similar research to ours was done by Gunawan et al., in which they evaluated four deep learning models, comprising a two-layered LSTM model, a two-layered GRU model, a Transformer model, and a Transformer model with learnable positional encoding, applied to the Room Climate dataset [8,18]. Unlike this research, which predicted a sequence output of five timestamps in the future, the study conducted by Gunawan et al. predicted only one timestamp in the future, resulting in extremely strong R² scores. Their research showed that both LSTM and GRU were able to contend with each other in terms of forecasting accuracy.
In the agriculture field, there was research conducted by Liu et al. that implemented LSTM in their proposed model, called GCP_LSTM, for predicting indoor climate variables inside greenhouses [19]. Their proposed model was built for controlling the indoor climate to ensure the stable growth of some crops, such as tomatoes, cucumbers, and peppers.
The results of their experiments showed that the proposed GCP_LSTM outperformed RNN and GRU. Another similar study investigating an LSTM model for monitoring indoor conditions, such as air temperature, relative moisture, pressure, wind, and dew point in a smart greenhouse, was done by Ali et al. [20]. Their research aimed to find the best configuration among various hyperparameter settings, such as optimization algorithms and the number of neurons.
As the origin of deep learning architectures, the reliability of the ANN was still investigated in the research done by Ullah et al. for predicting the indoor environment in a smart greenhouse [21]. Their research showed that, compared to the conventional Kalman filter algorithm, implementing an ANN can increase the precision of indoor prediction using the Kalman filter method by reducing the RMSE by around 23.07% for temperature, 43.17% for CO2, and 44% for humidity. Recently, there has been a research project in agronomy related to predicting indoor air temperature inside the greenhouse using outdoor data with machine learning approaches [22]. They believe that the greenhouse is not fully isolated from the outside, since the inside air is impacted by the outdoor climate. Their study compared machine learning models such as multiple linear regression, an ensemble of trees, support vector machine-based regression, and Gaussian process regression using data gathered from main weather data and indoor temperatures inside a greenhouse in Agadir, Morocco. Their research concluded that Gaussian process regression outperformed all other machine learning models, even though the computational time of the training process was relatively higher.
Other research related to comparisons between LSTM and GRU in predicting indoor climate was done by Elhariri & Taie [23]. They developed microclimate condition prediction models based on LSTM and GRU, trained using the UCI ML Repository SML2010 dataset, which were aimed at forecasting future indoor environmental conditions. The findings indicated that the GRU outperformed the LSTM. The study done by Chen et al. was not related to indoor climate but can still be considered notable because of its discussion of LSTM and GRU for predicting environmental conditions [24]. Their research, using an annual national key R&D project grain cloud platform dataset, showed that the GRU prediction result exceeded the LSTM prediction result, even though their research initially proposed the LSTM model. Inspired by this competition between LSTM and GRU, this research was designed to implement both LSTM and GRU.
According to the research done by Fang et al., deep learning with the seq2seq architecture performed well on time-series data, particularly for indoor climate prediction [25]. With a dataset obtained from GreEn-ER, a smart building, containing indoor temperature and CO2 concentration, their research studied prediction models by proposing three encoder-decoder models combining LSTM layers and dense layers: LSTM as encoder and a dense layer as decoder, LSTM in both encoder and decoder, and a dense layer applied between LSTM layers serving as both encoder and decoder. Their results showed that LSTM-dense was the best for their case. Their research stated that RNN models such as LSTM and GRU were commonly used for forecasting indoor climate, whereas seq2seq architecture models were rarely used. This statement prompted this study to investigate the deep learning approach using the seq2seq architecture as well. They also discovered that large-capacity models, such as seq2seq, can quickly overfit to their training data.
The seq2seq deep learning architecture, which is popular for machine translation tasks, warranted further exploration in this research. Our previous research compared simple seq2seq models and adapted baseline models, showing that the simple seq2seq models were superior to the adapted baseline models in predicting indoor climate with the Room Climate dataset [26]. This research further explored the potential of the seq2seq architecture by implementing Luong attention, then compared the result with the simple seq2seq and adapted baseline models.

Datasets
The Room Climate Dataset, which was acquired from the indoor climate experiment conducted by Morgner et al. [8], is publicly available on GitHub and was used in this research. This research continued the previous research done by Gunawan et al., which used the same dataset [18]. This dataset is suitable for deep learning experiments because the amount of time-series data is huge [8,27]. The variables used from the Room Climate Dataset in this research were Temp (indoor temperature in Celsius), Relh (relative humidity), L1 (light sensor 1, wavelength in nanometers), and L2 (light sensor 2, wavelength in nanometers). The dataset is illustrated in Figure 1, with the x axis describing the 273,144 timesteps and the y axis describing the value of each variable. The Pearson Correlation Coefficient (PCC) was applied to describe the characteristics of the dataset by quantifying the correlation between each pair of variables, with the results illustrated in Figure 1. The calculation of the PCC, $r_{xy}$, is described in Equation (1), where $x$ and $y$ are the two variables being compared, and $\bar{x}$ and $\bar{y}$ are the average values of each comparison variable [29]:

$r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$ (1)
Based on Figure 1 and Table 1, the Room Climate dataset in room A has assorted PCC values. The correlation between indoor temperature and humidity is moderate. Both light 1 and light 2 also have moderate correlations to the indoor temperature variable, but both have negligible negative correlations to the indoor humidity variable.
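The PCC calculation in Equation (1) can be sketched directly in Python. The following is a minimal NumPy implementation; the sample series are illustrative, not values from the Room Climate dataset.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson Correlation Coefficient as in Equation (1)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()          # center both variables
    return float((xc * yc).sum() /
                 np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Two perfectly linearly related series give r = 1.0
t = np.arange(10, dtype=float)
print(round(pearson_r(t, 2 * t + 3), 6))  # 1.0
```

In practice, such pairwise PCC values over the four dataset variables produce the correlation matrix visualized in Figure 1.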

Long Short-Term Memory
Long Short-Term Memory (LSTM) is an impressive RNN variant, which can tackle vanishing gradient problems when learning long-term dependencies [31]. LSTM is appropriate for addressing time-series forecasting and can also tackle issues depending on temporal memory. LSTM is structured around three gates, namely the input, output, and forget gates.
The input gate of the LSTM is computed as in Equation (2), where the result of the input gate is denoted $i^{(t)}$:

$i^{(t)} = \sigma(W_i x^{(t)} + R_i y^{(t-1)} + p_i \odot c^{(t-1)} + b_i)$ (2)

In Equation (2), $x^{(t)}$, $c^{(t-1)}$, and $y^{(t-1)}$ are the input data, the cell value from the previous step, and the output from the previous step respectively, with $W_i$, $R_i$, and $p_i$ as the corresponding weight values. The bias of this gate is denoted $b_i$. The symbol $\sigma$ represents the activation function, which is usually the sigmoid.

The forget gate of the LSTM is computed as in Equation (3), where the outcome of the forget gate is denoted $f^{(t)}$. In Equation (3), $W_f$, $R_f$, and $p_f$ are the weight values of $x^{(t)}$, $y^{(t-1)}$, and $c^{(t-1)}$ respectively, and the bias of the forget gate is denoted $b_f$:

$f^{(t)} = \sigma(W_f x^{(t)} + R_f y^{(t-1)} + p_f \odot c^{(t-1)} + b_f)$ (3)

$c^{(t)} = z^{(t)} \odot i^{(t)} + c^{(t-1)} \odot f^{(t)}$ (4)

Equation (4) describes the calculation of the cell value $c^{(t)}$, where the block input is denoted $z^{(t)}$. The output gate of the LSTM is computed as in Equation (5), where the result of the output gate is denoted $o^{(t)}$. In Equation (5), $W_o$, $R_o$, and $p_o$ are the weight values of $x^{(t)}$, $y^{(t-1)}$, and $c^{(t)}$ respectively, and the bias of the output gate is denoted $b_o$:

$o^{(t)} = \sigma(W_o x^{(t)} + R_o y^{(t-1)} + p_o \odot c^{(t)} + b_o)$ (5)

The output of the LSTM, $y^{(t)}$, is determined by Equation (6):

$y^{(t)} = \tanh(c^{(t)}) \odot o^{(t)}$ (6)
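A single forward step of the LSTM gate computations can be sketched in NumPy. This is a minimal illustration of the gate equations with a peephole-style cell, assuming small random weights and zero initial states; the dimensions (4 input features, 8 hidden units) are illustrative only.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, y_prev, c_prev, W, R, p, b):
    """One LSTM step. W: input weights, R: recurrent weights,
    p: peephole vectors, b: biases (dicts keyed by gate name)."""
    i = sigmoid(W["i"] @ x + R["i"] @ y_prev + p["i"] * c_prev + b["i"])  # input gate
    f = sigmoid(W["f"] @ x + R["f"] @ y_prev + p["f"] * c_prev + b["f"])  # forget gate
    z = np.tanh(W["z"] @ x + R["z"] @ y_prev + b["z"])                    # block input
    c = z * i + c_prev * f                                                # cell state
    o = sigmoid(W["o"] @ x + R["o"] @ y_prev + p["o"] * c + b["o"])       # output gate
    y = np.tanh(c) * o                                                    # block output
    return y, c

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8  # e.g. 4 climate features, 8 hidden units
W = {k: rng.normal(0, 0.1, (n_hid, n_in)) for k in "ifzo"}
R = {k: rng.normal(0, 0.1, (n_hid, n_hid)) for k in "ifzo"}
p = {k: np.zeros(n_hid) for k in "ifo"}
b = {k: np.zeros(n_hid) for k in "ifzo"}
y, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, R, p, b)
print(y.shape)  # (8,)
```

Deep learning frameworks implement this per-timestep recurrence internally; the sketch only makes the gate arithmetic explicit.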

Gated Recurrent Unit
Introduced in 2014, the Gated Recurrent Unit (GRU) is a streamlined LSTM with just two gates, the reset gate, expressed as $r_t$, and the update gate, expressed as $u_t$ [32]. In many studies, GRU is equivalent to, or even outperforms, LSTM [33].
$\tilde{h}_t = \tanh(W_h(r_t \odot y_{t-1}) + U_h x_t + b_h)$ (7)

$y_t = (1 - u_t) \odot y_{t-1} + u_t \odot \tilde{h}_t$ (8)

$r_t = \sigma(W_r y_{t-1} + U_r x_t + b_r)$ (9)

$u_t = \sigma(W_u y_{t-1} + U_u x_t + b_u)$ (10)

where: the candidate state and the output are defined as $\tilde{h}_t$ in Equation (7) and $y_t$ in Equation (8) respectively; the input data is denoted $x_t$; $\odot$ is the element-wise multiplication operation; the recurrent weight values of the candidate state, reset gate, and update gate are denoted $W_h$, $W_r$, and $W_u$ respectively, with $U_h$, $U_r$, and $U_u$ for the input; the biases of the candidate state, reset gate, and update gate are denoted $b_h$, $b_r$, and $b_u$ respectively; the tanh and sigmoid activation functions are denoted $\tanh$ and $\sigma$.
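The GRU step above can be sketched in NumPy in the same style as the LSTM. The weight shapes and random initialization are illustrative; only the gate arithmetic follows Equations (7) to (10).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, y_prev, W, U, b):
    """One GRU step. W: recurrent weights, U: input weights, b: biases."""
    r = sigmoid(W["r"] @ y_prev + U["r"] @ x + b["r"])               # reset gate
    u = sigmoid(W["u"] @ y_prev + U["u"] @ x + b["u"])               # update gate
    h_tilde = np.tanh(W["h"] @ (r * y_prev) + U["h"] @ x + b["h"])   # candidate state
    return (1.0 - u) * y_prev + u * h_tilde                          # new output

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
W = {k: rng.normal(0, 0.1, (n_hid, n_hid)) for k in "hru"}
U = {k: rng.normal(0, 0.1, (n_hid, n_in)) for k in "hru"}
b = {k: np.zeros(n_hid) for k in "hru"}
y = gru_step(rng.normal(size=n_in), np.zeros(n_hid), W, U, b)
print(y.shape)  # (8,)
```

Note how the GRU needs no separate cell state: the single hidden vector plays both roles, which is why it is considered a streamlined LSTM.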

Adapted Baseline Models
The first architecture that this study compared to our proposed models was the adapted baseline model. The adapted baseline models were reconstructed based on the LSTM and GRU models experimented with and discussed by Gunawan et al. [18]. This study modified these models to handle sequence-to-sequence scenarios by forecasting humidity and temperature inside the room for the next five timesteps based on data from 150 past timesteps. Since the baseline models cannot deal with 3-dimensional (3D) data shaped as $(n, t, f)$, the input data need to be converted into 2-dimensional (2D) data shaped as $(n, t \times f)$, where $n$, $t$, and $f$ represent the number of samples, the number of timesteps, and the number of data features respectively. In the end, the result of the adapted baseline models is 2D data, which needs to be converted back into 3D data for comparison with the other seq2seq models. The adapted baseline architectures are depicted in Figure 2.
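The 3D-to-2D conversion for the baseline models is a pair of reshapes. The following sketch uses placeholder random data with the shapes described above; the baseline model itself is omitted and stands in as a random prediction array.

```python
import numpy as np

n, t, f = 100, 150, 4      # samples, input timesteps, input features
n_out, f_out = 5, 2        # output timesteps, predicted variables (Temp, Relh)

X3d = np.random.rand(n, t, f)
X2d = X3d.reshape(n, t * f)                # (n, t, f) -> (n, t*f) for the baseline

pred2d = np.random.rand(n, n_out * f_out)  # flat baseline output (placeholder)
pred3d = pred2d.reshape(n, n_out, f_out)   # back to (n, 5, 2) for comparison
print(X2d.shape, pred3d.shape)  # (100, 600) (100, 5, 2)
```

The reshape is lossless: flattening and restoring the 3D view recovers the original array exactly, so the comparison with seq2seq outputs is on equal footing.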

Simple Seq2seq Models
Another architecture examined in this research was the simple sequence-to-sequence (seq2seq), or encoder-decoder, architecture. It is a deep learning architecture prominently used for Natural Language Processing (NLP) purposes [13]. Early implementations of the seq2seq architecture were used for machine translation from English to French [34]. The simple seq2seq architectures investigated in this research contain an RNN-based encoder-decoder, in which LSTM layers or GRU layers were implemented in both the encoder and the decoder.
Figures 3 and 4 depict the implementations of the seq2seq architecture in the simple seq2seq models used in this research for comparison purposes [35]. It can be seen that specific RNN layers, such as LSTM and GRU, were utilized in the encoder and decoder layers. Additionally, batch normalization was adopted to normalize the activations in the model, specifically between the encoder and decoder, due to its potency for accelerating convergence in the loss plots [36]. The repeat vector layers in Figures 3 and 4 were used to repeat the input as many times as the output target, which in this research was 5 timesteps. The time-distributed layer applied in this experiment is the same as the dense layer, but it is specialized for 3D tensors.

In NLP-related research, the implementation of the attention mechanism has proven capable of improving neural machine translation models [37]. This research applied the attention mechanism by Luong et al. to our proposed models. Based on the research done by Luong et al., there are three different content-based scoring functions, as described in Equation (11):

$\mathrm{score}(h_t, \bar{h}_s) = \begin{cases} h_t^\top \bar{h}_s & \text{dot} \\ h_t^\top W_a \bar{h}_s & \text{general} \\ v_a^\top \tanh(W_a [h_t; \bar{h}_s]) & \text{concat} \end{cases}$ (11)

where $h_t$ and $\bar{h}_s$ are the current target hidden state and the source hidden states, and $W_a$ and $v_a$ are the weight matrix and weight vector of the attention respectively. Our proposed Luong attention-based seq2seq architectures are illustrated in Figures 5 and 6. As shown in Figures 5 and 6, both proposed models calculate the attention score using the first function in Equation (11), the dot calculation. This research applied the dot calculation using the Keras Dot layer, combining information from the hidden states of the encoder and decoder, then passing the result through a Softmax activation. The dot_1 in Figures 5 and 6 is a context vector, which combines the attention scores after the activation process with the encoder's hidden states. This study applied batch normalization to the result of the context vector before it was concatenated with the decoder's hidden states.
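The dot-score attention path (score, softmax, context vector, concatenation with the decoder states) can be sketched in NumPy for a single sample; the Keras Dot layers in the actual models perform the same batched matrix products. The hidden size of 128 and the 150/5 timestep counts match the experiment setup, but the random states here are placeholders.

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def luong_dot_attention(dec_states, enc_states):
    """Dot-score Luong attention (first form of Equation 11).
    dec_states: (t_out, d) decoder hidden states
    enc_states: (t_in, d) encoder hidden states"""
    scores = dec_states @ enc_states.T       # (t_out, t_in) alignment scores
    weights = softmax(scores, axis=-1)       # attention distribution per output step
    context = weights @ enc_states           # (t_out, d) context vectors
    # concatenate context with the decoder states, as in Figures 5 and 6
    return np.concatenate([context, dec_states], axis=-1), weights

rng = np.random.default_rng(1)
enc = rng.normal(size=(150, 128))   # 150 input timesteps
dec = rng.normal(size=(5, 128))     # 5 output timesteps
out, w = luong_dot_attention(dec, enc)
print(out.shape, w.shape)  # (5, 256) (5, 150)
```

Each row of the attention weights sums to one, so every predicted timestep draws a convex combination of the 150 encoder states before the concatenated vector is passed on to the time-distributed output layer.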

Hyperparameter Settings
For fair comparisons, the hyperparameter settings used in this research replicated the settings used in the experiment conducted by Gunawan et al. [18]. All the models in this research used 128 neurons for the LSTM and GRU layers. The learning rate for all models was set to 0.001. Adam optimization was used for all models because of its preeminence: it handles sparse gradients like AdaGrad and non-stationary objectives like RMSProp [38]. The batch size and the number of epochs were set to 64 and 50, respectively.

Performance Metrics
In time series prediction with machine learning, the most suitable and popular performance metrics for regression cases are Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) [39]. It is suggested to use more than one performance metric, because a single misleading metric might lead to an inaccurate assessment of results [40]. This study used MAE and RMSE to calculate the error between predicted results and ground truth data. Due to its capacity to contrast actual and predicted data based on their distribution, the coefficient of determination ($R^2$) was also an important measure to assess [39].
The calculations of MAE, RMSE, and $R^2$ are described in Equations (12) to (14):

$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$ (12)

$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$ (13)

$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$ (14)

The prediction result and the ground truth data are denoted $\hat{y}_i$ and $y_i$ respectively. The mean value of the ground truth data is denoted $\bar{y}$. The value of $R^2$ can be categorized as meaningful, moderate, and weak when $R^2 \ge 0.75$, $0.25 < R^2 < 0.75$, and $R^2 \le 0.25$ respectively [41].
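The three metrics in Equations (12) to (14) are a few lines each in NumPy. The small temperature-like arrays below are made-up values used only to exercise the functions.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error, Equation (12)."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root Mean Squared Error, Equation (13)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination, Equation (14)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

y = np.array([20.0, 21.0, 22.0, 23.0])   # hypothetical ground truth (°C)
p = np.array([20.5, 21.0, 21.5, 23.0])   # hypothetical predictions
print(mae(y, p), rmse(y, p), r2(y, p))   # 0.25 0.3535... 0.9
```

Reporting MAE and RMSE together, as done here, exposes outlier-heavy errors: RMSE grows faster than MAE when a few predictions are far off.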

Data Preprocessing
Overly high or low values in data observations may lead the models to overfit in learning the data.To overcome that problem, Z-score standardization was implemented in this research to boost the models in understanding the pattern of data [42].

$ZN(x) = \frac{x - \mu(x)}{\sigma(x)}$ (15)

The calculation of Z-score standardization, $ZN(x)$, is explained in Equation (15), where the mean value and the standard deviation are denoted $\mu(x)$ and $\sigma(x)$ respectively. The results of Z-score standardization are depicted in Figure 7. Figure 7 shows that this research separated the dataset into train sets, colored in blue and orange, comprising 80% of the original dataset, and test sets, colored in red and green, comprising 20% of the original dataset. The test set was used to compare all the models. Meanwhile, for training purposes, this research divided the train set into two parts, where 80% of the train set formed the training set and 20% formed the validation set.

The sequence-to-sequence case in this research was designed to predict five upcoming timesteps based on 150 prior timesteps, which means that sliding window processes had to be implemented when pre-processing the dataset. If the dataset is imagined as having shape $(d, f)$, with $d$ the total number of data points and $f$ the number of data features (temperature, humidity, light 1, and light 2), the result of the sliding window process has shape $(w, s, f)$, with $w$ the total number of small partitions created by the sliding window process and $s$ the partition size determined by the input and output timesteps (Figure 8).
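The standardization and sliding-window steps can be sketched together. The 1000-row random series is a placeholder for the real dataset; the 150-in/5-out window sizes match the experiment design.

```python
import numpy as np

def z_score(x, mu, sigma):
    """Equation (15); mu and sigma should come from the train split only."""
    return (x - mu) / sigma

def sliding_window(data, t_in=150, t_out=5):
    """Turn a (d, f) series into (w, t_in, f) inputs and (w, t_out, f) targets."""
    X, Y = [], []
    for i in range(len(data) - t_in - t_out + 1):
        X.append(data[i : i + t_in])                    # 150 past timesteps
        Y.append(data[i + t_in : i + t_in + t_out])     # 5 future timesteps
    return np.array(X), np.array(Y)

series = np.random.rand(1000, 4)   # placeholder: Temp, Relh, L1, L2
series = z_score(series, series.mean(axis=0), series.std(axis=0))
X, Y = sliding_window(series)
print(X.shape, Y.shape)  # (846, 150, 4) (846, 5, 4)
```

With a window of 150 + 5 timesteps over 1000 rows, the process yields 1000 − 155 + 1 = 846 overlapping partitions, which is the $w$ dimension described above.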

Model Training
The train datasets, depicted in blue and orange in Figure 7, were used to train all the models. The proportions of the training and validation sets were set at 80% and 20% of the train set, respectively. This research utilized Google Colaboratory PRO+ for the training process, with the loss plot depicted in Figure 9, where the x axis represents the epoch and the y axis represents the loss value in MAE. A glance at the training process shows that the simple seq2seq models with LSTM and GRU had lower loss values than the adapted baseline models. Meanwhile, our proposed model, the seq2seq LSTM with Luong attention, appeared to be the best, with the smallest gap between training loss and validation loss. The seq2seq GRU with Luong attention did not appear remarkable during training, but its results in the evaluation phase were strong.

Evaluation Result and Discussion
All the models were evaluated on the test set, which is marked with the red and green colors in Figure 7. Tables 2 and 3 summarize the evaluation results and compare the models' performance. Prior to calculating the metrics, all the predicted z-scores were converted back to the original value ranges. This research only predicted temperature and relative humidity, so the predictions of light intensity were ignored. Compared to the adapted baseline models, the proposed models achieved considerably lower MAE and RMSE in predicting both indoor temperature and humidity [18]. Such findings may be attributed to the fact that the dataset has varying PCC values, implying that its pattern may be more complex to model and may be better suited to more complex models. Based on our previous research, when the dataset was dominated by very strong PCC values, the seq2seq models were too complex for the data [35], which is not the case in this study. Another way to look at why our proposed models were superior to the adapted baseline models is that this research framed the experiment as a sequence-to-sequence problem, which is similar to an NLP problem.
The proposed models were also compared to the simple seq2seq models, which did not implement any attention mechanism. The implementation of Luong attention in the seq2seq LSTM gave an overall improvement of 0.00847 in MAE and 0.00962 in RMSE for forecasting indoor temperature. It also reduced MAE by 0.068046 and RMSE by 0.095535 for forecasting indoor humidity. The implementation of Luong attention in the seq2seq GRU also improved the accuracy of predicting indoor humidity by 0.01163 in MAE and 0.0219963 in RMSE. However, in predicting indoor temperature, the implementation of Luong attention gave inconsistent results, reducing MAE by approximately 0.003193 while increasing RMSE by roughly 0.01049. The error reduction in predicting indoor climate may lead to the conclusion that attention mechanisms from the NLP field can also be implemented in time series prediction, which must be further validated using statistical testing.
Tables 2 and 3 show that all models achieved $R^2$ values below 0.70 in predicting indoor temperature and below 0.86 in predicting indoor humidity. The Room Climate dataset from room A was challenging because the PCC values in the training dataset differ from the PCC values in the testing dataset. To investigate the differences between training and testing data, this research computed PCC values on the separated subsets, shown in Figure 10 for the train set and Figure 11 for the test set.

This research conducted a Mann-Whitney U test to verify the significance of the impact of the Luong attention implementation based on the test results. The Mann-Whitney U test, also known as the Wilcoxon rank-sum test or the Wilcoxon-Mann-Whitney test, is a nonparametric test of the null hypothesis that two independent groups come from populations with the same distribution [43]. All simple seq2seq models were compared to the seq2seq models with Luong attention. The null hypothesis was that the implementation of Luong attention has no impact, and the alternative hypothesis was that the implementation of Luong attention yields an improvement. The significance threshold $\alpha$ was set to 0.05 [44]. The results of the hypothesis testing, presented in Table 4, show that all p-values were below the threshold, which indicates that the implementation of Luong attention significantly impacted the accuracy of the seq2seq models. The p-value closest to $\alpha$ was for the impact of Luong attention on the seq2seq GRU, which is consistent with the observation that the training loss values for the seq2seq GRU with Luong attention were not as low as those for the seq2seq LSTM with Luong attention. Based on the results of the experiments, the seq2seq architecture, which was originally developed for NLP cases, has been shown to be applicable to time series prediction. Furthermore, the implementation of the Luong attention mechanism enhanced the seq2seq models' ability to predict time series data more accurately, which has been supported by hypothesis testing.
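A one-sided Mann-Whitney U test of this kind can be run with SciPy. The per-window error samples below are synthetic placeholders, not the paper's actual results; they only demonstrate the test setup with a one-sided alternative.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical per-window absolute errors for two models (synthetic data):
rng = np.random.default_rng(42)
err_attention = rng.normal(0.05, 0.01, 200)   # attention model, lower errors
err_plain = rng.normal(0.08, 0.01, 200)       # plain seq2seq, higher errors

# H0: both error samples come from the same distribution.
# H1: attention errors are stochastically smaller (an improvement).
stat, p = mannwhitneyu(err_attention, err_plain, alternative="less")
print(p < 0.05)  # True for these synthetic samples
```

Because the test is nonparametric, it makes no normality assumption about the error distributions, which is why it suits per-window forecast errors.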
The implementation of the Luong attention mechanism on seq2seq architecture models also has consequences, such as increasing runtime during the model training process and making the models more complex. This research did not aim to find the best model between LSTM and GRU, but to investigate the improvement of the models after the implementation of the Luong attention mechanism by comparing them to the simple seq2seq models and adapted baseline models.

Conclusion
For the development of SDDs and greenhouses in agricultural fields, this research contributed by experimenting with a complex indoor climate dataset, the Room Climate Dataset, which contains time-series data, and by implementing deep learning models for time-series data. The results showed that our proposed models outperformed the adapted baseline models and the simple seq2seq models on sequence-to-sequence cases, predicting the next five timesteps based on 150 prior timesteps. The implementation of Luong attention in seq2seq architectures has proven capable of improving their accuracy, and the results are supported by statistical hypothesis testing using the Mann-Whitney U test. This research focused on providing new ideas about AI-based indoor climate prediction. As a result, the impact of implementing the proposed AI models on energy consumption reduction needs to be investigated in future research for real-world application at SDD or greenhouse facilities.
The content scoring illustrated in Equation (11), based on the Luong attention mechanism, has three different calculations, but this research was limited to investigating only the first calculation, dot scoring. There are therefore opportunities for future work using the general and concat calculations. For future research, a real complex indoor dataset from an SDD will be investigated, which may contain indoor climate data such as temperature and humidity together with data from outdoor environments, as well as information about treatments of the contained food product, such as microstructural features and microscale properties related to the physical product. For fair comparison purposes, this research implemented the same hyperparameter settings for all models. In subsequent research, hyperparameter optimization may be implemented to determine optimal hyperparameter values using a grid search or random search approach. The seq2seq architectures popularly used for NLP purposes have been shown to also be suitable for processing time series data, which may imply that other NLP models, such as Transformers, can be investigated for processing time-series data in future studies as well.

Figure 1 .
Figure 1. Room Climate Dataset (left) and the PCC values in the dataset (right)

Figure 2 .
Figure 2. The customized baseline models with LSTM (a) and GRU (b)

Figure 7 .
Figure 7. The implementation results of Z-score standardization of the dataset (left) and the splitting illustration (right)

Figure 9 .
Figure 9. Training Process in MAE

Figure 10 .
Figure 10. PCC values on the train set