Predicting the Earthquake Magnitude Using the Multilayer Perceptron Neural Network with Two Hidden Layers

Because of the major disadvantages of previous methods for calculating the magnitude of earthquakes, the neural network is examined as a new method. In this paper, a kind of neural network named the Multilayer Perceptron (MLP) is used to predict the magnitude of earthquakes. An MLP neural network consists of three main kinds of layer: input layer, hidden layer and output layer. Since the best network configuration, such as the best number of hidden nodes and the most appropriate training method, cannot be determined in advance, and since overtraining is possible, 128 network models are evaluated to determine the best prediction model. Comparing the results of the current method with the real data shows that the MLP neural network has a high ability to predict the magnitude of earthquakes and is a very good choice for this purpose.


Introduction
Since ancient times, in the wake of natural events and disasters, people have looked for ways to prevent or control these events. The earthquake is one of these natural disasters, and it causes heavy losses of life and property when it occurs. Time, location and magnitude are the three parameters of an earthquake that must be estimated well in order to control and minimize its losses. Hence, scientists and researchers have made many attempts, both successful and unsuccessful, to find a relationship between these three parameters or to estimate them well.
These efforts have resulted in a number of theoretical and empirical equations. However, the applicability of the equations developed for calculating the magnitude of earthquakes is affected by many parameters. Most of these parameters need to be measured and entered into the equations accurately, while in many areas, due to the lack of the required equipment, they are measured only approximately and with low precision, or are sometimes simply assumed. Also, some parameters of the equations, such as the physical and functional characteristics of faults, are difficult to measure. For example, the geodetic strain rate of reverse faults with no apparent sign of fault strike on the earth's surface is not measurable. Moreover, these equations are usually specific to a particular region or state, so they are not reliable enough for other regions.
On the other hand, neural networks have proven to be one of the most practical tools in modelling and forecasting [1]. Neural networks have three major advantages. First, they are able to learn any complex non-linear mapping. Second, they make no a priori assumption about the distribution of the data. Third, they are very flexible with respect to incomplete, missing and noisy data and therefore ease concern about this issue [2]. Moreover, neural networks are a general solution that does not depend on the region or country.
Neural networks have been used successfully to solve complicated pattern recognition and classification problems in different domains such as image and object recognition [3], optimization and nonlinear programming [4], construction engineering [5] and [6], video and audio analysis [7], and financial forecasting [8].
There is also some research on applying Artificial Neural Networks (ANN) to earthquake prediction. Suratgar et al. [9] considered the variation of the geomagnetic field declination and horizontal component, hourly relative humidity, ground temperature and daily rain rate to predict the magnitude of an earthquake 2 days before its occurrence using a neural network. To predict the magnitude of future serious earthquakes in a seismic area, Huang Sheng-zhong [10] established a probabilistic neural network based on mathematically computed parameters known as seismicity indicators. Moustra et al. [11] evaluated the performance of ANN in predicting earthquakes in two different case studies: the prediction of the earthquake magnitude (M), and the prediction of the magnitude of the impending seismic event following the occurrence of pre-seismic signals. Alexandridis et al. [12] presented a novel scheme for estimating the occurrence of large earthquake events based on radial basis function (RBF) neural network (NN) models. Feiyan Zhou and Zhu Xiaofeng [13] used the strong fault tolerance and fast prediction speed of the neural network to predict earthquakes and obtained good predictions with the LM-BP algorithm.
In this paper, a type of neural network named the Multilayer Perceptron (MLP), which is one of the most influential neural network models, is used to predict the magnitude of earthquakes. Because of the satisfactory history of neural networks in predicting a variety of parameters in different fields, as briefly mentioned above, it was anticipated that this method would help to develop a prediction model that identifies the magnitude level of earthquakes.
Regarding the novelty of this research, this paper discusses predicting the earthquake magnitude by finding a pattern in past earthquakes with neural networks. A new earthquake prediction system is presented, based on the application of artificial neural networks in Iran, one of the countries with high seismic activity. The historical earthquake dataset of Iran is also considered. Declustering is done before running the model so that the results agree more closely with real earthquake magnitudes. Real datasets are used for both training and testing of the model. The input variables in the present study are completely different from those of other studies; the simplest and most readily available variables were chosen. The model can also predict the earthquake magnitude at an arbitrary time after an earthquake, a feature that also differs from previously presented models. The high success rate supports the suitability of applying the method in this field of study. The detailed performance of the prediction models shows that the neural network model provides good forecasting accuracy.
The method and results are presented in the following sections. In Section 2, the multilayer perceptron neural network is described. In Section 3, the variables included in the modelling are specified. The MLP network prediction results are presented and discussed in Section 4.

Multilayer Perceptron Neural Network Modeling
The model neurons, connected up in a simple fashion, were given the name "perceptron" by Frank Rosenblatt in 1962. He pioneered the simulation of neural networks on digital computers, as well as their formal analysis.
In a large number of complicated mathematical problems whose solution depends on solving difficult non-linear equations, MLP neural networks are very helpful and can easily be employed by defining proper weights and functions. MLP neural networks consist of several layers of nodes: an input layer, an output layer, and a hidden layer, which contain input nodes (called sensory nodes), output nodes (called responding nodes), and hidden nodes, respectively.
Multilayer perceptron neural networks can be built and used with an arbitrary number of layers. However, it can be proven that a three-layer perceptron is capable of modelling any problem adequately. This fact is referred to as the Kolmogorov theorem and is a fundamental concept in neural network modelling [14].
The neural network used in this research consists of two hidden layers. Each hidden layer contains unobservable network nodes. Each hidden node is a function of the weighted sum of its inputs. The function is the activation function, and the values of the weights are determined by the estimation algorithm. The hyperbolic tangent activation function is used for the hidden layers. This activation function takes real-valued arguments and maps them to the range (-1, 1): $\tanh(c) = \frac{e^{c} - e^{-c}}{e^{c} + e^{-c}}$. For the output layer, the softmax activation function is used. It takes a vector of real-valued arguments and transforms it into another vector whose elements fall in the range (0, 1) and sum to 1.
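The layer structure described above can be illustrated with a minimal sketch of one forward pass, written in plain Python. It assumes six numeric inputs, 16 and 20 hidden units, and four output classes; the random weights, the zero biases, the sample input and all function names are hypothetical and only show how the hyperbolic tangent and softmax activations combine, not how the models in this paper were built.

```python
import math
import random

def weighted_sums(weights, biases, inputs):
    """Weighted sum of the inputs plus a bias, one value per node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def tanh_layer(weights, biases, inputs):
    """Hidden layer: hyperbolic tangent of each weighted sum, in (-1, 1)."""
    return [math.tanh(c) for c in weighted_sums(weights, biases, inputs)]

def softmax(values):
    """Map a real-valued vector to values in (0, 1) that sum to 1."""
    peak = max(values)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def random_layer(n_out, n_in, rng):
    """Hypothetical initial weights; a real model would estimate them."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(inputs, layers):
    """Two tanh hidden layers followed by a softmax output layer."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    hidden_1 = tanh_layer(w1, b1, inputs)
    hidden_2 = tanh_layer(w2, b2, hidden_1)
    return softmax(weighted_sums(w3, b3, hidden_2))

rng = random.Random(0)
layers = [random_layer(16, 6, rng),   # first hidden layer
          random_layer(20, 16, rng),  # second hidden layer
          random_layer(4, 20, rng)]   # output layer, one node per class
probs = forward([0.2, -0.5, 0.1, 0.7, 0.0, 1.0], layers)
print(dict(zip("ABCD", (round(p, 3) for p in probs))))
```

The predicted class would be the one with the largest output value; with untrained weights the four values are close to each other.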

Training Methods
The training method specifies how the network processes the records. There are three common methods of training, described as follows.
Batch training: This type of training updates the synaptic weights only after passing all training data records. This means that batch training uses information from all records in the training dataset. This method is often preferred because it directly minimizes the total error. On the other hand, batch training may need to update the weights many times until one of the stopping rules is met, and consequently it may need many data passes. It is most useful for "smaller" datasets [15].
Online training: This type of training updates the synaptic weights after every single training data record. This means that online training uses information from one record at a time. This method continuously gets a record and updates the weights until one of the stopping rules is met. If all the records have been used once and none of the stopping rules is met, the process continues by recycling the data records. Online training is superior to batch training for "larger" datasets. This means that if there are many records and many inputs, and their values are not independent of each other, online training can obtain a reasonable answer more quickly than batch training [15].
Mini-batch training: This type of training divides the training data records into groups of approximately equal size, and then updates the synaptic weights after passing one group. This means that mini-batch training uses information from a group of records. The process then recycles the data groups if necessary. Mini-batch training offers a compromise between batch and online training, and it may be best for "medium-size" datasets [15].
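The difference between the three training methods is only in how many records contribute to each weight update. The sketch below makes this concrete for one pass over the data. It uses a deliberately tiny stand-in model (a single weight fitted by plain gradient descent on squared error), which is an assumption chosen for brevity and not the network of this paper; the function names, learning rate and data are hypothetical.

```python
def update_groups(records, mode, group_size=None):
    """Yield the records that feed each weight update in one data pass."""
    if mode == "batch":
        yield list(records)                  # one update from all records
    elif mode == "online":
        for record in records:
            yield [record]                   # one update per record
    elif mode == "mini-batch":
        for start in range(0, len(records), group_size):
            yield records[start:start + group_size]  # one update per group
    else:
        raise ValueError(f"unknown training mode: {mode}")

def train_one_pass(weight, records, mode, rate=0.01, group_size=None):
    """Fit y = weight * x by gradient descent; return (weight, n_updates)."""
    updates = 0
    for group in update_groups(records, mode, group_size):
        # Mean gradient of the squared error over the current group.
        gradient = sum(2 * (weight * x - y) * x for x, y in group) / len(group)
        weight -= rate * gradient
        updates += 1
    return weight, updates

# Toy data generated from y = 3x, so the ideal weight is 3.
records = [(float(x), 3.0 * x) for x in range(1, 9)]
for mode, size in (("batch", None), ("online", None), ("mini-batch", 4)):
    weight, updates = train_one_pass(0.0, records, mode, group_size=size)
    print(f"{mode:10s} updates per pass: {updates}, weight: {weight:.3f}")
```

With eight records, one pass gives one update in batch mode, eight in online mode, and two in mini-batch mode with groups of four.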

Optimization Algorithm
This is the procedure used to estimate the synaptic weights. In the current paper, the following two algorithms are used:
Scaled conjugate gradient: The assumptions that justify the use of conjugate gradient methods apply only to batch training, so this method is not available for online or mini-batch training.
Gradient descent: In the current modelling, the gradient descent algorithm is used with the online training method.

Layers and Nodes
The input nodes are based on the input variables. In the current research, six independent variables are defined: three spatial variables (latitude, longitude, depth), one time variable (days), and two variables related to physical characteristics (soil type, fault mechanism). The output nodes of the neural network are the prediction outputs, or labels. Because the output layer assigns each event to a class, the dependent variable must be categorized. The magnitude of earthquakes is therefore categorized into four classes. These classes are denoted A, B, C and D, which represent magnitudes of 4-5, 5-6, 6-7 and greater than 7 on the Richter scale, respectively.
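The class labels can be produced with a small helper such as the one below. The text gives the class ranges but not how a magnitude that falls exactly on a boundary is treated, so the sketch assumes that each class includes its lower bound (for example, 5.0 is class B and 7.0 is class D); the function name is hypothetical.

```python
def magnitude_class(magnitude):
    """Map a magnitude to class A, B, C or D.

    Assumption: boundaries belong to the higher class (lower bound inclusive).
    """
    if magnitude < 4.0:
        raise ValueError("events below magnitude 4 are excluded from the catalogue")
    if magnitude < 5.0:
        return "A"
    if magnitude < 6.0:
        return "B"
    if magnitude < 7.0:
        return "C"
    return "D"

print([magnitude_class(m) for m in (4.3, 5.0, 6.9, 7.4)])
```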
In the hidden layers, as there is no method to decide the optimal number of hidden nodes directly, four different numbers of hidden nodes (8, 12, 16 and 20) are chosen for each layer. Moreover, a well-known concern with neural networks is "overtraining". To ease this problem, Roiger and Geatz [16] suggest that the experiments be repeated with different parameters. Therefore, we use a set of four different numbers of learning epochs: 1, 2, 4 and 8. Furthermore, in the training part, both batch and online training methods are applicable, and both are applied in order to reach more comprehensive results. As a result, we set up 128 different groups of parameters (4 x 4 x 4 x 2) and form 128 models. A simple schematic of a multilayer perceptron network with 2 hidden layers is depicted in Figure 1.
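The count of 128 models follows from combining the four settings, and the grid can be enumerated as in the sketch below. The model names follow the Mxxyyz-O/B pattern used in the results; writing 8 units as "08" is an assumption made here so that every name has the same length, and the helper names are hypothetical.

```python
from itertools import product

HIDDEN_UNITS = (8, 12, 16, 20)           # candidates for each hidden layer
EPOCHS = (1, 2, 4, 8)                    # candidate numbers of learning epochs
TRAINING = {"online": "O", "batch": "B"}

def model_name(units_1, units_2, epochs, method):
    """Build a name such as M16208-O (zero padding of 8 is an assumption)."""
    return f"M{units_1:02d}{units_2:02d}{epochs}-{TRAINING[method]}"

configurations = list(product(HIDDEN_UNITS, HIDDEN_UNITS, EPOCHS, TRAINING))
names = [model_name(*config) for config in configurations]
print(len(names), names[:3])
```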

Defining the Variables
The seismic data used in the current research are taken from all the instrumentally recorded earthquakes that occurred in Iran, from the ground motion database of the International Institute of Earthquake Engineering and Seismology (IIEES) [17]. After revising and declustering the catalogue data and omitting aftershocks and foreshocks using the Uhrhammer method [18], 11000 earthquake events remained for consideration. In structural engineering, earthquakes with a magnitude above 4 on the Richter scale are more important, so events with a magnitude below 4 are eliminated from the catalogue. For better training of the neural network, areas and faults with fewer than 3 events were omitted, and finally 4099 events are used in the research.
For each event, 7 different parameters are defined. Six of these parameters are independent variables, which form the input variables. The seventh parameter is the dependent variable, which is the output of the network. The input parameters consist of three spatial variables related to the spatial features of earthquakes, one time variable, and two variables related to fault characteristics (soil type and fault mechanism).

Spatial Variables
Longitude, latitude and depth of earthquake are three parameters that are allocated to each event.For earthquakes with an unknown depth, their depths are assumed to be 33 km.

Time Variable
For each event, this variable is the time (in days) between that event and the previous one on a specific fault. This variable also indicates the strain energy stored in a fault. The time periods used in the calculations are extracted from the database before eliminating the events whose magnitudes are less than 4 on the Richter scale, because the occurrence of these small-magnitude events affects the stored energy of the fault. For the first recorded event of any region, this variable cannot be calculated because there is no previous record, so it is assumed. The assumption is made by considering the magnitude of the first event and looking for the same magnitude at later times: the average time variable of the later events with the same magnitude as the first event is taken as its time variable.
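The rule for the time variable can be sketched as follows. The record layout (`fault`, `date`, `magnitude`), the function name and the sample events are hypothetical. Two points are assumptions because the text does not settle them: "the same magnitude" is read as exact equality of the recorded values, and a first event with no later event of the same magnitude is left without a value.

```python
from collections import defaultdict
from datetime import date

def time_variable(events):
    """Days since the previous event on the same fault, one value per event."""
    by_fault = defaultdict(list)
    for index, event in enumerate(events):
        by_fault[event["fault"]].append(index)

    days = [None] * len(events)
    for indices in by_fault.values():
        indices.sort(key=lambda i: events[i]["date"])
        for previous, current in zip(indices, indices[1:]):
            days[current] = (events[current]["date"] - events[previous]["date"]).days
        # First event on the fault: average over later events of equal magnitude.
        first = indices[0]
        matches = [days[i] for i in indices[1:]
                   if events[i]["magnitude"] == events[first]["magnitude"]]
        if matches:
            days[first] = sum(matches) / len(matches)
    return days

# Hypothetical catalogue; small events are still present at this stage.
events = [
    {"fault": "F1", "date": date(2000, 1, 1), "magnitude": 4.5},
    {"fault": "F1", "date": date(2000, 1, 11), "magnitude": 3.2},
    {"fault": "F1", "date": date(2000, 1, 31), "magnitude": 4.5},
    {"fault": "F1", "date": date(2000, 3, 1), "magnitude": 4.5},
    {"fault": "F2", "date": date(2001, 6, 5), "magnitude": 5.1},
]
days = time_variable(events)
# Only after the time variable is computed are events below magnitude 4 dropped.
kept = [(e, d) for e, d in zip(events, days) if e["magnitude"] >= 4.0]
print(days)
```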

Faults Characteristics
Soil type and fault mechanism are the two parameters related to fault characteristics, and they are described below.
Soil Type: The soil type of the occurrence zone of each earthquake is entered based on the Iranian Code of Practice for Seismic Resistant Design of Buildings (Standard No. 2800), in four groups [19], as in Table 1.

Table 1. Soil profile classification [19]
Soil Type    V (m/sec)

Fault Mechanism: The three major classes of faults (normal, reverse and strike-slip) and their combinations can be used to categorize faults more precisely into eight groups: normal, reverse, strike-slip left lateral, strike-slip right lateral, normal-strike-slip left lateral, normal-strike-slip right lateral, reverse-strike-slip left lateral, and reverse-strike-slip right lateral.

Output Variable (Magnitude of Earthquakes)
The output variable is the magnitude of the earthquakes that have happened, which is calculated by the MLP method. The outputs are divided into four qualitative classes, A, B, C and D, because MLP neural networks are more compatible with qualitative variables and give more accurate outputs. The magnitude classification is shown in Table 3. Of the whole data, 85% is used for network training, 10% for network testing and revising, and the remaining 5% is used to derive the final prediction of the earthquake magnitudes. These predictions are then compared with the real values to assess the prediction ability of the network.
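A minimal sketch of the 85% / 10% / 5% partition is given below. The text does not say how records were assigned to the three parts, so the random shuffle, the fixed seed and the rounding of the part sizes are assumptions, and the function name is hypothetical.

```python
import random

def split_events(events, seed=0):
    """Split records into training (85%), testing (10%) and prediction (5%) parts."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)  # assumption: random assignment
    n_train = round(0.85 * len(shuffled))
    n_test = round(0.10 * len(shuffled))
    training = shuffled[:n_train]
    testing = shuffled[n_train:n_train + n_test]
    prediction = shuffled[n_train + n_test:]  # whatever remains, about 5%
    return training, testing, prediction

# Stand-in identifiers for the 4099 events used in the research.
training, testing, prediction = split_events(range(4099))
print(len(training), len(testing), len(prediction))
```

For 4099 events this rounding yields 3484 training, 410 testing and 205 prediction records.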

Declustering Method
Seismicity declustering, the process of separating an earthquake catalog into foreshocks, main shocks, and aftershocks, is widely used in seismology, in particular for seismic hazard assessment and in earthquake prediction models. The goal of seismicity declustering is to separate the earthquakes in the seismicity catalog into independent and dependent earthquakes. Aftershocks, which are dependent earthquakes, cannot be distinguished by any particular, outstanding feature in their waveforms. Relating an aftershock to a main shock therefore requires a measure of the space-time distance between the two shocks. By doing this, the earthquake catalog was reduced from more than 18000 events to 11000.
The process of seismicity declustering starts with a seismicity catalog containing source parameters such as occurrence time, hypocentre or epicentre location, and magnitude. Several declustering algorithms have been proposed over the years, but most users have applied the algorithms of Gardner and Knopoff [20], Reasenberg [21] and Uhrhammer [18], because of the availability of the source codes and the simplicity of the algorithms. In this study, the Uhrhammer method is implemented. This method applies equations (1) and (2) to decluster the catalog.
Here, M is the magnitude of the earthquake, and d and t are the distance (km) and time (days) criteria, respectively. As mentioned previously, after declustering the catalogue data, 11000 of the more than 18000 earthquake events remained for consideration.
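A window-based declustering of this kind can be sketched as below. The coefficients in `uhrhammer_window` are the values commonly quoted in the declustering literature for the Uhrhammer window, d = exp(-1.024 + 0.804 M) km and t = exp(-2.87 + 1.235 M) days; they are an assumption here and should be checked against equations (1) and (2) of this paper and against [18]. The symmetric time window (so that foreshocks are removed as well as aftershocks), the great-circle distance, the record layout and the sample events are also assumptions made for illustration.

```python
import math

def uhrhammer_window(magnitude):
    """Return (distance_km, time_days); coefficients are an assumption."""
    distance_km = math.exp(-1.024 + 0.804 * magnitude)
    time_days = math.exp(-2.87 + 1.235 * magnitude)
    return distance_km, time_days

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two epicentres in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def decluster(events):
    """Keep main shocks; drop smaller events inside a main shock's window."""
    order = sorted(range(len(events)), key=lambda i: -events[i]["magnitude"])
    kept, dependent = [], set()
    for i in order:                      # largest magnitude first
        if i in dependent:
            continue
        kept.append(i)
        main = events[i]
        d_km, t_days = uhrhammer_window(main["magnitude"])
        for j in order:
            if j in dependent or j in kept:
                continue
            other = events[j]
            near_in_time = abs(other["day"] - main["day"]) <= t_days
            near_in_space = haversine_km(main["lat"], main["lon"],
                                         other["lat"], other["lon"]) <= d_km
            if near_in_time and near_in_space:
                dependent.add(j)         # foreshock or aftershock of `main`
    return [events[i] for i in sorted(kept)]

# Hypothetical events; `day` is the occurrence time in days from a reference date.
events = [
    {"lat": 35.0, "lon": 51.0, "day": 100.0, "magnitude": 6.0},
    {"lat": 35.1, "lon": 51.1, "day": 110.0, "magnitude": 4.5},
    {"lat": 35.0, "lon": 51.05, "day": 95.0, "magnitude": 4.2},
    {"lat": 30.0, "lon": 57.0, "day": 105.0, "magnitude": 5.0},
    {"lat": 35.0, "lon": 51.0, "day": 400.0, "magnitude": 4.3},
]
main_shocks = decluster(events)
print([e["magnitude"] for e in main_shocks])
```

In this toy catalogue the two events close to the magnitude 6.0 shock in both space and time are removed, while the distant event and the much later event are kept as independent.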

Results and Discussion
As can be seen in Figures 2 to 5, the average correct prediction of the models is about 72%. The results show that both batch and online training methods predict well, but according to the figures and Table 3, the MLP neural network with the online training method has higher prediction power, possibly because a large database is considered. Also, as can be seen in Table 4, models with 8 units in the first hidden layer have the highest average percentage of correct predictions, and, based on Table 5, models with 20 units in the second hidden layer have the highest average. So it can be concluded that the optimal number of units is 8 in the first hidden layer and 20 in the second. Moreover, regarding the training period, Table 6 shows that the models, on average, perform best with 8 epochs. Overall, the best predictions are made by the models M16208-O and M16204-O (Mxxyyz-O/B denotes the model with xx units in the first hidden layer, yy units in the second hidden layer and z epochs, with the online or batch training method). Table 7 shows the predictions made by model M16208-O for the different levels of magnitude in the training, testing and predicting processes, separately. With 77.1 percent correct predictions, this model has the highest percentage of correct predictions among all the models with the online training method. As is clear from Tables 4 and 5, the network has a very high ability to predict the magnitude of class A and, in many cases, a high ability to predict the magnitude of class B, but its success in predicting the magnitude of classes C and D is lower. This is due to the high frequency of data in classes A and B and the low frequency of data in classes C and D. A high frequency of data in a class gives the neural network ample opportunity to learn and to make more accurate predictions, whereas the less frequent classes do not provide this opportunity, and the prediction ability of the network decreases.
It is worth mentioning that, in spite of the low frequency of data in classes C and D (the earthquakes with large magnitudes), the network has in some cases successfully discovered the relations and similarities in the data and made correct predictions.
Another important point is that, at all levels of magnitude, even when the prediction is not correct, most magnitudes are predicted one class higher or one class lower, so the predictions are near the real magnitudes.
Another output of the MLP neural network is the importance of the independent variables, which shows the impact of each independent variable on the earthquake magnitude. The independent variable importance for model M16208-O is shown in Table 9. As can be seen in this table, and in the corresponding tables of all the other models, the time variable (days) has the highest impact on the magnitude of earthquakes and soil type has the lowest.
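The observation about near misses can be quantified by reporting, next to the share of exactly correct predictions, the share of predictions that fall within one class of the real class. The sketch below shows one way to compute both. The label lists are toy values invented for illustration and are not results from this paper; the function name is hypothetical.

```python
CLASSES = "ABCD"  # ordered from the lowest to the highest magnitude class

def accuracy_report(actual, predicted):
    """Return (exact accuracy, accuracy within one class)."""
    pairs = list(zip(actual, predicted))
    exact = sum(a == p for a, p in pairs)
    within_one = sum(abs(CLASSES.index(a) - CLASSES.index(p)) <= 1
                     for a, p in pairs)
    return exact / len(pairs), within_one / len(pairs)

# Toy labels only, chosen to include near misses and one two-class miss.
actual = list("AAAABBBCCD")
predicted = list("AAABBBACBB")
exact, within_one = accuracy_report(actual, predicted)
print(f"exact: {exact:.0%}, within one class: {within_one:.0%}")
```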

Conclusion
In this paper, predicting the earthquake magnitude by finding a pattern in past earthquakes with neural networks is discussed.
To conclude, according to the results, the MLP neural network is a useful tool for predicting the magnitude of an earthquake in a region at an arbitrarily chosen time.
The average correct prediction of the models was about 72%. The results showed that both batch and online training methods predict well; however, in the case considered in the current paper, the MLP neural network with the online training method showed slightly higher prediction power. Also, according to the results, it can be concluded that, in the considered case, the optimal number of units is 8 in the first hidden layer and 20 in the second. Regarding the training period, the results showed that the models, on average, perform best with 8 epochs. The variable importance output of the MLP neural network showed that the time variable has the highest impact on the magnitude of earthquakes. Overall, the best predictions were made by the models M16208 and M16204 using the online training method.
Since the proposed method is comprehensive and needs no a priori assumption, a similar modelling approach can be applied to other case studies in earthquake magnitude prediction, and it is anticipated that the results for other cases will be as good as those of the current study.

Figure 1. Simple schematic of a multilayer perceptron neural network with 2 hidden layers

Table 8 shows the predictions made by model M16208-B for the different levels of magnitude in the training, testing and predicting processes, separately. With 76.3 percent correct predictions, this model has the highest percentage of correct predictions among all the models with the batch training method.