Weather Impact on Passenger Flow of Rail Transit Lines

Passenger flow prediction is important for the planning, design, and decision-making of urban rail transit lines. Weather is an important factor that affects the passenger flow of rail transit lines by changing the travel mode choice of urban residents. A number of previous studies have analyzed the effects of weather (e.g. rain, snow, and temperature) on public transport ridership, but the effects on individual rail transit lines remain largely unexplored. This study explores the influence of weather on the ridership of urban rail transit lines, taking Chengdu rail transit lines 1 and 2 as examples. Linear regression was used to develop models for estimating the daily passenger flow of different rail transit lines under different weather conditions. The results show that for Chengdu rail transit line 1, the daily ridership rate increases with increasing temperature, whereas for Chengdu rail transit line 2, the daily ridership rate decreases with increasing wind power. The findings can help rail transit operators develop effective strategies to deal with fluctuations in daily passenger flow.


Introduction
Passenger flow estimation is widely used as the foundation for the planning, design, and daily operation of urban rail transit. Weather can influence people's travel behavior and traffic safety, and thereby affect the passenger flow of rail transit lines. For example, rain is considered one of the most common types of adverse weather that may lead to the change or cancellation of trips. However, weather factors are not usually included in existing models for estimating rail transit line ridership, which results in underestimation or overestimation at the design stage and unexpectedly large fluctuations at the operation stage. It is therefore essential to identify the impacts of weather factors on the passenger flow of rail transit lines. The relevant research mainly covers three aspects: data preprocessing of passenger flow [1, 2], quantitative analysis of impact factors [3][4][5][6][7], and development of estimation models [8,9].
Several studies have explored the effects of rain and snow on public transit ridership. Inclement weather has an impact on people's travel modes and travel routes, and thereby affects passenger flow in public transport [10,11]. Changnon [12] found that fewer passengers use public buses on rainy summer days than on sunny summer days. Cravo et al. [13] found that rain and snow have negative impacts on the passenger flow of buses and subways. Guo et al. [14] investigated the impact of weather elements and revealed that rain has a negative impact on bus and rail ridership. Zhou et al. [15] found that the negative impact of rain on bus ridership is evident during off-peak hours, with no significant effect during peak hours.
Arana et al. [16] showed that increasing temperature leads to an increase in bus ridership on weekends. However, other studies show inconsistent results regarding the impact of temperature on passenger flow. For example, Kashfi et al. [17] found no obvious relationship between temperature and bus ridership. Stover et al. [18] found that snow temperature has no obvious impact on bus passengers. Some studies have focused on developing estimation models for daily passenger flow. Cravo et al. [13] used a cross-sectional regression model to determine the impact of weather on New York City Transit's daily ridership. Stover et al. [18] used the least squares method to analyze the impact of weather on bus ridership. Zhao et al. [19] identified the impact of weather factors on the passenger flow rate using multiple linear regression.
Most existing research has focused on the effects of weather (e.g. rain, snow, and temperature) on public transport ridership, but the effects on rail transit remain largely unexplored. In addition, little evidence is available on how the relationship between weather and ridership differs across the lines of an urban rail transit system. Hence, there is a need to determine the effects of weather on rail transit ridership for multiple lines. This study identifies the impacts of weather on the ridership of two rail transit lines in Chengdu. Six months of data are used to model the relationship between weather and ridership with linear regression.

Passenger Flow Data and Weather Data
The datasets used in this study are of two types: passenger flow data and weather data. The passenger flow data, covering the period from Jan 01, 2017 to Jun 01, 2017, were obtained from the official website of China Railway Corporation. Chengdu opened a new rail transit line on Jun 02, 2017, which led to a significant change in passenger flow; therefore, the 2017 data from Jun 02 onward were excluded. The data were collected from two Chengdu rail transit lines, lines 1 and 2 (see Figure 1). The average daily passenger flow was calculated separately for weekdays, weekends, and holidays, as shown in Table 1. On both lines, the average daily passenger flow is higher on weekdays than on weekends, and higher on weekends than on holidays.
Note to Table 1: (a) Holidays include the seven national holidays in China; see Table 2 for details.
The meteorological data from Jan 01, 2017 to Jun 01, 2017 were obtained from World Weather Online [20], and include daily temperatures (°C) (the lowest and highest), weather conditions (sunny, cloudy, rainy, and snowy), wind speed (m/s), cloud fraction (%), precipitation (mm), air pressure (bar), and humidity (%). The flowchart of the research method is shown in Figure 2.

Passenger Flow of Rail Transit Line
Before analyzing the relationship between passenger flow and weather, it is necessary to clean the raw data. The daily passenger flow of a rail transit line is affected by many factors, such as holidays, large-scale events, and emergencies. The purpose of data cleaning is not only to detect and correct incomplete or inaccurate records, but also to reduce or eliminate the effects of factors other than weather.
First, it is important to identify the outliers in the passenger flow data. Outliers are observations that lie at an abnormal distance from other values. Fig. 3 shows the daily passenger flow of Chengdu rail transit line 1 between Jan 01, 2017 and Jun 01, 2017. Several obvious outliers appear at the end of January and the beginning of February. This period falls within the longest national holiday in China (Lunar New Year). China has seven national holidays (shown in Table 2), during which the passenger flow varies significantly. Therefore, the holiday effect needs to be removed from the passenger flow data.
It was also noticed that significant changes in passenger flow usually occur in the days immediately before or after a holiday. To determine which days to remove, the standard deviations of the resulting data sequences were compared. Standard deviation measures the dispersion of a set of data around its mean: the smaller the standard deviation, the lower the dispersion. The results show that, compared with the original dataset, the standard deviation decreases when the first day before each holiday is excluded, and also when both the first day before and the first day after each holiday are excluded. When two or more days before and after each holiday are removed, the standard deviation does not change much. For example, the standard deviation for line 1 falls from 14.82 to 14.61 after removing the first day before each holiday, and then begins to increase slightly when two or more days before each holiday are eliminated. Therefore, to retain as much of the original data as possible, only the first day before each holiday was eliminated (shown in Table 3).
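The comparison of standard deviations described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `std_after_exclusion`, the flow values, and the holiday dates are all hypothetical, and the sample standard deviation is assumed because the text does not say which variant was used.

```python
from datetime import date, timedelta
from statistics import stdev

def std_after_exclusion(flows, holidays, days_before=0, days_after=0):
    """Sample standard deviation of daily flow after dropping the holidays
    plus a window of days before and after each holiday."""
    excluded = set(holidays)
    for h in holidays:
        for k in range(1, days_before + 1):
            excluded.add(h - timedelta(days=k))
        for k in range(1, days_after + 1):
            excluded.add(h + timedelta(days=k))
    kept = [flow for day, flow in flows.items() if day not in excluded]
    return stdev(kept)

# Hypothetical daily flows (arbitrary units); NOT the Chengdu data.
start = date(2017, 3, 27)
values = [60, 61, 59, 62, 60, 75, 20, 22, 21, 60, 61, 59]
flows = {start + timedelta(days=i): v for i, v in enumerate(values)}
# Hypothetical three-day holiday; the day before it carries a surge (75).
holidays = [date(2017, 4, 2), date(2017, 4, 3), date(2017, 4, 4)]

baseline = std_after_exclusion(flows, holidays)
one_day_before = std_after_exclusion(flows, holidays, days_before=1)
two_days_before = std_after_exclusion(flows, holidays, days_before=2)
```

With these made-up values, dropping the first day before the holiday lowers the standard deviation, while dropping a second day raises it slightly again, which is the kind of pattern used above to justify removing only one day.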

Weather Data
In this study, the weather data were cleaned in the following three ways:
(1) Eliminate the records whose corresponding passenger flow data were removed.
(2) Eliminate the records with missing values.
(3) Eliminate the records of extreme weather events, such as hailstorms.
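The three cleaning rules can be expressed as a simple filter. The sketch below is illustrative only: the record layout (a dict with `date`, `condition`, and measurement fields), the function name `clean_weather`, and the sample values are assumptions, since the paper does not describe its data format.

```python
from datetime import date

def clean_weather(records, removed_flow_dates, extreme_conditions=("hailstorm",)):
    """Apply the three cleaning rules to a list of daily weather records."""
    cleaned = []
    for rec in records:
        # Rule (1): the matching passenger flow record was removed.
        if rec["date"] in removed_flow_dates:
            continue
        # Rule (2): at least one field is missing (represented here as None).
        if any(value is None for value in rec.values()):
            continue
        # Rule (3): the day is flagged as an extreme weather event.
        if rec["condition"] in extreme_conditions:
            continue
        cleaned.append(rec)
    return cleaned

# Hypothetical records; the values are made up for illustration.
records = [
    {"date": date(2017, 4, 1), "condition": "sunny", "t_high": 22, "t_low": 14},
    {"date": date(2017, 4, 5), "condition": "cloudy", "t_high": None, "t_low": 12},
    {"date": date(2017, 4, 6), "condition": "hailstorm", "t_high": 18, "t_low": 11},
    {"date": date(2017, 4, 7), "condition": "rainy", "t_high": 17, "t_low": 12},
]
removed_flow_dates = {date(2017, 4, 1)}  # e.g. the day before a holiday
cleaned = clean_weather(records, removed_flow_dates)
```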

Correlation Analysis between Weather and Passenger Flow of Rail Transit Lines
Because the passenger flow distribution differs significantly between weekends and weekdays, the weekend and weekday flow data were analyzed separately. Scatterplots were used to examine whether there are relationships between weather factors and the passenger flow of each rail transit line. Figure 4 shows that temperature (the average of the highest and lowest temperatures) has an effect on the passenger flow of rail transit line 1, and that wind speed has an influence on the passenger flow of rail transit line 2. It was also observed that weekend flow is more likely to be affected by weather than weekday flow.
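A numerical companion to the scatterplots is the Pearson correlation computed separately for each day type. The paper itself reports only scatterplots, so the sketch below is an assumed complement rather than the authors' analysis; the field names and the sample values are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def temperature_flow_correlation(records):
    """Correlation between mean daily temperature and flow, per day type."""
    by_type = {}
    for rec in records:
        # Temperature is taken as the average of the daily high and low.
        mean_temp = (rec["t_high"] + rec["t_low"]) / 2
        by_type.setdefault(rec["day_type"], []).append((mean_temp, rec["flow"]))
    return {
        day_type: pearson_r([t for t, _ in pairs], [f for _, f in pairs])
        for day_type, pairs in by_type.items()
    }

# Hypothetical records (flow in arbitrary units); NOT the Chengdu data.
records = [
    {"day_type": "weekday", "t_high": 20, "t_low": 10, "flow": 70.0},
    {"day_type": "weekday", "t_high": 24, "t_low": 14, "flow": 70.5},
    {"day_type": "weekday", "t_high": 28, "t_low": 18, "flow": 69.8},
    {"day_type": "weekend", "t_high": 20, "t_low": 10, "flow": 50.0},
    {"day_type": "weekend", "t_high": 24, "t_low": 14, "flow": 54.0},
    {"day_type": "weekend", "t_high": 28, "t_low": 18, "flow": 58.0},
]
correlations = temperature_flow_correlation(records)
```

In this made-up sample the weekend correlation is much stronger than the weekday one, mirroring the qualitative observation that weekend flow is more sensitive to weather.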

Models of Daily Passenger Flow Estimation
The daily passenger flow fluctuates periodically between weekdays and weekends, and changes markedly during holidays (shown in Figure 3). Thus, when developing the daily passenger flow estimation model, the day factor (DF; weekday or weekend) and the holiday factor (HF) should be included. Based on Kashfi's study [21], the day factor ($DF_{c,b}$) and holiday factor ($HF_{c,d}$) for the two rail transit lines were calculated by Equations 2 and 3, respectively, and the calculated values are shown in Table 4.
In Equation 2, $DR_{c,b,i}$ is the original daily passenger flow of rail transit line $c$ on day $i$ of day type $b$, $N_b$ is the number of relevant days of day type $b$, and $DR_{c,av}$ is the original average daily passenger flow of rail transit line $c$.
In Equation 3, $DR_{c,d,i}$ is the original daily passenger flow of rail transit line $c$ on day $i$ of holiday $d$, and $N_d$ is the number of relevant days of holiday $d$.
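Equations 2 and 3 themselves are not reproduced in the text. One reading that is consistent with the symbol definitions, assuming each factor is the mean daily flow over the relevant days divided by the overall average daily flow of the line, is given below. The normalisation of the holiday factor by $DR_{c,av}$ is an assumption made by analogy with the day factor.

```latex
DF_{c,b} = \frac{\frac{1}{N_b}\sum_{i=1}^{N_b} DR_{c,b,i}}{DR_{c,av}} \tag{2}

HF_{c,d} = \frac{\frac{1}{N_d}\sum_{i=1}^{N_d} DR_{c,d,i}}{DR_{c,av}} \tag{3}
```

Under this reading, a factor above 1 means that the day type or holiday carries more passengers than an average day, and a factor below 1 means fewer.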
The temperature factor ($TF_{c,t}$) and wind power factor ($WPF_{c,wp}$) for the two rail transit lines were calculated by Equations 4 and 5, respectively, and the calculated values are shown in Table 5.
In Equation 4, $DR_{c,t,i}$ is the original daily passenger flow of rail transit line $c$ on day $i$ for a given temperature interval $t$, and $N_t$ is the number of relevant days falling in temperature interval $t$.
In Equation 5, $DR_{c,wp,i}$ is the original daily passenger flow of rail transit line $c$ on day $i$ for a given wind power $wp$, and $N_{wp}$ is the number of relevant days with wind power $wp$.
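A factor of this kind can be computed by grouping days by a label (temperature interval or wind power level). The sketch below assumes, as an interpretation of the symbol definitions rather than a confirmed formula, that each factor is the group's mean daily flow divided by the overall mean daily flow; the function name `group_factors` and the data are hypothetical.

```python
from statistics import mean

def group_factors(flows, labels):
    """Ratio of each group's mean daily flow to the overall mean daily flow.

    flows  -- daily passenger flows for one line
    labels -- group label for each day (e.g. a temperature interval)
    """
    overall = mean(flows)
    grouped = {}
    for flow, label in zip(flows, labels):
        grouped.setdefault(label, []).append(flow)
    return {label: mean(vals) / overall for label, vals in grouped.items()}

# Hypothetical flows (arbitrary units) and temperature intervals in °C.
flows = [100.0, 120.0, 80.0, 100.0]
labels = ["10-20", "20-30", "10-20", "20-30"]
temperature_factors = group_factors(flows, labels)
```

The same function applies unchanged to wind power by passing wind power levels as the labels.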

Results and Discussion
SPSS software was used to perform multiple linear regression, and the stepwise regression method was used to eliminate non-significant variables. Table 6 shows the parameter estimates for the regression model, including the regression coefficient (B), standard error (S.E.), t-statistic, P value, and $R^2$ value. $R^2$ is a goodness-of-fit measure for the model: the higher the $R^2$ value, the better the estimation model fits the passenger flow data. An $R^2$ value above 0.8 is generally considered to indicate a strong fit. A variable whose |t| is greater than 1.96 is significant at the 5% level, that is, we are 95% confident that it has a significant impact on the daily passenger flow of the rail transit line; otherwise, the variable is eliminated.
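The quantities reported in Table 6 (coefficients, t-statistics, and $R^2$) can be illustrated with a small ordinary least squares fit. This is a plain-Python sketch under stated assumptions, not a reproduction of the SPSS stepwise procedure: it fits all supplied variables at once, the helper names `invert` and `ols` are hypothetical, and the data are made up.

```python
from math import sqrt

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(X, y):
    """Fit y = b0 + b1*x1 + ... by least squares.

    Returns (coefficients, t_statistics, r_squared)."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sigma2 = sse / (n - k)  # residual variance
    t_stats = [b / sqrt(sigma2 * inv[i][i]) for i, b in enumerate(beta)]
    return beta, t_stats, 1.0 - sse / sst

# Hypothetical data: two explanatory variables and a noisy response.
X = [(1, 0), (2, 1), (3, 0), (4, 1), (5, 0), (6, 1)]
y = [3.1, 1.9, 7.2, 5.8, 11.1, 9.9]
coefficients, t_stats, r_squared = ols(X, y)
# Keep the variables that pass the |t| > 1.96 rule (index 0 is the intercept).
significant = [i for i, t in enumerate(t_stats) if i > 0 and abs(t) > 1.96]
```

The 1.96 threshold is the large-sample critical value; with a sample as small as this toy example, a t-distribution critical value would be more appropriate.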
The day and holiday variables have high coefficients, which indicates that they play a dominant role in daily passenger flow (P < 0.001). Temperature has a significant impact on passenger flow for rail transit line 1 (P < 0.05), and wind power has a significant impact on passenger flow for rail transit line 2 (P < 0.05). Variables with a |t| value less than 1.96 were eliminated (see Table 6). The $R^2$ values above 0.8 indicate that the fitted models have a high goodness of fit. To improve the accuracy of passenger flow estimation, separate daily passenger flow estimation models were established for Chengdu rail transit line 1 and line 2, based on the results in Table 6, as shown in Equations 6 and 7.
Line 1: $DR = \alpha_1 + \beta_{11} DF_{c,b} + \beta_{12} HF_{c,d} + \beta_{13} TF_{c,t}$ (6)
For Chengdu rail transit lines 1 and 2, the daily passenger flow estimated using Eq. (6) for line 1 and Eq. (7) for line 2 was compared with the original daily passenger flow (shown in Figs. 5 and 6). In general, the estimates for the two lines are close to the actual values. Line 1 primarily provides commuter service to passengers such as businesspeople, white-collar workers, IT professionals, and business travellers. Travel by these passengers occurs mainly on weekdays, so there are significant differences between weekday and weekend passenger flows. Because the weekly pattern is repeated within each month block, some peaks and troughs in the actual data were not fully captured. By contrast, line 2 shows only subtle differences between weekday and weekend passenger flows. As with line 1, some peaks and troughs in the actual data were not fully captured.
This study analyzed the effects of temperature and wind power on daily passenger flow in rail transit lines. It was found that temperature has a significant impact on passenger flow for Chengdu rail transit line 1, and wind power has a significant impact on passenger flow for Chengdu rail transit line 2.
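Applying a fitted model of the form of Eq. (6) amounts to adding the intercept to a weighted sum of the factors. The sketch below is illustrative only: the coefficient values are placeholders rather than the estimates in Table 6, the function name `estimate_daily_flow` is hypothetical, and the line 2 model is assumed to have the same structure with the wind power factor in place of the temperature factor.

```python
def estimate_daily_flow(intercept, coefficients, factors):
    """Evaluate DR = intercept + sum(coefficient * factor) for one day."""
    missing = set(coefficients) - set(factors)
    if missing:
        raise KeyError(f"missing factor values: {sorted(missing)}")
    return intercept + sum(coefficients[name] * factors[name]
                           for name in coefficients)

# Placeholder coefficients; NOT the values estimated in the paper.
line1_intercept = 0.0
line1_coefficients = {"DF": 0.5, "HF": 0.25, "TF": 0.125}

# Hypothetical factor values for a single day on line 1.
day_factors = {"DF": 1.0, "HF": 1.0, "TF": 2.0}
estimate = estimate_daily_flow(line1_intercept, line1_coefficients, day_factors)

# Assumed analogous structure for line 2: wind power factor replaces TF.
line2_coefficients = {"DF": 0.5, "HF": 0.25, "WPF": 0.125}
```

Keeping the coefficients in a dictionary keyed by factor name makes it explicit which factors each line's model uses, so a line 1 model cannot silently be evaluated with line 2 inputs.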

Temperature
Within the temperature range of 20 to 30 °C, temperature has a significant impact on the daily passenger flow of Chengdu rail transit line 1, and the regression coefficient is positive. The results indicate that the daily passenger flow of rail transit line 1 increases as temperature increases.

Wind Power
Within the wind power range of level 1 to level 2, wind power has a significant impact on the daily passenger flow of Chengdu rail transit line 2, and the regression coefficient is negative. The results indicate that the daily passenger flow of rail transit line 2 decreases as wind power increases.

Conclusion
This study performed large-scale data analysis on daily passenger flow and weather data to explore the impacts of weather factors on the usage of rail transit lines. Daily ridership estimation models were established under different weather conditions for different rail transit lines. The results show that in Chengdu, an increase in temperature is associated with increasing ridership on rail transit line 1, and an increase in wind power is associated with decreasing ridership on rail transit line 2. These findings provide rail transit operators with valuable information for dealing with daily passenger flow fluctuations related to varying weather conditions.

Conflicts of Interest
The authors declare no conflict of interest.