My Time Series Toolkit
When it comes to time series forecasting, I’m a great believer that the simpler the model, the better.
However, not all time series are created equal. Some time series have a strongly defined trend — we often see this with economic data, for instance:
Others show a more stationary-like pattern — e.g. monthly air passenger numbers:
[Image: monthly air passenger numbers]
Source: San Francisco Open Data
The choice of time series model will depend highly on the type of time series one is working with. Here are some of the most useful time series models I’ve encountered.
1. ARIMA
In my experience, ARIMA tends to be most useful when modelling time series with a strong trend. The model is also adept at modelling seasonality patterns.
Let’s take an example.
Suppose we wish to model monthly air passenger numbers over a period of years. The original data is sourced from San Francisco Open Data.
Such a time series will have a seasonal component (holiday seasons tend to have higher passenger numbers, for instance) as well as evidence of a trend as indicated when the series is decomposed as below.
[Image: decomposition of the series into trend and seasonal components]
Source: RStudio
The purpose of using an ARIMA model is to capture the trend as well as account for the seasonality inherent in the time series.
To do this, one can use the auto.arima function in R (from the forecast package), which selects the best-fitting p, d, q orders for the model as well as the appropriate seasonal orders.
For the above example, the model that performed best in terms of the lowest BIC was as follows:
Series: passengernumbers
ARIMA(1,0,0)(0,1,1)[12]

Coefficients:
         ar1     sma1
      0.7794  -0.5001
s.e.  0.0609   0.0840

sigma^2 estimated as 585834:  log likelihood=-831.27
AIC=1668.54   AICc=1668.78   BIC=1676.44
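As a rough cross-check on the information criteria in this output, the AIC can be recomputed from the reported log likelihood. The sketch below assumes k = 3 estimated parameters (ar1, sma1 and the innovation variance); that count is my assumption rather than something the output states. The BIC function is included for completeness, but it needs the number of observations n, which is not given here, so it is not evaluated.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2*logL."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion: k*ln(n) - 2*logL."""
    return k * math.log(n) - 2 * log_likelihood

# Log likelihood taken from the auto.arima output above.
log_lik = -831.27
k = 3  # assumption: ar1, sma1 and sigma^2

print(round(aic(log_lik, k), 2))  # 1668.54, matching the reported AIC
```

Because the BIC penalty grows with ln(n), it favours smaller models than the AIC once n exceeds about 7, which is one reason to use it when a simpler model is preferred.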
Here is a visual of the forecasts.
[Image: ARIMA forecasts of monthly air passenger numbers]
Source: RStudio
We can see that ARIMA is adequately forecasting the seasonal pattern in the series. In terms of the model performance, the RMSE (root mean squared error) and MFE (mean forecast error) were as follows:
RMSE: 698
MFE: -115
Given a mean of 8,799 passengers per month across the validation set, the errors are small relative to the average: the RMSE is roughly 8% of the mean and the MFE roughly 1% in magnitude. This indicates that the model is performing well in forecasting air passenger numbers.
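For readers who want the two metrics spelled out, here is a minimal sketch in plain Python. The error is taken as actual minus forecast, which matches the Prophet code later in this article; whether the ARIMA figures above used the same sign convention is not stated. The toy values are hypothetical and are not the air passenger data.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error: penalises large misses more heavily."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return sqrt(sum(e * e for e in errors) / len(errors))

def mean_forecast_error(actual, predicted):
    """Mean forecast error: the average signed error, i.e. the bias."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)

# Hypothetical toy values for illustration only.
actual = [100.0, 110.0, 120.0]
predicted = [98.0, 113.0, 121.0]

print(rmse(actual, predicted))
print(mean_forecast_error(actual, predicted))
```

The two numbers answer different questions: RMSE measures the typical size of a miss, while MFE shows whether the forecasts run systematically high or low.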
2. Prophet
Let’s take a look at the air passenger example once again, but this time using Prophet. Prophet is a time series tool that allows for forecasting based on an additive model, and works especially well with data that has strong seasonal trends.
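To illustrate what "additive model" means here, the sketch below fits a series as a linear trend plus a repeating seasonal effect. This is a deliberately simplified two-step estimate of my own (function names and toy data are hypothetical), not Prophet's fitting procedure, which estimates more flexible components jointly.

```python
def fit_additive(y, period):
    """Fit y[t] ~ intercept + slope*t + seasonal[t % period]."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    # Step 1: ordinary least squares line through the series.
    slope = (sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
             / sum((t - t_mean) ** 2 for t in range(n)))
    intercept = y_mean - slope * t_mean
    # Step 2: seasonal effect = mean detrended value at each cycle position.
    seasonal = []
    for pos in range(period):
        resid = [y[t] - (intercept + slope * t) for t in range(pos, n, period)]
        seasonal.append(sum(resid) / len(resid))
    return intercept, slope, seasonal

def predict_additive(model, t, period):
    """Add the trend and seasonal components back together."""
    intercept, slope, seasonal = model
    return intercept + slope * t + seasonal[t % period]

# Hypothetical toy series: trend 10 + 2t plus a 4-step seasonal pattern.
season = [1.0, -1.0, -1.0, 1.0]
y = [10 + 2 * t + season[t % 4] for t in range(16)]
model = fit_additive(y, period=4)
print(predict_additive(model, 16, period=4))
```

The point of the additive form is that each component can be inspected on its own, which is what Prophet's component plots rely on.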
The air passenger dataset appears to fit the bill, so let’s see how the model would perform compared to ARIMA.
In this example, Prophet can be used to identify the long-term trend for air passenger numbers, as well as seasonal fluctuations throughout the year:
[Image: Prophet trend and yearly seasonality components]
Source: Jupyter Notebook Output
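from prophet import Prophet  # packaged as fbprophet in releases before 1.0
prophet_basic = Prophet()
# train_dataset: DataFrame with columns 'ds' (dates) and 'y' (values)
prophet_basic.fit(train_dataset)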
A standard Prophet model can be fit to pick up the trend and seasonal components automatically, although these can also be configured manually by the user.
One particularly useful component of Prophet is the inclusion of changepoints, or significant structural breaks in a time series.
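To make the idea of a changepoint concrete: in a piecewise-linear trend, the growth rate shifts at each changepoint while the line itself stays continuous. The sketch below illustrates that idea only; the function name and values are hypothetical, and this is not Prophet's internal code.

```python
def piecewise_linear_trend(t, base_slope, base_offset, changepoints, deltas):
    """Evaluate a continuous trend whose slope changes at each changepoint.

    changepoints: times at which the growth rate shifts
    deltas: slope adjustment applied from each changepoint onwards
    """
    slope = base_slope
    offset = base_offset
    for s, d in zip(changepoints, deltas):
        if t >= s:
            slope += d
            # Shift the offset so the line does not jump at the changepoint.
            offset -= s * d
    return slope * t + offset

# Hypothetical example: growth of 1.0 per step, halving after t = 10.
for t in (0, 10, 20):
    print(t, piecewise_linear_trend(t, 1.0, 0.0, [10], [-0.5]))
```

With more changepoints the trend can bend more often, which fits the training data more closely but can also chase noise.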
[Image: changepoints in the air passenger series]
Source: Jupyter Notebook Output
Through trial and error, 4 changepoints were shown to minimise the MFE and RMSE:
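from prophet.plot import add_changepoints_to_plot
pro_change = Prophet(n_changepoints=4)
# future: DataFrame of forecast dates with a 'ds' column, defined beforehand
forecast = pro_change.fit(train_dataset).predict(future)
fig = pro_change.plot(forecast)
# Overlay the detected changepoints on the forecast plot
a = add_changepoints_to_plot(fig.gca(), pro_change, forecast)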
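The trial-and-error search over the number of changepoints could be organised as a simple loop over candidate values. The sketch below is generic: `best_setting` and `toy_rmse` are hypothetical names, and the scorer returns made-up numbers purely so the example runs. In practice the scorer would fit Prophet(n_changepoints=n) and return the validation RMSE.

```python
def best_setting(candidates, score):
    """Return the candidate with the lowest score, plus all scores."""
    scores = {c: score(c) for c in candidates}
    return min(scores, key=scores.get), scores

# Stand-in scorer with invented values, for illustration only.
def toy_rmse(n):
    return (n - 4) ** 2 + 500.0

best, scores = best_setting(range(1, 11), toy_rmse)
print(best)
```

One caveat with this approach: choosing the setting that scores best on the validation set can overfit to that set, so a further hold-out period is useful for the final comparison.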
The RMSE and MFE can now be calculated as follows:
>>> from math import sqrt
>>> import numpy as np
>>> from sklearn.metrics import mean_squared_error
>>> mse = mean_squared_error(passenger_test, yhat14)
>>> rmse = sqrt(mse)
>>> print('RMSE: %f' % rmse)
RMSE: 524.263928
>>> forecast_error = (passenger_test-yhat14)
>>> mean_forecast_error = np.mean(forecast_error)
>>> mean_forecast_error
71.58326743881493
Prophet's RMSE (524) is lower than ARIMA's (698), and its MFE is smaller in magnitude (72 against 115), suggesting that Prophet has performed better in forecasting monthly air passenger numbers.
3. TensorFlow Probability
In the aftermath of COVID-19, many time series forecasts have proven to be erroneous as they have been made with the wrong set of assumptions.
It is increasingly recognised that time series models which produce a range of forecasts are more practical to apply, as they allow for a “scenario analysis” of what might happen in the future.
As an example, an ARIMA model built using the air passenger data as above could not have possibly forecasted the sharp drop in passenger numbers that came about as a result of COVID-19.
However, using more recent air passenger data, let’s see how a model built using TensorFlow Probability would have performed:
[Image: TensorFlow Probability forecast of air passenger numbers]
Source: TensorFlow Probability
While the model would not have forecasted the sharp drop that ultimately came to pass, we do see that the model is forecasting a drop in passenger numbers to below 150,000. Use of this model allows for more of a “what-if” series of forecasts. For example, an airline could forecast monthly passenger numbers for a particular airport and note that they could be significantly lower than usual, which could inform how the company manages resources such as fleet utilisation.
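To show how a range of forecasts supports this kind of what-if reading, the sketch below simulates many possible future paths and reports a band for each step ahead. A plain random walk stands in for the real model here, and all names and numbers are hypothetical; this is not the structural time series model that TensorFlow Probability would fit.

```python
import random

def simulate_paths(last_value, step_sd, horizon, n_paths, seed=0):
    """Simulate random-walk futures starting from the last observed value."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        level, path = last_value, []
        for _ in range(horizon):
            level += rng.gauss(0.0, step_sd)
            path.append(level)
        paths.append(path)
    return paths

def forecast_band(paths, lower=0.05, upper=0.95):
    """Empirical lower and upper quantiles across paths, per forecast step."""
    bands = []
    for step in zip(*paths):
        ordered = sorted(step)
        lo = ordered[int(lower * (len(ordered) - 1))]
        hi = ordered[int(upper * (len(ordered) - 1))]
        bands.append((lo, hi))
    return bands

paths = simulate_paths(last_value=100.0, step_sd=5.0, horizon=6, n_paths=1000)
for lo, hi in forecast_band(paths):
    print(round(lo, 1), round(hi, 1))
```

The lower edge of the band is the number a planner would use for the pessimistic scenario, and the band widening with the horizon reflects growing uncertainty further out.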
Specifically, TensorFlow Probability makes forecasts from a posterior distribution, which combines a prior distribution (what is assumed before seeing the data) with the likelihood function (how well candidate parameter values explain the observed data).
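The way a prior and a likelihood combine into a posterior can be seen in the simplest textbook case: estimating an unknown mean with a normal prior and normally distributed observations of known variance. The sketch below uses that closed-form update; it is an illustration of the principle with hypothetical numbers, not how TensorFlow Probability performs inference on a time series model.

```python
def normal_posterior(prior_mean, prior_var, data, noise_var):
    """Posterior over an unknown mean: normal prior, normal likelihood."""
    n = len(data)
    # Precisions (inverse variances) from the prior and the data add up.
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    # Posterior mean is a precision-weighted blend of prior and data.
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var

# Hypothetical values: prior centred on 0, four observations of 2.0.
mean, var = normal_posterior(prior_mean=0.0, prior_var=1.0,
                             data=[2.0, 2.0, 2.0, 2.0], noise_var=1.0)
print(mean, var)
```

The posterior mean lands between the prior mean and the data average, and moves closer to the data as more observations arrive.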
[Image]
Source: Image Created by Author
For reference, the example illustrated here uses the template from the TensorFlow Probability tutorial, which the original authors (Copyright 2019 The TensorFlow Authors) have made available under the Apache 2.0 license.
Conclusion
Time series analysis is about making reliable forecasts using models suited to the data in question. For data with defined trend and seasonal components, it has been my experience that these models work quite well.
Hope you found the above article of use, and feel free to leave any questions or feedback in the comments section.
Disclaimer: This article is written on an “as is” basis and without warranty. It was written with the intention of providing an overview of data science concepts, and should not be interpreted as professional advice in any way.