Time Series Analysis in Mathematical Modeling: Time Series Analysis, Modeling, and Validation -- 688IT编程网


Time Series Forecasting

Background

This article is the fourth in a series on time-series data. We started by discussing data preparation techniques, followed by building a robust framework. Finally, in the previous article, we discussed a wide range of forecasting techniques that should be explored before moving on to machine learning algorithms.


In this article, we apply all of these learnings to a real-life dataset. We will work through a time series forecasting project end to end: importing the dataset, analyzing and transforming the time series, training the model, and making predictions on new data. The steps we will work through are as follows:


1. Problem Description
2. Data Preparation and Analysis
3. Set Up an Evaluation Framework
4. Stationarity Check: Augmented Dickey-Fuller Test
5. ARIMA Models
6. Residual Analysis
7. Bias-Corrected Model
8. Model Validation

Problem Description

The problem is to predict the number of monthly airline passengers, using the Airline Passengers dataset. This dataset records the total number of airline passengers over time, in units of thousands of passengers. There are 144 monthly observations from 1949 to 1960. Below is a sample of the first few rows of the dataset.


Sample dataset


You can download this dataset from .


Python Libraries for this Project

We need the following libraries for this project. Most of the names are self-explanatory, but don't worry if some are unfamiliar; their usage will become clear as we go along.


import numpy
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from math import sqrt
from math import log
from math import exp
from scipy.stats import boxcox
from pandas import DataFrame
from pandas import Grouper
from pandas import Series
from pandas import concat
from pandas.plotting import lag_plot
from matplotlib import pyplot
from statsmodels.tsa.stattools import adfuller
# statsmodels.tsa.arima_model was removed in statsmodels 0.13; the ARIMA classes now live in statsmodels.tsa.arima.model
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima.model import ARIMAResults
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.graphics.tsaplots import plot_pacf
from statsmodels.graphics.gofplots import qqplot

数据准备与分析 (Data Preparation and Analysis)

We will use the read_csv() function to load the time series as a pandas Series, a one-dimensional array with a time label for each row. It is always a good idea to peek at the data to confirm that it has loaded correctly.


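# squeeze=True was removed from read_csv in pandas 2.0; .squeeze('columns') turns the one-column DataFrame into a Series
series = read_csv('airline-passengers.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
print(series.head())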


Let's begin the data analysis by looking at the summary statistics, which give a quick idea of the data distribution.

print(series.describe())

The number of observations (144) matches our expectations, and the mean is about 280, which we can treat as the level of this series. The standard deviation and percentiles suggest a large spread in the data.
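A quick programmatic check can complement the summary statistics. The sketch below uses a hypothetical helper, `check_monthly_series` (not part of pandas), to assert that a series has the expected length, a gap-free month-start date index, and no missing values. The demo series is synthetic; only the expectation of 144 monthly observations from 1949 to 1960 comes from the problem description.

```python
import pandas as pd


def check_monthly_series(series: pd.Series, n_expected: int,
                         start: str, end: str) -> None:
    """Hypothetical helper: raise AssertionError unless the series is a
    gap-free monthly series with the expected length and date range."""
    assert len(series) == n_expected, f"expected {n_expected} rows, got {len(series)}"
    assert isinstance(series.index, pd.DatetimeIndex), "index is not a DatetimeIndex"
    # Build the full month-start range and compare it with the actual index values.
    expected_index = pd.date_range(start=start, end=end, freq="MS")
    assert series.index.equals(expected_index), "index has gaps or unexpected dates"
    assert series.notna().all(), "series contains missing values"


# Synthetic stand-in for the airline data (placeholder values, not real counts).
demo = pd.Series(range(144), index=pd.date_range("1949-01-01", periods=144, freq="MS"))
check_monthly_series(demo, n_expected=144, start="1949-01-01", end="1960-12-01")
print("checks passed")
```

On the real data, the call would be `check_monthly_series(series, 144, '1949-01-01', '1960-12-01')` right after loading, assuming the dates were parsed into month-start timestamps.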


As a next step, we will visualize the values on a line plot, which can provide a lot of insight into the problem.

series.plot()

pyplot.show()

Here, the line plot suggests that there is an increasing trend of airline passengers over time. We can also observe a systematic seasonality to the travel pattern for each year and the seasonal signal appears to be growing over time, which suggests a multiplicative relationship.


This suggests that the data may not be stationary, so before modeling we can explore differencing (of order one or two) to make it stationary.
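The differencing idea can be sketched in plain Python. Because the seasonal swing grows with the level, the sketch takes logs first, then applies a seasonal difference at lag 12 (one year of monthly data) and a first-order difference at lag 1. The helper names `difference` and `inverse_difference` are our own, and the input is synthetic data with a trend and a multiplicative yearly cycle, not the airline series.

```python
import math


def difference(values, interval=1):
    """Return the changes values[i] - values[i - interval]."""
    return [values[i] - values[i - interval] for i in range(interval, len(values))]


def inverse_difference(history, diff_value, interval=1):
    """Undo one differencing step using the observation `interval` steps back."""
    return diff_value + history[-interval]


# Synthetic monthly data: linear trend times a yearly cycle (illustrative only).
raw = [(100 + 2 * t) * (1 + 0.2 * math.sin(2 * math.pi * t / 12)) for t in range(48)]

logged = [math.log(v) for v in raw]                       # multiplicative -> additive
seasonal = difference(logged, interval=12)                # remove the yearly cycle
stationary_candidate = difference(seasonal, interval=1)   # remove remaining drift

print(len(raw), len(seasonal), len(stationary_candidate))
```

With pandas, the same transformation on the real series would be `numpy.log(series).diff(12).diff(1).dropna()`. Whether one or two rounds of differencing are actually needed is something the stationarity test should decide.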


We can confirm our assumption with yearly line plots.

For the following plot, we group the data by year and draw a separate line plot for each year from 1949 to 1957. You can create this plot for any number of years.


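# Group the 1949-1957 observations by calendar year (one group per year)
subset = series.loc['1949':'1957']
groups = subset.groupby(subset.index.year)
n_groups = len(groups)
pyplot.figure()
for i, (year, group) in enumerate(groups, start=1):
    # One subplot per year, stacked vertically in a single column
    pyplot.subplot(n_groups, 1, i)
    pyplot.plot(group)
pyplot.show()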

Looking at the line plots by year, we can see that the seasonality follows a yearly cycle: there is a dip at the end of each year and a rise from July to August. This pattern repeats across the years, which again suggests adopting a seasonal model.
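The yearly pattern can also be quantified with a simple multiplicative seasonal index: divide each observation by the mean of its calendar year, then average those ratios month by month. Months above 1.0 are busier than a typical month of the same year. This is a common descriptive technique rather than the modeling approach used later in the article; the helper `monthly_seasonal_index` and the synthetic data (a trend times a cycle peaking in July) are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def monthly_seasonal_index(series: pd.Series) -> pd.Series:
    """Average ratio of each month's value to its calendar-year mean.

    Assumes a monthly DatetimeIndex covering whole years.
    """
    yearly_mean = series.groupby(series.index.year).transform("mean")
    ratios = series / yearly_mean
    return ratios.groupby(ratios.index.month).mean()


# Synthetic example: linear trend times a cycle that peaks in July (placeholder numbers).
idx = pd.date_range("1949-01-01", periods=36, freq="MS")
t = np.arange(36)
months = np.asarray(idx.month)
cycle = 1 + 0.3 * np.cos(2 * np.pi * (months - 7) / 12)
demo = pd.Series((100 + 2 * t) * cycle, index=idx)

seasonal_index = monthly_seasonal_index(demo)
print(seasonal_index.round(2))
print("peak month:", seasonal_index.idxmax())
```

Applied to the airline series, `monthly_seasonal_index(series)` should turn the mid-year rise and year-end dip into numbers rather than something read off a plot.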


Let's explore the density of observations for further insight into the data's structure.

pyplot.figure(1)

pyplot.subplot(211)

series.hist()

pyplot.subplot(212)

series.plot(kind='kde')

pyplot.show()


We can observe that the distribution is not Gaussian, and this insight encourages us to explore some log or power transforms of the data before modeling.
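`scipy.stats.boxcox`, already in the import list, is one way to try a power transform. Called without a `lmbda` argument, it estimates the power parameter λ by maximum likelihood and returns the transformed data together with λ; λ = 0 corresponds to a log transform and λ = 1 to a simple shift. The sketch uses synthetic log-normal data as a stand-in for the passenger counts and uses `scipy.special.inv_boxcox` to map values back to the original scale.

```python
import numpy as np
from scipy.special import inv_boxcox
from scipy.stats import boxcox

# Synthetic right-skewed, strictly positive data (placeholder, not the airline series).
rng = np.random.default_rng(seed=0)
values = rng.lognormal(mean=5.0, sigma=0.4, size=144)

# With lmbda omitted, boxcox() returns (transformed_data, fitted_lambda).
transformed, lam = boxcox(values)
print(f"estimated lambda: {lam:.3f}")  # values near 0 behave like a log transform

# Anything computed on the transformed scale must be mapped back before evaluation.
restored = inv_boxcox(transformed, lam)
print("round trip ok:", np.allclose(restored, values))
```

Forecasts produced on the transformed scale need the same inverse step before errors such as RMSE are computed against the original data.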


Let's analyze the monthly data by year to get an idea of the spread of observations in each year.

We will perform this analysis through a box and whisker plot.


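# One column per year (1949-1960), each holding that year's 12 monthly values
groups = series.groupby(series.index.year)
years = DataFrame({year: group.values for year, group in groups})
years.boxplot()
pyplot.show()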

The position and spread of the data (blue boxes) shift upward over the years, indicating a growth trend and supporting our assumption that the data is non-stationary.
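To back up the box plots numerically, the sketch below computes the mean, standard deviation, and coefficient of variation (std / mean) for each year. A standard deviation that grows with the mean while the coefficient of variation stays roughly flat is consistent with the multiplicative structure discussed above. `yearly_spread` is a hypothetical helper, and the demo data (10% annual growth times a fixed monthly cycle) is synthetic.

```python
import numpy as np
import pandas as pd


def yearly_spread(series: pd.Series) -> pd.DataFrame:
    """Per-year mean, standard deviation and coefficient of variation."""
    stats = series.groupby(series.index.year).agg(["mean", "std"])
    stats["cv"] = stats["std"] / stats["mean"]
    return stats


# Synthetic data: 10% growth per year times a repeating monthly cycle (illustrative only).
idx = pd.date_range("1949-01-01", periods=48, freq="MS")
months = np.asarray(idx.month)
level = 100 * 1.1 ** (np.arange(48) / 12)
cycle = 1 + 0.2 * np.sin(2 * np.pi * months / 12)
demo = pd.Series(level * cycle, index=idx)

print(yearly_spread(demo).round(3))
```

On the airline series, `yearly_spread(series)` gives the numbers behind the box plot; a rising `std` column alongside a flatter `cv` column would point toward a log or Box-Cox transform.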


Decompose the time series for more clarity on its components: level, trend, seasonality, and noise.

Based on our analysis so far, we have an intuition that our time series is multiplicative, so we can decompose the series assuming a multiplicative model.
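For reference, the two decomposition models can be written as follows, with $T_t$ the trend (including the level), $S_t$ the seasonal component, and $R_t$ the residual at time $t$. In the multiplicative form, which is what `seasonal_decompose(..., model='multiplicative')` assumes, the seasonal swing scales with the trend. Taking logarithms turns it into an additive model, which is why log transforms pair naturally with multiplicative series.

```latex
\begin{aligned}
\text{additive:}\quad & y_t = T_t + S_t + R_t \\
\text{multiplicative:}\quad & y_t = T_t \times S_t \times R_t \\
\text{log of multiplicative:}\quad & \log y_t = \log T_t + \log S_t + \log R_t
\end{aligned}
```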


result = seasonal_decompose(series, model='multiplicative')

result.plot()

pyplot.show()

The trend and seasonality extracted from the series confirm our earlier findings that the series has a growing trend and yearly seasonality. The residuals are also interesting, showing periods of high variability in the early and later years of the series.
