Univariate Time Series

Time series: a sequence of random variables $\{X_t,\ t=\dots,-2,-1,0,1,2,\dots\}$ is called a time series.

Autocovariance function: the covariance $\text{Cov}(X_s, X_t)$ between two random variables of the time series $\{X_t\}$ is called the autocovariance. If $\text{Cov}(X_s, X_t) = \gamma_{|t-s|}$ depends only on $t-s$, then $\gamma_k = \text{Cov}(X_{t-k}, X_t),\ k=0,1,2,\dots$ is called the autocovariance function of $\{X_t\}$. Because $\text{Cov}(X_s, X_t) = \text{Cov}(X_t, X_s)$, we have $\gamma_{-k} = \gamma_k$. Clearly $\gamma_0 = \text{Var}(X_t)$.

By the Cauchy-Schwarz inequality, $|\gamma_k| = \left| E[(X_{t-k} - \mu)(X_t - \mu)] \right| \leq \left( E(X_{t-k} - \mu)^2 \; E(X_t - \mu)^2 \right)^{1/2} = \gamma_0$, so the autocovariance at any lag is bounded in absolute value by the variance.

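Weakly stationary series (also called wide-sense stationary): the time series $\{X_t\}$ is called weakly stationary if it has finite second moments and satisfies:

(1) $EX_t = \mu$ does not depend on $t$;

(2) $\text{Var}(X_t) = \gamma_0$ does not depend on $t$;

(3) $\gamma_k = \text{Cov}(X_{t-k}, X_t),\ k=1,2,\dots$ does not depend on $t$.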

Autocorrelation function (ACF): for a weakly stationary series, $\rho_k = \rho(X_{t-k}, X_t) = \frac{\text{Cov}(X_{t-k}, X_t)}{\sqrt{\text{Var}(X_{t-k})\text{Var}(X_{t})}} = \frac{\gamma_k}{\sqrt{\gamma_0 \gamma_0}} = \frac{\gamma_k}{\gamma_0},\ k=0,1,\dots,\ \forall t$. The sequence $\rho_k = \gamma_k / \gamma_0$ is called the autocorrelation function of $\{X_t\}$; note that $\rho_0 = 1$.
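In practice $\gamma_k$ and $\rho_k$ are estimated from a finite sample. The following is a minimal sketch of the usual sample estimator (dividing by $n$ rather than $n-k$); the function names `sample_autocovariance` and `sample_acf` are illustrative, not taken from any library.

```python
def sample_autocovariance(x, k):
    """Sample autocovariance at lag k, using the common divide-by-n convention."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / n


def sample_acf(x, max_lag):
    """Sample autocorrelations rho_0, ..., rho_max_lag."""
    gamma0 = sample_autocovariance(x, 0)
    return [sample_autocovariance(x, k) / gamma0 for k in range(max_lag + 1)]


# Toy series, chosen only so the arithmetic is easy to follow by hand.
acf = sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
print(acf)
```

By construction the first entry is always 1, matching $\rho_0 = 1$.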

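AR(1) model: if $\rho_1 \neq 0$, then $X_t$ is correlated with $X_{t-1}$, so $X_{t-1}$ can be used to predict $X_t$. The simplest predictor is a linear combination, which gives the model $X_t = \phi_0 + \phi_1 X_{t-1} + \varepsilon_t$, called the first-order autoregressive model and written AR(1). Here $\{\varepsilon_t\}$ is a zero-mean i.i.d. white noise sequence with variance $\sigma^2$, $\varepsilon_t$ is assumed independent of $X_{t-1}, X_{t-2}, \dots$, and the coefficient satisfies $|\phi_1|<1$.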

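An AR(1) model is also a Markov process: the conditional distribution of $X_t$ given $X_{t-1}, X_{t-2}, \dots$ depends only on $X_{t-1}$. Once $X_{t-1}$ is known, predicting $X_t$ from $X_{t-1}, X_{t-2}, \dots$ does no better than predicting it from $X_{t-1}$ alone. This is called the Markov property.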
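A quick simulation can make the AR(1) model concrete. For a stationary AR(1) the mean is $\phi_0/(1-\phi_1)$ and the autocorrelations are $\rho_k = \phi_1^k$; the sketch below checks both on simulated data. The parameter values, the seed, the burn-in length, and the helper names are all arbitrary choices made for illustration.

```python
import random


def simulate_ar1(phi0, phi1, sigma, n, seed=0, burn_in=500):
    """Simulate X_t = phi0 + phi1 * X_{t-1} + eps_t with Gaussian noise."""
    rng = random.Random(seed)
    x = phi0 / (1.0 - phi1)  # start at the stationary mean
    path = []
    for t in range(n + burn_in):
        x = phi0 + phi1 * x + rng.gauss(0.0, sigma)
        if t >= burn_in:  # discard the burn-in segment
            path.append(x)
    return path


def lag_autocorr(x, k):
    """Sample autocorrelation at lag k."""
    n = len(x)
    mean = sum(x) / n
    gamma0 = sum((v - mean) ** 2 for v in x) / n
    gammak = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / n
    return gammak / gamma0


path = simulate_ar1(phi0=1.0, phi1=0.6, sigma=1.0, n=20000)
sample_mean = sum(path) / len(path)
rho1, rho2 = lag_autocorr(path, 1), lag_autocorr(path, 2)
print(sample_mean, rho1, rho2)  # theory: 2.5, 0.6, 0.36
```

The printed values should be close to, but not exactly equal to, the theoretical ones, since they are estimates from a finite sample.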

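MA(1) model: a moving-average model describes a stationary series whose values are uncorrelated beyond lag $q$. Some high-order AR models are better described by a low-order MA model, for example $X_t = \phi_0 - \sum_{j=1}^\infty (-\theta_1)^j X_{t-j} + \varepsilon_t$, where $0 < |\theta_1| < 1$. This model can be rewritten as $X_t = \phi_0(1+\theta_1) + \varepsilon_t + \theta_1 \varepsilon_{t-1}$, which is called the MA(1) model.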
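The equivalence between the infinite-order AR form and the MA(1) form can be checked numerically. The sketch below generates a series from the MA(1) form, then evaluates the AR form truncated at `J` lags at the last time point; the two should agree up to a truncation error of order $|\theta_1|^{J}$. The values of `phi0`, `theta1`, `n`, `J`, and the seed are arbitrary illustrative choices.

```python
import random

phi0, theta1 = 2.0, 0.5
n, J = 200, 40
rng = random.Random(0)

# eps[t + 1] is the innovation of x[t]; eps[0] is the pre-sample innovation.
eps = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]

# MA(1) form: X_t = phi0 * (1 + theta1) + eps_t + theta1 * eps_{t-1}
x = [phi0 * (1 + theta1) + eps[t + 1] + theta1 * eps[t] for t in range(n)]

# AR(infinity) form truncated at J lags, evaluated at the last time point:
# X_t = phi0 - sum_{j>=1} (-theta1)^j X_{t-j} + eps_t
t = n - 1
ar_form = phi0 - sum((-theta1) ** j * x[t - j] for j in range(1, J + 1)) + eps[t + 1]

gap = abs(x[t] - ar_form)
print(gap)  # tiny: only the truncation error remains
```

Because $|\theta_1| < 1$, the weights $(-\theta_1)^j$ decay geometrically, which is why a moderate truncation already reproduces the MA(1) value almost exactly.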

Multivariate Time Series

Multivariate time series: extending from one dimension to several, consider a $k$-dimensional time series $\boldsymbol r_t = (r_{1t}, \dots, r_{kt})^T$.

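Cross-covariance matrix: a positive semidefinite square matrix of order $k$. Writing $\Gamma_0 = (\Gamma_{ij}(0))_{k\times k}$, the entry $\Gamma_{ii}(0)$ is the variance of component $r_{it}$, and $\Gamma_{ij}(0)$ is the covariance between components $r_{it}$ and $r_{jt}$.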

To convert the covariance matrix into a correlation matrix, let $D = \text{diag}(\sqrt{\Gamma_{11}(0)}, \dots, \sqrt{\Gamma_{kk}(0)})$.

Cross-correlation matrix (CCM): let $\boldsymbol\rho_0 = (\rho_{ij}(0))_{k\times k} = D^{-1} \Gamma_0 D^{-1}$. Then $\boldsymbol\rho_0$ is the correlation matrix of the random vector $\boldsymbol r_t$, called the concurrent (lag-0) cross-correlation matrix of the multivariate time series $\{\boldsymbol r_t\}$. Its elements are $\rho_{ij}(0) = \frac{\Gamma_{ij}(0)}{\sqrt{\Gamma_{ii}(0) \Gamma_{jj}(0)}} = \frac{\text{Cov}(r_{it}, r_{jt})}{\sqrt{\text{Var}(r_{it}) \text{Var}(r_{jt})}}$, and $\boldsymbol\rho_0$ is a symmetric positive semidefinite matrix with ones on the diagonal.

The lag-$l$ cross-covariance matrix is defined as $\Gamma_l = (\Gamma_{ij}(l))_{k \times k} = E[ (\boldsymbol r_t - \boldsymbol\mu) (\boldsymbol r_{t-l} - \boldsymbol\mu)^T ]$, where $\boldsymbol\mu = E\boldsymbol r_t$ is the mean vector.

Lag-$l$ cross-correlation matrix: defined as $\boldsymbol\rho_l = (\rho_{ij}(l))_{k\times k} = D^{-1} \Gamma_l D^{-1}$, where $\rho_{ij}(l) = \frac{\Gamma_{ij}(l)}{\sqrt{\Gamma_{ii}(0) \Gamma_{jj}(0)}} = \frac{\text{Cov}(r_{it}, r_{j,t-l})}{\sqrt{\text{Var}(r_{it}) \text{Var}(r_{jt})}}$ is the correlation coefficient between $r_{it}$ and $r_{j,t-l}$. Note that under stationarity $\text{Var}(r_{j,t-l}) = \Gamma_{jj}(0)$.

If $\rho_{ij}(l) \neq 0$ for some $l > 0$, then $r_{j,t-l}$ leads $r_{it}$; equivalently, $r_{it}$ is said to depend linearly on $r_{j,t-l}$.
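The lead-lag interpretation can be illustrated with a sample version of $\boldsymbol\rho_l$. The sketch below is a plain-Python estimator, not a library routine; the helper names `cross_cov` and `cross_corr` and the synthetic two-component series are assumptions made for illustration. The second component is built so that it depends on the first component one step earlier, so $\rho_{21}(1)$ should be clearly nonzero while $\rho_{12}(1)$ should be near zero.

```python
import math
import random


def cross_cov(series, lag):
    """Sample Gamma_l; `series` is a list of k component series of equal length."""
    k, n = len(series), len(series[0])
    means = [sum(s) / n for s in series]
    return [[sum((series[i][t] - means[i]) * (series[j][t - lag] - means[j])
                 for t in range(lag, n)) / n
             for j in range(k)]
            for i in range(k)]


def cross_corr(series, lag):
    """Sample rho_l = D^{-1} Gamma_l D^{-1}, with D built from the lag-0 variances."""
    k = len(series)
    gamma0 = cross_cov(series, 0)
    d = [math.sqrt(gamma0[i][i]) for i in range(k)]
    gamma_l = cross_cov(series, lag)
    return [[gamma_l[i][j] / (d[i] * d[j]) for j in range(k)] for i in range(k)]


# Synthetic data: r2_t = 0.8 * r1_{t-1} + noise, so r1 leads r2 by one step.
rng = random.Random(1)
n = 5000
r1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
r2 = [0.0] + [0.8 * r1[t - 1] + 0.6 * rng.gauss(0.0, 1.0) for t in range(1, n)]

rho0 = cross_corr([r1, r2], 0)
rho1 = cross_corr([r1, r2], 1)
print(rho1[1][0], rho1[0][1])  # first value near 0.8, second near 0
```

Unlike $\boldsymbol\rho_0$, the matrix $\boldsymbol\rho_l$ for $l > 0$ is generally not symmetric, which is exactly what carries the lead-lag information.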

Problem Formulation and Approach

Problem Formulation

Given the information set $F_t = \{(y_i,z_i)^T,\ i=1,\dots,t;\quad x_i,\ i=1,\dots,t+3\}$, where:

  • $y_i = (y_i^{(1)},y_i^{(2)},y_i^{(3)},y_i^{(4)},y_i^{(5)},y_i^{(6)})^T$ is the measured pollutant data, 6-dimensional (SO2, NO2, PM10, PM2.5, O3, CO);
  • $z_i = (z_i^{(1)},z_i^{(2)},z_i^{(3)},z_i^{(4)},z_i^{(5)})^T$ is the measured meteorological data, 5-dimensional (temperature, humidity, air pressure, wind direction, wind speed);
  • $x_i = (x_i^{(1)},x_i^{(2)},x_i^{(3)},x_i^{(4)},x_i^{(5)},x_i^{(6)})^T$ is the primary-forecast pollutant data, 6-dimensional.

Goal: predict $\hat y_{t+1},\hat y_{t+2},\hat y_{t+3}$.

Approach

  1. Build a multivariate time series VAR model: use $(y_i,z_i)^T,\ i=1,\dots,t$ to predict $(\hat y_{t+1}^{(1)},\hat z_{t+1}^{(1)})^T,\ (\hat y_{t+2}^{(1)},\hat z_{t+2}^{(1)})^T,\ (\hat y_{t+3}^{(1)},\hat z_{t+3}^{(1)})^T$.

  2. Build a multivariate LSTM model: use $(y_i,z_i)^T,\ i=1,\dots,t$ to predict $(\hat y_{t+1}^{(2)},\hat z_{t+1}^{(2)})^T,\ (\hat y_{t+2}^{(2)},\hat z_{t+2}^{(2)})^T,\ (\hat y_{t+3}^{(2)},\hat z_{t+3}^{(2)})^T$.

  3. Build a Stacking ensemble model for the combined forecast: use $(x_{t+k},\hat y_{t+k}^{(1)},\hat y_{t+k}^{(2)})^T,\ k=1,2,3$ to predict $\hat y_{t+k}$.

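To avoid spurious regression, fit a separate multiple linear regression for each of the 6 pollutants and each of the 3 horizons: $\hat y_{t+k,j} = \alpha_{k,j} (x_{t+k,j},\hat y_{t+k,j}^{(1)},\hat y_{t+k,j}^{(2)},1)^T + e_{k,j}$

Here $k=1,2,3$ indexes the day being forecast (the $k$-th day ahead) and $j=1,\dots,6$ indexes the 6 pollutants. $\alpha_{k,j}$ is the parameter vector for day $k$ and pollutant $j$; it is 4-dimensional, with the last component acting as the bias (intercept). $e_{k,j}$ is the residual for day $k$ and pollutant $j$.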
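The stacking step for a single pair $(k, j)$ is an ordinary least-squares fit with three regressors plus an intercept. The sketch below shows one such fit in plain Python by solving the normal equations; the helper names `solve_linear` and `fit_stacker` are hypothetical, and the data are synthetic and noise-free so that the known coefficients can be recovered. In the actual setting the three feature columns would be the primary forecast and the VAR and LSTM forecasts, and the fit would be repeated for all 18 combinations of horizon and pollutant.

```python
import random


def solve_linear(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        w[i] = (m[i][n] - sum(m[i][c] * w[c] for c in range(i + 1, n))) / m[i][i]
    return w


def fit_stacker(features, targets):
    """Least squares for rows (x, yhat_var, yhat_lstm); a constant 1 is appended."""
    rows = [list(f) + [1.0] for f in features]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve_linear(xtx, xty)


# Synthetic, noise-free example for one (k, j) pair; coefficients are made up.
rng = random.Random(0)
true_alpha = [0.2, 0.5, 0.3, 1.0]  # last entry is the intercept
features = [(rng.uniform(0, 100), rng.uniform(0, 100), rng.uniform(0, 100))
            for _ in range(50)]
targets = [sum(a * v for a, v in zip(true_alpha, list(f) + [1.0])) for f in features]

alpha = fit_stacker(features, targets)
print(alpha)  # close to true_alpha
```

Fitting each $(k, j)$ pair separately keeps every regression small (four parameters), at the cost of ignoring any information shared across pollutants or horizons.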

Simplified Version

Multivariate time series: $\boldsymbol r_t = (r_{1t}, \dots, r_{kt})^T$

Cross-covariance matrix: $\Gamma_0 = (\Gamma_{ij}(0))_{k\times k}$, where $\Gamma_{ij}(0)$ is the covariance between $r_{it}$ and $r_{jt}$

$D = \text{diag}(\sqrt{\Gamma_{11}(0)}, \dots, \sqrt{\Gamma_{kk}(0)})$

Cross-correlation matrix (CCM): $\boldsymbol\rho_0 = (\rho_{ij}(0))_{k\times k} = D^{-1} \Gamma_0 D^{-1}$

$\rho_{ij}(0) = \frac{\Gamma_{ij}(0)}{\sqrt{\Gamma_{ii}(0) \Gamma_{jj}(0)}} = \frac{\text{Cov}(r_{it}, r_{jt})}{\sqrt{\text{Var}(r_{it}) \text{Var}(r_{jt})}}$ is the correlation coefficient between $r_{it}$ and $r_{jt}$

Lag-$l$ cross-covariance matrix: $\Gamma_l = (\Gamma_{ij}(l))_{k \times k} = E[ (\boldsymbol r_t - \boldsymbol\mu) (\boldsymbol r_{t-l} - \boldsymbol\mu)^T ]$

Lag-$l$ cross-correlation matrix: $\boldsymbol\rho_l = (\rho_{ij}(l))_{k\times k} = D^{-1} \Gamma_l D^{-1}$

$\rho_{ij}(l) = \frac{\Gamma_{ij}(l)}{\sqrt{\Gamma_{ii}(0) \Gamma_{jj}(0)}} = \frac{\text{Cov}(r_{it}, r_{j,t-l})}{\sqrt{\text{Var}(r_{it}) \text{Var}(r_{jt})}}$ is the correlation coefficient between $r_{it}$ and $r_{j,t-l}$

Linear dependence: if $\rho_{ij}(l) \neq 0$, then $r_{j,t-l}$ leads $r_{it}$; equivalently, $r_{it}$ depends linearly on $r_{j,t-l}$

Time series: $X_t,\ t=\dots,-2,-1,0,1,2,\dots$

Autocovariance function: $\gamma_k = \text{Cov}(X_{t-k}, X_t),\ k=0,1,2,\dots$

Cauchy-Schwarz inequality: $|\gamma_k| = \left| E[(X_{t-k} - \mu)(X_t - \mu)] \right| \leq \left( E(X_{t-k} - \mu)^2 \; E(X_t - \mu)^2 \right)^{1/2} = \gamma_0$

Autocorrelation function (ACF): $\rho_k = \rho(X_{t-k}, X_t) = \frac{\text{Cov}(X_{t-k}, X_t)}{\sqrt{\text{Var}(X_{t-k})\text{Var}(X_{t})}} = \frac{\gamma_k}{\sqrt{\gamma_0 \gamma_0}} = \frac{\gamma_k}{\gamma_0},\ k=0,1,\dots$

AR(1) model: if $\rho_1 \neq 0$, then $X_t$ is correlated with $X_{t-1}$, so $X_{t-1}$ can be used to predict $X_t$: $X_t = \phi_0 + \phi_1 X_{t-1} + \varepsilon_t$

MA(1) model: some high-order AR models are better described by a low-order MA model,

for example: $X_t = \phi_0 - \sum_{j=1}^\infty (-\theta_1)^j X_{t-j} + \varepsilon_t$

which can be rewritten as $X_t = \phi_0(1+\theta_1) + \varepsilon_t + \theta_1 \varepsilon_{t-1}$

One-step-ahead forecast: $\hat{x}_{h}(1)=\phi_{0}+\sum_{j=1}^{p} \phi_{j} x_{h+1-j}+\sum_{j=1}^{q} \theta_{j} \hat{\varepsilon}_{h+1-j}$

For the $k$-step-ahead forecast: $\hat{x}_{h}(k)=\phi_{0}+\sum_{j=1}^{p} \phi_{j} \hat{x}_{h+k-j}+\sum_{j=1}^{q} \theta_{j} \hat{\varepsilon}_{h+k-j}$, where $\hat{x}_{h+k-j}$ is the observed value $x_{h+k-j}$ when $k-j \leq 0$, and $\hat{\varepsilon}_{h+k-j} = 0$ when $k-j > 0$.

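Forecast error variance: $\text{Var}(e_h(l)) = (1+\psi_1^2+\cdots+\psi_{l-1}^2)\sigma_a^2$, where $\psi_j$ are the coefficients of the MA($\infty$) representation of the model and $\sigma_a^2$ is the variance of the innovations $\varepsilon_t$.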
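For the AR(1) special case the forecast recursion reduces to $\hat x_h(k) = \phi_0 + \phi_1 \hat x_h(k-1)$ with $\hat x_h(0) = x_h$, and because the MA($\infty$) weights of an AR(1) are $\phi_1^j$, the $l$-step forecast error variance is $\sigma^2(1 + \phi_1^2 + \cdots + \phi_1^{2(l-1)})$. The sketch below computes both; the function name `ar1_forecasts` and the parameter values are illustrative assumptions, and the parameters are treated as known rather than estimated.

```python
def ar1_forecasts(phi0, phi1, sigma2, x_h, horizon):
    """Return [(forecast, error_variance)] for k = 1..horizon under a known AR(1)."""
    out = []
    forecast, var = x_h, 0.0
    for k in range(1, horizon + 1):
        forecast = phi0 + phi1 * forecast        # x_hat_h(k) from x_hat_h(k-1)
        var += sigma2 * phi1 ** (2 * (k - 1))    # add the psi_{k-1}^2 * sigma^2 term
        out.append((forecast, var))
    return out


result = ar1_forecasts(phi0=1.0, phi1=0.5, sigma2=1.0, x_h=4.0, horizon=3)
for k, (forecast, var) in enumerate(result, start=1):
    print(k, forecast, var)
```

As the horizon grows, the forecast decays toward the stationary mean $\phi_0/(1-\phi_1)$ and the error variance grows toward the unconditional variance $\sigma^2/(1-\phi_1^2)$.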

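Ensemble algorithms can be broadly divided into Bagging, Boosting, and Stacking.

Bagging (Bootstrap Aggregating) trains multiple sub-models on repeatedly resampled data and combines their results by voting or averaging.

Boosting turns a weak learning algorithm into a strong one: each new model focuses on the samples the previous models misclassified, or fits the residuals of the previous round.

Stacking is an ensemble learning method and also a model-combination strategy. Simpler combination strategies include averaging and voting.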

https://leovan.me/cn/2018/12/ensemble-learning/