Logistics

Piazza

ARIMA

Glacial Varve Series

Let’s plot the Glacial Varve Series. The log transform stabilizes the variance over time, and first-order differencing then makes the series look stationary.

gvss = ts(data.frame(
  x = astsa::varve[-1],  # drop the first observation to align with the diff below
  logx = log10(astsa::varve)[-1],
  difflogx = diff(log10(astsa::varve))
))
title_string = "Glacial Varve Series"
plot(gvss, main = title_string)
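Why log first, then difference? If a series grows multiplicatively, the log turns multiplicative fluctuations into additive ones, and differencing then removes the (now roughly linear) trend. A minimal sketch on simulated data — the simulated model and all names below are made up for illustration, not taken from the varve series:

```r
set.seed(1)
n <- 500
t_idx <- 1:n

# A series with an exponential trend and multiplicative noise:
# x_t = exp(0.01 * t) * exp(w_t), w_t ~ N(0, 0.3^2)
x <- exp(0.01 * t_idx + rnorm(n, sd = 0.3))

logx     <- log10(x)     # variance-stabilizing transform
difflogx <- diff(logx)   # removes the (now linear) trend

# The differenced log series should have roughly constant variance:
# compare its two halves.
var_first  <- var(difflogx[1:249])
var_second <- var(difflogx[250:499])
```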

Let’s fit an ARIMA model with \(d = 1\). To determine the orders, let’s plot the ACF and PACF. The ACF has a significant spike only at lag 1, while the PACF decays more slowly. Our first guess is ARIMA(0,1,1).

acf(gvss[, "difflogx"])

pacf(gvss[, "difflogx"])
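The reading “ACF cuts off after lag 1, PACF tails off” is the textbook MA(1) signature. A quick sanity check on a simulated MA(1) — the coefficient 0.8 is an arbitrary choice, not fitted from the varve data:

```r
set.seed(42)
# MA(1): x_t = w_t + 0.8 * w_{t-1}
x <- arima.sim(model = list(ma = 0.8), n = 2000)

a <- acf(x,  plot = FALSE)$acf  # a[1] is lag 0, a[2] is lag 1, ...
p <- pacf(x, plot = FALSE)$acf  # p[1] is lag 1

# Theoretical lag-1 ACF is 0.8 / (1 + 0.8^2) ~= 0.49; later lags are
# near zero (inside the 2/sqrt(n) band), while the PACF decays gradually.
```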

However, the residuals of the ARIMA(0,1,1) fit still show non-zero autocorrelation.

astsa::sarima(gvss[, "difflogx"], 0, 1, 1)
## initial  value -0.871202 
## iter   2 value -1.204551
## iter   3 value -1.244840
## iter   4 value -1.305272
## iter   5 value -1.351573
## iter   6 value -1.360171
## iter   7 value -1.371771
## iter   8 value -1.380065
## iter   9 value -1.383834
## iter  10 value -1.384684
## iter  11 value -1.384795
## iter  12 value -1.385201
## iter  13 value -1.385455
## iter  13 value -1.385455
## iter  14 value -1.385595
## iter  14 value -1.385595
## iter  15 value -1.385603
## iter  15 value -1.385603
## iter  15 value -1.385603
## final  value -1.385603 
## converged
## initial  value -1.378338 
## iter   2 value -1.379274
## iter   3 value -1.379901
## iter   4 value -1.379917
## iter   5 value -1.379920
## iter   5 value -1.379920
## iter   5 value -1.379920
## final  value -1.379920 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ma1  constant
##       -1.000     0e+00
## s.e.   0.004     1e-04
## 
## sigma^2 estimated as 0.06266:  log likelihood = -24.66,  aic = 55.32
## 
## $degrees_of_freedom
## [1] 630
## 
## $ttable
##          Estimate    SE   t.value p.value
## ma1            -1 4e-03 -250.3137  0.0000
## constant        0 1e-04   -0.0396  0.9685
## 
## $AIC
## [1] 0.08752984
## 
## $AICc
## [1] 0.08756003
## 
## $BIC
## [1] 0.108648

Trying some other choices of \((p,q)\), we find that the ARIMA(1,1,2) fit seems to have white-looking residuals…

astsa::sarima(gvss[, "difflogx"], 1, 1, 2)
## initial  value -0.870541 
## iter   2 value -1.220140
## iter   3 value -1.310302
## iter   4 value -1.401194
## iter   5 value -1.408980
## iter   6 value -1.425829
## iter   7 value -1.429361
## iter   8 value -1.434015
## iter   9 value -1.438499
## iter  10 value -1.441187
## iter  11 value -1.445633
## iter  12 value -1.450858
## iter  13 value -1.450880
## iter  14 value -1.453073
## iter  15 value -1.455708
## iter  16 value -1.459379
## iter  17 value -1.467670
## iter  18 value -1.475933
## iter  19 value -1.478825
## iter  20 value -1.484882
## iter  21 value -1.487914
## iter  22 value -1.488032
## iter  23 value -1.488153
## iter  24 value -1.488289
## iter  25 value -1.488318
## iter  26 value -1.488322
## iter  27 value -1.488326
## iter  28 value -1.488326
## iter  28 value -1.488326
## iter  28 value -1.488326
## final  value -1.488326 
## converged
## initial  value -1.514045 
## iter   2 value -1.514882
## iter   3 value -1.522057
## iter   4 value -1.525768
## iter   5 value -1.529114
## iter   6 value -1.546210
## iter   7 value -1.548430
## iter   8 value -1.552339
## iter   9 value -1.559359
## iter  10 value -1.559872
## iter  11 value -1.560896
## iter  12 value -1.561764
## iter  13 value -1.562133
## iter  14 value -1.562365
## iter  15 value -1.562468
## iter  16 value -1.562645
## iter  17 value -1.562690
## iter  18 value -1.562743
## iter  19 value -1.562757
## iter  20 value -1.562757
## iter  21 value -1.562758
## iter  21 value -1.562758
## iter  21 value -1.562758
## final  value -1.562758 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ma1     ma2  constant
##       0.2303  -1.8823  0.8823         0
## s.e.  0.0509   0.0309  0.0307         0
## 
## sigma^2 estimated as 0.04314:  log likelihood = 90.89,  aic = -171.79
## 
## $degrees_of_freedom
## [1] 628
## 
## $ttable
##          Estimate     SE  t.value p.value
## ar1        0.2303 0.0509   4.5218  0.0000
## ma1       -1.8823 0.0309 -60.9701  0.0000
## ma2        0.8823 0.0307  28.7786  0.0000
## constant   0.0000 0.0000  -0.0516  0.9588
## 
## $AIC
## [1] -0.2718163
## 
## $AICc
## [1] -0.2717153
## 
## $BIC
## [1] -0.2366194

If you set \((p,q)\) higher, you can still get white-looking residuals, for example with ARIMA(2,1,2). But it is not as good as ARIMA(1,1,2). First, we pay for extra parameters. Second, some fitted coefficients are no longer significant (the ar2 term has p-value 0.28). Third, the AIC/AICc/BIC do not decrease.

astsa::sarima(gvss[, "difflogx"], 2, 1, 2)
## initial  value -0.869758 
## iter   2 value -1.161058
## iter   3 value -1.278198
## iter   4 value -1.376714
## iter   5 value -1.383040
## iter   6 value -1.392659
## iter   7 value -1.421969
## iter   8 value -1.473863
## iter   9 value -1.474470
## iter  10 value -1.474567
## iter  11 value -1.475945
## iter  12 value -1.476956
## iter  13 value -1.476960
## iter  14 value -1.477033
## iter  15 value -1.477159
## iter  16 value -1.481007
## iter  17 value -1.487182
## iter  18 value -1.496297
## iter  19 value -1.501379
## iter  20 value -1.506604
## iter  21 value -1.507949
## iter  22 value -1.509862
## iter  23 value -1.511000
## iter  24 value -1.511007
## iter  25 value -1.511023
## iter  26 value -1.511587
## iter  27 value -1.511876
## iter  28 value -1.511908
## iter  29 value -1.511919
## iter  30 value -1.511936
## iter  31 value -1.511937
## iter  31 value -1.511937
## iter  31 value -1.511937
## final  value -1.511937 
## converged
## initial  value -1.531170 
## iter   2 value -1.534383
## iter   3 value -1.537837
## iter   4 value -1.545542
## iter   5 value -1.545623
## iter   6 value -1.555317
## iter   7 value -1.558444
## iter   8 value -1.561608
## iter   9 value -1.562437
## iter  10 value -1.563039
## iter  11 value -1.563193
## iter  12 value -1.563393
## iter  13 value -1.563403
## iter  14 value -1.563408
## iter  15 value -1.563409
## iter  16 value -1.563409
## iter  17 value -1.563433
## iter  18 value -1.563433
## iter  18 value -1.563433
## iter  18 value -1.563433
## final  value -1.563433 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1     ar2      ma1     ma2  constant
##       0.2452  0.0495  -1.9031  0.9032         0
## s.e.  0.0483  0.0457   0.0298  0.0296         0
## 
## sigma^2 estimated as 0.04309:  log likelihood = 91.32,  aic = -170.64
## 
## $degrees_of_freedom
## [1] 627
## 
## $ttable
##          Estimate     SE  t.value p.value
## ar1        0.2452 0.0483   5.0789  0.0000
## ar2        0.0495 0.0457   1.0828  0.2793
## ma1       -1.9031 0.0298 -63.9514  0.0000
## ma2        0.9032 0.0296  30.5245  0.0000
## constant   0.0000 0.0000  -0.0585  0.9533
## 
## $AIC
## [1] -0.2700025
## 
## $AICc
## [1] -0.2698508
## 
## $BIC
## [1] -0.2277662

Moral of the story

We choose the ARIMA(p, d, q) orders so that the residuals look white (stationary with zero autocorrelation; if they are also normal, zero correlation implies independence), while keeping the information criteria (AIC, AICc, BIC) as small as possible.
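The “residuals look white” judgment can be made formal with the Ljung–Box test (the same statistic behind the p-value panel that sarima displays), available in base R as Box.test. A sketch on simulated data, not the varve series:

```r
set.seed(7)
x <- arima.sim(model = list(ar = 0.6), n = 500)

# Correctly specified model vs. an under-specified one.
good <- arima(x, order = c(1, 0, 0))
bad  <- arima(x, order = c(0, 0, 0))

# Ljung-Box on the residuals; fitdf adjusts for estimated parameters.
# A small p-value means autocorrelation is left in the residuals.
p_good <- Box.test(residuals(good), lag = 10, type = "Ljung-Box", fitdf = 1)$p.value
p_bad  <- Box.test(residuals(bad),  lag = 10, type = "Ljung-Box")$p.value

# The information criteria agree with the diagnostic:
aic_gap <- AIC(bad) - AIC(good)  # positive: the AR(1) fit is preferred
```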

Regression with ARMA error

Lynx–Hare Populations

Textbook example 5.17

library(astsa)
df = ts(data.frame(
  Hare = Hare,
  Lynx = Lynx
))
title_string = "Lynx–Hare Populations"
plot(df, main = title_string)

Hare at lag 1 seems to be a good predictor of Lynx.

library(zoo)
lag2.plot(Hare, Lynx, 5) # lead-lag relationship

Fit a linear regression model

pp = as.zoo(ts.intersect(Lynx, HareL1 = lag(Hare, -1)))  # HareL1: Hare at lag 1
summary(reg <- lm(pp$Lynx ~ pp$HareL1))
## 
## Call:
## lm(formula = pp$Lynx ~ pp$HareL1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -31.107 -11.511  -2.193   7.579  45.632 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 15.84712    2.75817   5.746 1.29e-07 ***
## pp$HareL1    0.27265    0.04727   5.768 1.17e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.25 on 88 degrees of freedom
## Multiple R-squared:  0.2744, Adjusted R-squared:  0.2661 
## F-statistic: 33.27 on 1 and 88 DF,  p-value: 1.172e-07
acf2(resid(reg))

##      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8] [,9] [,10] [,11] [,12]
## ACF  0.73  0.30 -0.10 -0.35 -0.48 -0.43 -0.21  0.03 0.21  0.28  0.27  0.09
## PACF 0.73 -0.52 -0.18 -0.05 -0.25  0.08  0.13 -0.08 0.04  0.03  0.00 -0.24
##      [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## ACF  -0.16 -0.32 -0.35 -0.26 -0.10  0.14  0.34  0.37
## PACF -0.10  0.07 -0.13  0.04  0.06  0.08  0.05 -0.10

Maybe an AR(2) model for the residuals?

astsa::sarima(resid(reg), 2, 0, 0)
## initial  value 2.758050 
## iter   2 value 2.604881
## iter   3 value 2.343129
## iter   4 value 2.291704
## iter   5 value 2.207667
## iter   6 value 2.206504
## iter   7 value 2.206418
## iter   8 value 2.206415
## iter   9 value 2.206415
## iter  10 value 2.206415
## iter  10 value 2.206415
## iter  10 value 2.206415
## final  value 2.206415 
## converged
## initial  value 2.222241 
## iter   2 value 2.221684
## iter   3 value 2.221550
## iter   4 value 2.221543
## iter   5 value 2.221543
## iter   5 value 2.221543
## iter   5 value 2.221543
## final  value 2.221543 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ar2   xmean
##       1.1275  -0.5225  0.1064
## s.e.  0.0890   0.0901  2.4372
## 
## sigma^2 estimated as 83.69:  log likelihood = -327.64,  aic = 663.29
## 
## $degrees_of_freedom
## [1] 87
## 
## $ttable
##       Estimate     SE t.value p.value
## ar1     1.1275 0.0890 12.6690  0.0000
## ar2    -0.5225 0.0901 -5.7985  0.0000
## xmean   0.1064 2.4372  0.0436  0.9653
## 
## $AIC
## [1] 7.369852
## 
## $AICc
## [1] 7.372953
## 
## $BIC
## [1] 7.480955
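As a sanity check that “PACF cuts off after lag 2” really is the AR(2) signature, here is the sample PACF of a simulated AR(2) with coefficients close to the ones fitted above (1.1 and -0.5 are rounded values, chosen only for illustration):

```r
set.seed(3)
# AR(2): x_t = 1.1 x_{t-1} - 0.5 x_{t-2} + w_t (a stationary choice)
x <- arima.sim(model = list(ar = c(1.1, -0.5)), n = 2000)

p <- pacf(x, plot = FALSE)$acf  # p[1] = lag 1, p[2] = lag 2, ...

# The first two partial autocorrelations are large (the theoretical
# lag-2 PACF equals the ar2 coefficient, -0.5); from lag 3 on they
# sit inside the approximate 2/sqrt(n) band.
```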

Moral of the story

In STA108 Regression Analysis, we learned that the errors are assumed to be independent and normal. Nevertheless, it can still be valid to fit a linear regression when the errors are correlated: the OLS point estimates remain unbiased, although the usual standard errors are no longer trustworthy. (Hopefully you will see the rationale in mathematical statistics classes…)
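One standard remedy, and the topic of this section, is to estimate the regression coefficients and the ARMA error model jointly; stats::arima does this through its xreg argument (the same machinery that sarima wraps). A sketch on simulated data with AR(1) errors — all numbers below are made up:

```r
set.seed(11)
n  <- 300
xr <- rnorm(n)                                  # predictor
e  <- arima.sim(model = list(ar = 0.7), n = n)  # AR(1) errors
y  <- 2 + 0.5 * xr + e                          # true regression

# OLS point estimates are still sensible, but its standard errors
# assume independent errors, so inference from them is suspect here.
ols <- lm(y ~ xr)

# Joint fit: regression + AR(1) error model in one likelihood.
fit <- arima(y, order = c(1, 0, 0), xreg = xr)
co  <- coef(fit)  # in order: ar1 (near 0.7), intercept (near 2), slope (near 0.5)
```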