Let’s plot the Glacial Varve Series. The log transform stabilizes the variance over time, and first-order differencing makes the series look stationary.
gvss = ts(data.frame(
  x = astsa::varve[-1],                  # raw series; [-1] matches the shorter length of diff() below
  logx = log10(astsa::varve)[-1],        # log10 transform
  difflogx = diff(log10(astsa::varve))   # first difference of the log10 series
))
title_string = "Glacial Varve Series"
plot(gvss, main = title_string)
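As a rough numerical check of the variance-stabilizing claim (this sketch is an addition, not part of the original notes), we can compare the sample variances of the first and second halves of the raw and log-transformed series; a ratio closer to 1 indicates a more stable variance.
n = length(astsa::varve)
first_half = 1:(n %/% 2)
c(raw = var(astsa::varve[first_half]) / var(astsa::varve[-first_half]),
  log10 = var(log10(astsa::varve)[first_half]) / var(log10(astsa::varve)[-first_half]))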
Let’s fit an ARIMA model with \(d = 1\). To determine the order, let’s plot the ACF and PACF. The ACF has a significant spike at lag 1, while the PACF decays more slowly. Our first guess is ARIMA(0,1,1).
acf(gvss[, "difflogx"])   # ACF: significant spike at lag 1
pacf(gvss[, "difflogx"])  # PACF: slower, gradual decay
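An alternative single-figure view (an addition to the notes): astsa::acf2() draws the sample ACF and PACF together with confidence bands and returns the values, which makes the lag-1-cutoff-versus-slow-decay comparison easier to see.
astsa::acf2(gvss[, "difflogx"], max.lag = 20)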
However, the residuals of the ARIMA(0,1,1) fit still show non-zero autocorrelation.
astsa::sarima(gvss[, "difflogx"], 0, 1, 1)
## initial value -0.871202
## iter 2 value -1.204551
## iter 3 value -1.244840
## iter 4 value -1.305272
## iter 5 value -1.351573
## iter 6 value -1.360171
## iter 7 value -1.371771
## iter 8 value -1.380065
## iter 9 value -1.383834
## iter 10 value -1.384684
## iter 11 value -1.384795
## iter 12 value -1.385201
## iter 13 value -1.385455
## iter 13 value -1.385455
## iter 14 value -1.385595
## iter 14 value -1.385595
## iter 15 value -1.385603
## iter 15 value -1.385603
## iter 15 value -1.385603
## final value -1.385603
## converged
## initial value -1.378338
## iter 2 value -1.379274
## iter 3 value -1.379901
## iter 4 value -1.379917
## iter 5 value -1.379920
## iter 5 value -1.379920
## iter 5 value -1.379920
## final value -1.379920
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 constant
## -1.000 0e+00
## s.e. 0.004 1e-04
##
## sigma^2 estimated as 0.06266: log likelihood = -24.66, aic = 55.32
##
## $degrees_of_freedom
## [1] 630
##
## $ttable
## Estimate SE t.value p.value
## ma1 -1 4e-03 -250.3137 0.0000
## constant 0 1e-04 -0.0396 0.9685
##
## $AIC
## [1] 0.08752984
##
## $AICc
## [1] 0.08756003
##
## $BIC
## [1] 0.108648
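To back up the claim about leftover autocorrelation, a Ljung-Box test on the ARIMA(0,1,1) residuals can be run directly (this check is an addition to the notes, and fit011 is a name introduced here; stats::arima() omits the constant that sarima() includes, so the fit is only approximately the same).
fit011 = stats::arima(gvss[, "difflogx"], order = c(0, 1, 1))
Box.test(residuals(fit011), lag = 20, type = "Ljung-Box", fitdf = 1)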
Trying some other choices of \((p, q)\), we find that the ARIMA(1,1,2) fit seems to have residuals that look white…
astsa::sarima(gvss[, "difflogx"], 1, 1, 2)
## initial value -0.870541
## iter 2 value -1.220140
## iter 3 value -1.310302
## iter 4 value -1.401194
## iter 5 value -1.408980
## iter 6 value -1.425829
## iter 7 value -1.429361
## iter 8 value -1.434015
## iter 9 value -1.438499
## iter 10 value -1.441187
## iter 11 value -1.445633
## iter 12 value -1.450858
## iter 13 value -1.450880
## iter 14 value -1.453073
## iter 15 value -1.455708
## iter 16 value -1.459379
## iter 17 value -1.467670
## iter 18 value -1.475933
## iter 19 value -1.478825
## iter 20 value -1.484882
## iter 21 value -1.487914
## iter 22 value -1.488032
## iter 23 value -1.488153
## iter 24 value -1.488289
## iter 25 value -1.488318
## iter 26 value -1.488322
## iter 27 value -1.488326
## iter 28 value -1.488326
## iter 28 value -1.488326
## iter 28 value -1.488326
## final value -1.488326
## converged
## initial value -1.514045
## iter 2 value -1.514882
## iter 3 value -1.522057
## iter 4 value -1.525768
## iter 5 value -1.529114
## iter 6 value -1.546210
## iter 7 value -1.548430
## iter 8 value -1.552339
## iter 9 value -1.559359
## iter 10 value -1.559872
## iter 11 value -1.560896
## iter 12 value -1.561764
## iter 13 value -1.562133
## iter 14 value -1.562365
## iter 15 value -1.562468
## iter 16 value -1.562645
## iter 17 value -1.562690
## iter 18 value -1.562743
## iter 19 value -1.562757
## iter 20 value -1.562757
## iter 21 value -1.562758
## iter 21 value -1.562758
## iter 21 value -1.562758
## final value -1.562758
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ma1 ma2 constant
## 0.2303 -1.8823 0.8823 0
## s.e. 0.0509 0.0309 0.0307 0
##
## sigma^2 estimated as 0.04314: log likelihood = 90.89, aic = -171.79
##
## $degrees_of_freedom
## [1] 628
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.2303 0.0509 4.5218 0.0000
## ma1 -1.8823 0.0309 -60.9701 0.0000
## ma2 0.8823 0.0307 28.7786 0.0000
## constant 0.0000 0.0000 -0.0516 0.9588
##
## $AIC
## [1] -0.2718163
##
## $AICc
## [1] -0.2717153
##
## $BIC
## [1] -0.2366194
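The same hedged check for ARIMA(1,1,2) (again an addition, with fit112 a name introduced here): a large Ljung-Box p-value is consistent with white-looking residuals, and AIC() puts the two candidates on a common (unnormalized) scale.
fit112 = stats::arima(gvss[, "difflogx"], order = c(1, 1, 2))
Box.test(residuals(fit112), lag = 20, type = "Ljung-Box", fitdf = 3)
c(AIC011 = AIC(fit011), AIC112 = AIC(fit112))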
If you set \((p, q)\) higher, you can still get residuals that look white, for example with ARIMA(2,1,2). But it is not as good as ARIMA(1,1,2). First, there are extra parameters. Second, some of the fitted coefficients are now not significant (here, ar2). Third, AIC, AICc, and BIC do not decrease.
astsa::sarima(gvss[, "difflogx"], 2, 1, 2)
## initial value -0.869758
## iter 2 value -1.161058
## iter 3 value -1.278198
## iter 4 value -1.376714
## iter 5 value -1.383040
## iter 6 value -1.392659
## iter 7 value -1.421969
## iter 8 value -1.473863
## iter 9 value -1.474470
## iter 10 value -1.474567
## iter 11 value -1.475945
## iter 12 value -1.476956
## iter 13 value -1.476960
## iter 14 value -1.477033
## iter 15 value -1.477159
## iter 16 value -1.481007
## iter 17 value -1.487182
## iter 18 value -1.496297
## iter 19 value -1.501379
## iter 20 value -1.506604
## iter 21 value -1.507949
## iter 22 value -1.509862
## iter 23 value -1.511000
## iter 24 value -1.511007
## iter 25 value -1.511023
## iter 26 value -1.511587
## iter 27 value -1.511876
## iter 28 value -1.511908
## iter 29 value -1.511919
## iter 30 value -1.511936
## iter 31 value -1.511937
## iter 31 value -1.511937
## iter 31 value -1.511937
## final value -1.511937
## converged
## initial value -1.531170
## iter 2 value -1.534383
## iter 3 value -1.537837
## iter 4 value -1.545542
## iter 5 value -1.545623
## iter 6 value -1.555317
## iter 7 value -1.558444
## iter 8 value -1.561608
## iter 9 value -1.562437
## iter 10 value -1.563039
## iter 11 value -1.563193
## iter 12 value -1.563393
## iter 13 value -1.563403
## iter 14 value -1.563408
## iter 15 value -1.563409
## iter 16 value -1.563409
## iter 17 value -1.563433
## iter 18 value -1.563433
## iter 18 value -1.563433
## iter 18 value -1.563433
## final value -1.563433
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 ma1 ma2 constant
## 0.2452 0.0495 -1.9031 0.9032 0
## s.e. 0.0483 0.0457 0.0298 0.0296 0
##
## sigma^2 estimated as 0.04309: log likelihood = 91.32, aic = -170.64
##
## $degrees_of_freedom
## [1] 627
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.2452 0.0483 5.0789 0.0000
## ar2 0.0495 0.0457 1.0828 0.2793
## ma1 -1.9031 0.0298 -63.9514 0.0000
## ma2 0.9032 0.0296 30.5245 0.0000
## constant 0.0000 0.0000 -0.0585 0.9533
##
## $AIC
## [1] -0.2700025
##
## $AICc
## [1] -0.2698508
##
## $BIC
## [1] -0.2277662
We choose ARIMA(p, d, q) so that the residuals look white (stationary with zero autocorrelation, or better yet normal, since normality plus zero correlation implies independence), while keeping AIC, AICc, and BIC as small as possible.
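The comparison can be automated. Below is a minimal sketch (an addition to the notes, again using stats::arima() without the constant, so the criteria are on a different scale than sarima()’s normalized values, though the ranking is what matters) that tabulates the information criteria and a Ljung-Box p-value for the three candidate orders.
orders = list(c(0, 1, 1), c(1, 1, 2), c(2, 1, 2))
sapply(orders, function(o) {
  f = stats::arima(gvss[, "difflogx"], order = o)
  c(p = o[1], d = o[2], q = o[3],
    AIC = AIC(f), BIC = BIC(f),
    LjungBox.p = Box.test(residuals(f), lag = 20, type = "Ljung-Box",
                          fitdf = o[1] + o[3])$p.value)
})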
Textbook Example 5.17
library(astsa)
df = ts(data.frame(
  Hare = Hare,
  Lynx = Lynx
))
title_string = "Lynx–Hare Populations"
plot(df, main = title_string)
Hare at lag 1 seems to be a good predictor of Lynx.
library(zoo)  # used below for as.zoo()
lag2.plot(Hare, Lynx, 5)  # lead-lag relationship (lag2.plot() is from astsa)
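Another way to look at the lead-lag question (an addition to the notes) is the sample cross-correlation function; with ccf(Hare, Lynx), negative lags pair past values of Hare with current values of Lynx.
ccf(Hare, Lynx, lag.max = 10, main = "CCF of Hare and Lynx")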
Fit a linear regression model of Lynx on Hare at lag 1:
pp = as.zoo(ts.intersect(Lynx, HareL1 = lag(Hare,-1)))
summary(reg <- lm(pp$Lynx~ pp$HareL1))
##
## Call:
## lm(formula = pp$Lynx ~ pp$HareL1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -31.107 -11.511 -2.193 7.579 45.632
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.84712 2.75817 5.746 1.29e-07 ***
## pp$HareL1 0.27265 0.04727 5.768 1.17e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.25 on 88 degrees of freedom
## Multiple R-squared: 0.2744, Adjusted R-squared: 0.2661
## F-statistic: 33.27 on 1 and 88 DF, p-value: 1.172e-07
acf2(resid(reg))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## ACF 0.73 0.30 -0.10 -0.35 -0.48 -0.43 -0.21 0.03 0.21 0.28 0.27 0.09
## PACF 0.73 -0.52 -0.18 -0.05 -0.25 0.08 0.13 -0.08 0.04 0.03 0.00 -0.24
## [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## ACF -0.16 -0.32 -0.35 -0.26 -0.10 0.14 0.34 0.37
## PACF -0.10 0.07 -0.13 0.04 0.06 0.08 0.05 -0.10
The ACF of the residuals decays in a cyclic way, while the PACF is large at the first two lags. Maybe an AR(2) model for the residuals?
astsa::sarima(resid(reg), 2, 0, 0)
## initial value 2.758050
## iter 2 value 2.604881
## iter 3 value 2.343129
## iter 4 value 2.291704
## iter 5 value 2.207667
## iter 6 value 2.206504
## iter 7 value 2.206418
## iter 8 value 2.206415
## iter 9 value 2.206415
## iter 10 value 2.206415
## iter 10 value 2.206415
## iter 10 value 2.206415
## final value 2.206415
## converged
## initial value 2.222241
## iter 2 value 2.221684
## iter 3 value 2.221550
## iter 4 value 2.221543
## iter 5 value 2.221543
## iter 5 value 2.221543
## iter 5 value 2.221543
## final value 2.221543
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 xmean
## 1.1275 -0.5225 0.1064
## s.e. 0.0890 0.0901 2.4372
##
## sigma^2 estimated as 83.69: log likelihood = -327.64, aic = 663.29
##
## $degrees_of_freedom
## [1] 87
##
## $ttable
## Estimate SE t.value p.value
## ar1 1.1275 0.0890 12.6690 0.0000
## ar2 -0.5225 0.0901 -5.7985 0.0000
## xmean 0.1064 2.4372 0.0436 0.9653
##
## $AIC
## [1] 7.369852
##
## $AICc
## [1] 7.372953
##
## $BIC
## [1] 7.480955
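As a quick cross-check (an addition to the notes), stats::ar() can pick the AR order for the regression residuals by AIC; if it also lands on order 2, that supports the AR(2) choice above.
ar_fit = ar(resid(reg), order.max = 5)
ar_fit$order   # order selected by AIC
ar_fit$aic     # AIC differences relative to the best-fitting order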
In STA108 Regression Analysis, we learned that the errors are assumed to be independent and normal. Nevertheless, it is still valid to fit a linear regression when the errors are correlated. (Hopefully you will understand the rationale in your mathematical statistics classes…)
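One common follow-up, sketched below under the assumption that the AR(2) diagnosis above is adequate (this block is an addition to the notes, and fit_ar_err is a name introduced here), is to refit the regression with the AR(2) error structure built in, so that the reported standard errors account for the autocorrelation.
fit_ar_err = stats::arima(as.numeric(pp$Lynx), order = c(2, 0, 0),
                          xreg = as.numeric(pp$HareL1))
fit_ar_err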