7.2 Trend methods

Holt’s linear trend method

Holt (1957) extended simple exponential smoothing to allow the forecasting of data with a trend. This method involves a forecast equation and two smoothing equations (one for the level and one for the trend): \[\begin{align*} \text{Forecast equation}&& \hat{y}_{t+h|t} &= \ell_{t} + hb_{t} \\ \text{Level equation} && \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ \text{Trend equation} && b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 -\beta^*)b_{t-1}, \end{align*}\]

where \(\ell_t\) denotes an estimate of the level of the series at time \(t\), \(b_t\) denotes an estimate of the trend (slope) of the series at time \(t\), \(\alpha\) is the smoothing parameter for the level, \(0\le\alpha\le1\), and \(\beta^*\) is the smoothing parameter for the trend, \(0\le\beta^*\le1\). (We denote this as \(\beta^*\) instead of \(\beta\) for reasons that will be explained in Section 7.5.)

As with simple exponential smoothing, the level equation here shows that \(\ell_t\) is a weighted average of observation \(y_t\) and the one-step-ahead training forecast for time \(t\), here given by \(\ell_{t-1} + b_{t-1}\). The trend equation shows that \(b_t\) is a weighted average of the estimated trend at time \(t\) based on \(\ell_{t} - \ell_{t-1}\) and \(b_{t-1}\), the previous estimate of the trend.

The forecast function is no longer flat but trending. The \(h\)-step-ahead forecast is equal to the last estimated level plus \(h\) times the last estimated trend value. Hence the forecasts are a linear function of \(h\).
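To make the recursion concrete, the three equations can be written out in a few lines of base R. This is only a sketch: `holt_recursion()` is a hypothetical helper (it is not part of the forecast package), the series is made up, and the parameters are fixed by hand rather than estimated.

```r
# Holt's linear trend recursion in base R.
# holt_recursion() is a hypothetical helper for illustration only.
holt_recursion <- function(y, alpha, beta_star, l0, b0, h = 5) {
  n <- length(y)
  level <- numeric(n)
  slope <- numeric(n)
  fitted <- numeric(n)            # one-step-ahead training forecasts
  l_prev <- l0
  b_prev <- b0
  for (i in seq_len(n)) {
    fitted[i] <- l_prev + b_prev                                    # forecast of y[i]
    level[i] <- alpha * y[i] + (1 - alpha) * fitted[i]              # level equation
    slope[i] <- beta_star * (level[i] - l_prev) + (1 - beta_star) * b_prev  # trend equation
    l_prev <- level[i]
    b_prev <- slope[i]
  }
  # Forecast equation: last level plus h times the last slope
  list(level = level, slope = slope, fitted = fitted,
       forecast = l_prev + seq_len(h) * b_prev)
}

# Made-up series with assumed (not estimated) parameters and initial states
y <- c(10, 12, 13, 15, 18)
fit <- holt_recursion(y, alpha = 0.5, beta_star = 0.5, l0 = 9, b0 = 1, h = 3)
fit$forecast
```

Successive values of `fit$forecast` differ by exactly the last estimated slope, which is what "a linear function of \(h\)" means in practice.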

Example: Air Passengers

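library(fpp2)  # loads forecast and ggplot2, and provides the ausair data
# Keep the observations from 1990 onwards
air <- window(ausair, start=1990)
autoplot(air) +
  ggtitle("Air passengers in Australia") +
  xlab("Year") + ylab("millions of passengers")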

Figure 7.3: Total annual passengers of air carriers registered in Australia. 1990-2014.

Figure 7.3 shows annual passenger numbers for Australian airlines. In Table 7.3 we demonstrate the application of Holt’s method to these data. The smoothing parameters, \(\alpha\) and \(\beta^*\), and the initial values \(\ell_0\) and \(b_0\) are estimated by minimizing the SSE for the one-step training errors as in Section 7.1.

fc <- holt(air, h=5)
Table 7.3: Applying Holt’s linear method with \(\alpha=0.8317\) and \(\beta^*=0.0001\) to Australian air passenger data (millions of passengers).
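| Year | Time \(t\) | Observation \(y_t\) | Level \(\ell_t\) | Slope \(b_t\) | Forecast \(\hat{y}_{t\vert t-1}\) |
|------|------|-------|-------|-------|-------|
| 1989 | 0 | | 16.12 | 2.063 | |
| 1990 | 1 | 17.55 | 17.66 | 2.063 | 18.18 |
| 1991 | 2 | 21.86 | 21.50 | 2.063 | 19.72 |
| 1992 | 3 | 23.89 | 23.83 | 2.063 | 23.56 |
| 1993 | 4 | 26.93 | 26.76 | 2.064 | 25.90 |
| 1994 | 5 | 26.89 | 27.21 | 2.063 | 28.82 |
| 1995 | 6 | 28.83 | 28.91 | 2.063 | 29.28 |
| 1996 | 7 | 30.08 | 30.23 | 2.063 | 30.97 |
| 1997 | 8 | 30.95 | 31.18 | 2.063 | 32.29 |
| 1998 | 9 | 30.19 | 30.70 | 2.063 | 33.24 |
| 1999 | 10 | 31.58 | 31.78 | 2.063 | 32.76 |
| 2000 | 11 | 32.58 | 32.79 | 2.063 | 33.84 |
| 2001 | 12 | 33.48 | 33.71 | 2.062 | 34.85 |
| 2002 | 13 | 39.02 | 38.47 | 2.063 | 35.77 |
| 2003 | 14 | 41.39 | 41.24 | 2.063 | 40.54 |
| 2004 | 15 | 41.60 | 41.88 | 2.063 | 43.31 |
| 2005 | 16 | 44.66 | 44.54 | 2.063 | 43.95 |
| 2006 | 17 | 46.95 | 46.89 | 2.063 | 46.60 |
| 2007 | 18 | 48.73 | 48.77 | 2.063 | 48.96 |
| 2008 | 19 | 51.49 | 51.38 | 2.063 | 50.83 |
| 2009 | 20 | 50.03 | 50.60 | 2.062 | 53.44 |
| 2010 | 21 | 60.64 | 59.30 | 2.063 | 52.66 |
| 2011 | 22 | 63.36 | 63.02 | 2.063 | 61.36 |
| 2012 | 23 | 66.36 | 66.14 | 2.064 | 65.09 |
| 2013 | 24 | 68.20 | 68.20 | 2.064 | 68.21 |
| 2014 | 25 | 67.68 | 68.11 | 2.063 | 70.26 |
| | \(h\) | | | | \(\hat{y}_{t+h\vert t}\) |
| | 1 | | | | 71.51 |
| | 2 | | | | 73.57 |
| | 3 | | | | 75.63 |
| | 4 | | | | 77.70 |
| | 5 | | | | 79.76 |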

The very small value of \(\beta^*\) means that the slope hardly changes over time. Figure 7.4 shows the forecasts for years 2014–2018.
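As a check on how the entries in Table 7.3 are produced, the row for \(t=1\) can be reproduced by hand from the rounded values shown in the table, namely \(\alpha=0.8317\), \(\beta^*=0.0001\), \(\ell_0=16.12\), \(b_0=2.063\) and \(y_1=17.55\). Because the inputs are rounded, the last digit may differ slightly from a calculation at full precision.

\[\begin{align*} \hat{y}_{1|0} &= \ell_0 + b_0 = 16.12 + 2.063 = 18.183 \approx 18.18 \\ \ell_1 &= 0.8317 \times 17.55 + (1 - 0.8317) \times 18.183 \approx 14.596 + 3.060 = 17.656 \approx 17.66 \\ b_1 &= 0.0001 \times (17.656 - 16.12) + (1 - 0.0001) \times 2.063 \approx 0.00015 + 2.06279 \approx 2.063 \end{align*}\]

The update to the slope is dominated by the term \((1-\beta^*)b_0\), which is why the slope column barely moves from its initial value.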

Damped trend methods

The forecasts generated by Holt’s linear method display a constant trend (increasing or decreasing) indefinitely into the future. Empirical evidence indicates that these methods tend to over-forecast, especially for longer forecast horizons. Motivated by this observation, Gardner Jr and McKenzie (1985) introduced a parameter that “dampens” the trend to a flat line some time in the future. Methods that include a damped trend have proven to be very successful, and are arguably the most popular individual methods when forecasts are required automatically for many series.

In conjunction with the smoothing parameters \(\alpha\) and \(\beta^*\) (with values between 0 and 1 as in Holt’s method), this method also includes a damping parameter \(0<\phi<1\): \[\begin{align*} \hat{y}_{t+h|t} &= \ell_{t} + (\phi+\phi^2 + \dots + \phi^{h})b_{t} \\ \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} + \phi b_{t-1})\\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 -\beta^*)\phi b_{t-1}. \end{align*}\]

If \(\phi=1\), the method is identical to Holt’s linear method. For values between \(0\) and \(1\), \(\phi\) dampens the trend so that it approaches a constant some time in the future. In fact, the forecasts converge to \(\ell_T+\phi b_T/(1-\phi)\) as \(h\rightarrow\infty\) for any value \(0<\phi<1\). This means that short-run forecasts are trended while long-run forecasts are constant.
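The limit follows from the geometric sum \(\phi+\phi^2+\dots+\phi^h = \phi(1-\phi^h)/(1-\phi)\), which tends to \(\phi/(1-\phi)\) as \(h\) grows. The following base R sketch checks this numerically; `damping_multiplier()` is a hypothetical helper, and the final level and slope are made-up values chosen only for illustration.

```r
# Cumulative damping multiplier phi + phi^2 + ... + phi^h applied to b_t.
# damping_multiplier() is a hypothetical helper for illustration only.
damping_multiplier <- function(phi, h) {
  vapply(h, function(k) sum(phi^seq_len(k)), numeric(1))
}

phi <- 0.9
h <- c(1, 5, 15, 50)
mult <- damping_multiplier(phi, h)

# Closed form of the geometric sum, and its limit as h grows
closed_form <- phi * (1 - phi^h) / (1 - phi)
limit <- phi / (1 - phi)

# Damped and undamped forecasts for an assumed final level and slope
l_T <- 100
b_T <- 2
damped_fc <- l_T + mult * b_T
undamped_fc <- l_T + h * b_T
cbind(h, mult, damped_fc, undamped_fc)
```

With \(\phi=0.9\) the multiplier can never exceed \(0.9/0.1=9\), so the damped forecasts level off at \(\ell_T + 9b_T\) while the undamped forecasts keep growing by \(b_T\) per step.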

In practice, \(\phi\) is rarely less than 0.8 as the damping has a very strong effect for smaller values. Values of \(\phi\) close to 1 mean that a damped model cannot be distinguished from a non-damped model. For these reasons, we usually restrict \(\phi\) to a minimum of 0.8 and a maximum of 0.98.

Example: Air Passengers (continued)

Figure 7.4 shows the forecasts for years 2014–2018 generated from Holt’s linear trend method and the damped trend method.

fc <- holt(air, h=15)
fc2 <- holt(air, damped=TRUE, phi = 0.9, h=15)
autoplot(air) +
  forecast::autolayer(fc, PI=FALSE, series="Holt's method") +
  forecast::autolayer(fc2, PI=FALSE, series="Damped Holt's method") +
  ggtitle("Forecasts from Holt's method") +
  xlab("Year") + ylab("Air passengers in Australia (millions)") +
  guides(colour=guide_legend(title="Forecast"))

Figure 7.4: Forecasting Air Passengers in Australia (millions of passengers). For the damped trend method, \(\phi=0.90\).

We have set the damping parameter to a relatively low number \((\phi=0.90)\) to exaggerate the effect of damping for comparison. Usually, we would estimate \(\phi\) along with the other parameters.

Example: Sheep in Asia

In this example, we compare the forecasting performance of the three exponential smoothing methods that we have considered so far in forecasting the sheep livestock population in Asia. The data span the period 1970–2007 and are shown in Figure 7.5.

autoplot(livestock) +
  xlab("Year") + ylab("Livestock, sheep in Asia (millions)")

Figure 7.5: Annual sheep livestock numbers in Asia (in million head)

We will use time series cross-validation to compare the one-step forecast accuracy of the three methods.

e1 <- tsCV(livestock, ses, h=1)
e2 <- tsCV(livestock, holt, h=1)
e3 <- tsCV(livestock, holt, damped=TRUE, h=1)
# Compare MSE:
mean(e1^2, na.rm=TRUE)
#> [1] 171
mean(e2^2, na.rm=TRUE)
#> [1] 168
mean(e3^2, na.rm=TRUE)
#> [1] 165
# Compare MAE:
mean(abs(e1), na.rm=TRUE)
#> [1] 8.3
mean(abs(e2), na.rm=TRUE)
#> [1] 8.41
mean(abs(e3), na.rm=TRUE)
#> [1] 8.26
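The idea behind the cross-validation errors used above is a rolling forecast origin: fit on the first \(i\) observations, forecast \(h\) steps ahead, and compare with the observation that was held back. The base R sketch below illustrates that idea only; `rolling_origin_errors()` and `drift_forecast()` are hypothetical helpers, the series is made up, and the code is not how `tsCV()` is implemented.

```r
# Rolling-origin (time series cross-validation) errors in base R.
# Both functions are hypothetical helpers for illustration only.
rolling_origin_errors <- function(y, forecast_fn, h = 1, min_train = 2) {
  n <- length(y)
  e <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    if (i < min_train || i + h > n) next   # too little data, or nothing to compare with
    fc <- forecast_fn(y[seq_len(i)], h)     # uses observations 1..i only
    e[i] <- y[i + h] - fc                  # error of the h-step-ahead forecast
  }
  e
}

# Stand-in forecaster: last value plus h times the average historical change
drift_forecast <- function(train, h) {
  n <- length(train)
  train[n] + h * (train[n] - train[1]) / (n - 1)
}

y <- c(10, 12, 13, 15, 18, 19)   # made-up series
e <- rolling_origin_errors(y, drift_forecast, h = 1)
mse <- mean(e^2, na.rm = TRUE)
mae <- mean(abs(e), na.rm = TRUE)
```

The `NA` entries mark origins where no forecast error can be computed, which is why the summaries use `na.rm = TRUE`.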

In the output above, the damped Holt’s method has the smallest MSE (165) and the smallest MAE (8.26), although the margins are small. The other two methods swap places depending on the measure: Holt’s method beats simple exponential smoothing on MSE (168 against 171), but simple exponential smoothing beats Holt’s method on MAE (8.3 against 8.41). Conflicts such as this are common in forecasting comparisons. As forecasting tasks can vary along many dimensions (length of forecast horizon, size of test set, forecast error measures, frequency of data, etc.), it is unlikely that one method will be better than all others for all forecasting scenarios. What we require from a forecasting method are consistently sensible forecasts, and these should be frequently evaluated against the task at hand. In this case, the data are clearly trended and the three methods are close in accuracy, so we will proceed with Holt’s method and apply it to the whole data set to get forecasts for future years.
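Rankings can change with the error measure because MSE penalises large errors much more heavily than MAE does. A minimal illustration with two made-up sets of errors (not taken from the livestock data):

```r
# Two made-up error series showing how MSE and MAE can disagree
errors_a <- c(0, 0, 0, 0, 5)   # usually exact, with one large miss
errors_b <- c(2, 2, 2, 2, 2)   # consistently off by a moderate amount

mse <- c(a = mean(errors_a^2), b = mean(errors_b^2))
mae <- c(a = mean(abs(errors_a)), b = mean(abs(errors_b)))

best_by_mse <- names(which.min(mse))  # method with the smallest MSE
best_by_mae <- names(which.min(mae))  # method with the smallest MAE
```

Here the single large miss dominates the MSE of the first series, so the two measures pick different winners.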

fc <- holt(livestock)
# Estimated parameters:
fc[["model"]]
#> Holt's method 
#> 
#> Call:
#>  holt(y = livestock) 
#> 
#>   Smoothing parameters:
#>     alpha = 0.9999 
#>     beta  = 1e-04 
#> 
#>   Initial states:
#>     l = 225.192 
#>     b = 4.9532 
#> 
#>   sigma:  12
#> 
#>  AIC AICc  BIC 
#>  425  426  434

The smoothing parameter for the slope is estimated to be essentially zero, indicating that the trend is not changing over time. The value of \(\alpha\) is very close to one, showing that the level reacts strongly to each new observation.
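To see what these estimates imply, consider the limiting case in which \(\alpha=1\) and the trend smoothing parameter is exactly \(0\). Substituting into the equations of Holt’s method gives

\[\begin{align*} \ell_{t} &= y_{t} \\ b_{t} &= b_{t-1} = \dots = b_{0} \\ \hat{y}_{T+h|T} &= y_{T} + hb_{0}. \end{align*}\]

The fitted values above are close to, but not exactly at, this limit. So the forecasts are approximately the last observation plus \(h\) times the initial slope, which is estimated here as \(b_0 = 4.9532\).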

autoplot(fc) +
  xlab("Year") + ylab("Livestock, sheep in Asia (millions)")

Figure 7.6: Forecasting livestock, sheep in Asia: comparing forecasting performance of non-seasonal method.

The resulting forecasts look sensible with increasing trend, and relatively wide prediction intervals reflecting the variation in the historical data. The prediction intervals are calculated using the methods described in Section 7.5.

References

Holt, Charles C. 1957. “Forecasting Seasonals and Trends by Exponentially Weighted Averages.” O.N.R. Memorandum 52. Carnegie Institute of Technology, Pittsburgh USA.

Gardner Jr, Everette S, and Ed McKenzie. 1985. “Forecasting Trends in Time Series.” Management Science 31 (10): 1237–46.