8.4 Moving average models

Rather than using past values of the forecast variable in a regression, a moving average model uses past forecast errors in a regression-like model: $y_{t} = c + \varepsilon_t + \theta_{1}\varepsilon_{t-1} + \theta_{2}\varepsilon_{t-2} + \dots + \theta_{q}\varepsilon_{t-q},$ where $$\varepsilon_t$$ is white noise. We refer to this as an MA($$q$$) model, a moving average model of order $$q$$. Of course, we do not observe the values of $$\varepsilon_t$$, so it is not really a regression in the usual sense.
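
To make the definition concrete, the following is a minimal sketch of how data could be generated from an MA($$q$$) equation. It is an illustration only: the function name `simulate_ma`, the choice of Gaussian white noise, and the parameter values ($$c = 20$$, $$\theta_1 = 0.8$$) are assumptions made for this example, and only the Python standard library is used.

```python
import random

def simulate_ma(c, thetas, n, sigma=1.0, seed=0):
    """Simulate n values of y_t = c + e_t + theta_1*e_{t-1} + ... + theta_q*e_{t-q}.

    Assumes Gaussian white noise with standard deviation sigma (an
    illustrative choice; white noise need not be Gaussian).
    Returns the series y and the errors aligned with it.
    """
    rng = random.Random(seed)
    q = len(thetas)
    # Draw q extra errors so the first observation has a full set of lags.
    eps = [rng.gauss(0.0, sigma) for _ in range(n + q)]
    y = []
    for t in range(q, n + q):
        lagged = sum(theta * eps[t - j] for j, theta in enumerate(thetas, start=1))
        y.append(c + eps[t] + lagged)
    return y, eps[q:]

# Hypothetical MA(1) example: y_t = 20 + e_t + 0.8 * e_{t-1}
y, eps = simulate_ma(c=20.0, thetas=[0.8], n=100)
print(y[:5])
```

In a simulation the errors are known because we generated them. With real data they are unobserved, which is why fitting an MA model is not an ordinary regression.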

Notice that each value of $$y_t$$ can be thought of as a weighted moving average of the past few forecast errors. However, moving average models should not be confused with the moving average smoothing we discussed in Chapter 6. A moving average model is used for forecasting future values, while moving average smoothing is used for estimating the trend-cycle of past values.

Figure 8.6 shows some data from an MA(1) model and an MA(2) model. Changing the parameters $$\theta_1,\dots,\theta_q$$ results in different time series patterns. As with autoregressive models, the variance of the error term $$\varepsilon_t$$ will only change the scale of the series, not the patterns.
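
The claim about scale can be checked numerically. The sketch below builds two series from the same standard-normal shocks, one with error standard deviation 1 and one with 3, and confirms that the second is just three times the first. The helper `ma_from_shocks`, the MA(2) coefficients, and the choice $$c = 0$$ are assumptions made for this illustration.

```python
import random

def ma_from_shocks(thetas, shocks, sigma):
    """Build an MA(q) series with c = 0 from standard-normal shocks scaled by sigma."""
    q = len(thetas)
    eps = [sigma * z for z in shocks]
    return [
        eps[t] + sum(th * eps[t - j] for j, th in enumerate(thetas, start=1))
        for t in range(q, len(eps))
    ]

rng = random.Random(3)
shocks = [rng.gauss(0.0, 1.0) for _ in range(102)]
thetas = [-1.0, 0.8]  # illustrative MA(2) coefficients (an assumption)

y_small = ma_from_shocks(thetas, shocks, sigma=1.0)
y_large = ma_from_shocks(thetas, shocks, sigma=3.0)

# Largest deviation between y_large and 3 * y_small; floating-point noise only.
max_gap = max(abs(b - 3.0 * a) for a, b in zip(y_small, y_large))
print(max_gap)
```

Because the two series are proportional, a plot of one looks the same as the other apart from the labels on the vertical axis.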

It is possible to write any stationary AR($$p$$) model as an MA($$\infty$$) model. For example, using repeated substitution, we can demonstrate this for an AR(1) model: \begin{align*} y_t &= \phi_1y_{t-1} + \varepsilon_t\\ &= \phi_1(\phi_1y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t\\ &= \phi_1^2y_{t-2} + \phi_1 \varepsilon_{t-1} + \varepsilon_t\\ &= \phi_1^3y_{t-3} + \phi_1^2\varepsilon_{t-2} + \phi_1 \varepsilon_{t-1} + \varepsilon_t\\ &\text{etc.} \end{align*}

Provided $$-1 < \phi_1 < 1$$, the value of $$\phi_1^k$$ will get smaller as $$k$$ gets larger. So eventually we obtain $y_t = \varepsilon_t + \phi_1 \varepsilon_{t-1} + \phi_1^2 \varepsilon_{t-2} + \phi_1^3 \varepsilon_{t-3} + \cdots,$ an MA($$\infty$$) process.
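
The substitution argument can also be checked numerically. The sketch below generates an AR(1) series recursively and compares its last value with a truncated version of the MA($$\infty$$) sum. The function names, the value $$\phi_1 = 0.6$$, the Gaussian errors and the starting value $$y_{-1} = 0$$ are all assumptions made for this illustration.

```python
import random

def ar1_path(phi, eps):
    """Generate y_t = phi * y_{t-1} + e_t, starting from y_{-1} = 0."""
    y, prev = [], 0.0
    for e in eps:
        prev = phi * prev + e
        y.append(prev)
    return y

def ma_truncated(phi, eps, t, k):
    """Approximate y_t by the first k + 1 terms of e_t + phi*e_{t-1} + phi^2*e_{t-2} + ..."""
    return sum(phi ** j * eps[t - j] for j in range(min(k, t) + 1))

rng = random.Random(1)
eps = [rng.gauss(0.0, 1.0) for _ in range(200)]
phi = 0.6
y = ar1_path(phi, eps)

t = len(y) - 1
for k in (1, 5, 20, 50):
    gap = abs(y[t] - ma_truncated(phi, eps, t, k))
    print(f"terms kept: {k + 1:3d}   |AR value - truncated MA sum| = {gap:.3g}")
```

The gap after keeping $$k+1$$ terms is $$|\phi_1^{k+1} y_{t-k-1}|$$, so it shrinks geometrically when $$|\phi_1| < 1$$ and would grow instead if $$|\phi_1| > 1$$.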

The reverse result holds if we impose some constraints on the MA parameters. Then the MA model is called invertible. That is, we can write any invertible MA($$q$$) process as an AR($$\infty$$) process. Invertible models are not simply introduced to enable us to convert from MA models to AR models. They also have some desirable mathematical properties.

For example, consider the MA(1) process, $$y_{t} = \varepsilon_t + \theta_{1}\varepsilon_{t-1}$$. In its AR($$\infty$$) representation, the most recent error can be written as a linear function of current and past observations: $\varepsilon_t = \sum_{j=0}^\infty (-\theta_1)^j y_{t-j}.$ When $$|\theta_1| > 1$$, the weights increase as lags increase, so the more distant the observations the greater their influence on the current error. When $$|\theta_1|=1$$, the weights are constant in size, and the distant observations have the same influence as the recent observations. As neither of these situations makes much sense, we require $$|\theta_1|<1$$, so the most recent observations have higher weight than observations from the more distant past. Thus, the process is invertible when $$|\theta_1|<1$$.
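
A small numerical experiment illustrates why the constraint matters. The sketch below simulates an MA(1) series, then tries to recover the most recent error from the observations alone using the weighted sum above, truncated at the start of the sample. The function names, the Gaussian errors, the sample length and the two coefficient values (0.5 and 2) are assumptions made for this illustration.

```python
import random

def ma1(theta, eps):
    """y_t = e_t + theta * e_{t-1}; eps[0] plays the role of the pre-sample error."""
    return [eps[t] + theta * eps[t - 1] for t in range(1, len(eps))]

def recover_error(theta, y):
    """Truncated AR(infinity) sum for the last time point: sum_j (-theta)^j * y_{t-j}."""
    t = len(y) - 1
    return sum((-theta) ** j * y[t - j] for j in range(t + 1))

rng = random.Random(2)
eps = [rng.gauss(0.0, 1.0) for _ in range(61)]

errors = {}
for theta in (0.5, 2.0):
    y = ma1(theta, eps)
    errors[theta] = abs(recover_error(theta, y) - eps[-1])
    print(f"theta = {theta}: |recovered error - true error| = {errors[theta]:.3g}")
```

The truncated sum differs from the true error by a term proportional to $$\theta_1^{t+1}$$ times the unobserved pre-sample error. With a coefficient inside $$(-1, 1)$$ that term vanishes as the sample grows; with a coefficient outside that range it explodes, so the errors cannot be reconstructed from the data.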

The invertibility constraints for other models are similar to the stationarity constraints.

• For an MA(1) model: $$-1<\theta_1<1$$.
• For an MA(2) model: $$-1<\theta_2<1,~$$ $$\theta_2+\theta_1 >-1,~$$ $$\theta_1 -\theta_2 < 1$$.
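
The two sets of constraints above can be written as simple checks. The sketch below also includes a cross-check based on the standard equivalent condition that all roots of $$1 + \theta_1 z + \theta_2 z^2$$ lie outside the unit circle. The function names and the test values are assumptions made for this illustration; this is not how R implements the constraints.

```python
import cmath

def ma1_invertible(theta1):
    """MA(1) constraint: -1 < theta1 < 1."""
    return -1 < theta1 < 1

def ma2_invertible(theta1, theta2):
    """MA(2) constraints: -1 < theta2 < 1, theta2 + theta1 > -1, theta1 - theta2 < 1."""
    return (-1 < theta2 < 1) and (theta2 + theta1 > -1) and (theta1 - theta2 < 1)

def ma2_invertible_by_roots(theta1, theta2):
    """Cross-check: every root of 1 + theta1*z + theta2*z^2 has modulus > 1."""
    if theta2 == 0:
        # Reduces to MA(1): one root at z = -1/theta1, or no root if theta1 == 0.
        return theta1 == 0 or abs(-1 / theta1) > 1
    disc = cmath.sqrt(theta1 ** 2 - 4 * theta2)
    roots = [(-theta1 + disc) / (2 * theta2), (-theta1 - disc) / (2 * theta2)]
    return all(abs(z) > 1 for z in roots)

# Illustrative coefficient pairs (theta1, theta2)
for th1, th2 in [(-1.0, 0.8), (0.5, 0.3), (1.5, 0.2), (0.0, 1.2)]:
    print(th1, th2, ma2_invertible(th1, th2), ma2_invertible_by_roots(th1, th2))
```

The inequality form is cheaper and easier to read for $$q \le 2$$, while the root-based form generalises to higher orders, where the inequalities become unwieldy.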

More complicated conditions hold for $$q\ge3$$. Again, R will take care of these constraints when estimating the models.