5.10 Exercises

  1. Electricity consumption was recorded for a small town on 12 consecutive days. The following maximum temperatures (degrees Celsius) and consumption (megawatt-hours) were recorded for each day.

    Day     1    2    3    4    5    6    7    8    9   10   11   12
    Mwh  16.3 16.8 15.5 18.2 15.2 17.5 19.8 19.0 17.5 16.0 19.6 18.0
    Temp 29.3 21.7 23.7 10.4 29.7 11.9  9.0 23.4 17.8 30.0  8.6 11.8
    1. Plot the data and find the regression model for Mwh with temperature as an explanatory variable. Why is there a negative relationship?
    2. Produce a residual plot. Is the model adequate? Are there any outliers or influential observations?
    3. Use the model to predict the electricity consumption that you would expect for the next day if the maximum temperature was \(10^\circ\) and compare it with the forecast if the maximum temperature was \(35^\circ\). Do you believe these predictions?
    4. Give prediction intervals for your forecasts. The following R code will get you started:

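      library(fpp)  # assumed to provide econsumption; also loads forecast
      # Scatterplot of consumption against maximum temperature
      plot(Mwh ~ temp, data=econsumption)
      fit <- lm(Mwh ~ temp, data=econsumption)
      # Residuals against the predictor
      plot(residuals(fit) ~ temp, data=econsumption)
      # Point forecasts and prediction intervals at 10 and 35 degrees
      forecast(fit, newdata=data.frame(temp=c(10,35)))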
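As a self-contained alternative starting point, the sketch below enters the table values by hand and uses base R's predict() in place of forecast(). The names econ, fit_base and pred are made up for illustration, and the 95% level is an assumption.

```r
# Values typed in from the table above; `econ` is a hypothetical name.
econ <- data.frame(
  Mwh  = c(16.3, 16.8, 15.5, 18.2, 15.2, 17.5, 19.8, 19.0, 17.5, 16.0, 19.6, 18.0),
  temp = c(29.3, 21.7, 23.7, 10.4, 29.7, 11.9, 9.0, 23.4, 17.8, 30.0, 8.6, 11.8)
)

# Simple linear regression of consumption on maximum temperature.
fit_base <- lm(Mwh ~ temp, data = econ)

# Point forecasts with prediction intervals at 10 and 35 degrees.
# The result is a matrix with columns fit, lwr and upr.
pred <- predict(fit_base,
                newdata = data.frame(temp = c(10, 35)),
                interval = "prediction",
                level = 0.95)  # assumed coverage level
print(pred)
```

Note that 35 degrees lies outside the range of temperatures in the table, which is worth keeping in mind when judging the second forecast.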
  2. The data set olympic contains the winning times (in seconds) for the men’s 400 meters final in each Olympic Games from 1896 to 2012.

    1. Plot the winning time against the year. Describe the main features of the scatterplot.
    2. Fit a regression line to the data. Obviously the winning times have been decreasing, but at what average rate per year?
    3. Plot the residuals against the year. What does this indicate about the suitability of the fitted line?
    4. Predict the winning time for the men’s 400 meters final in the 2000, 2004, 2008 and 2012 Olympics. Give a prediction interval for each of your forecasts. What assumptions have you made in these calculations?
    5. Find out the actual winning times for these Olympics (see www.databaseolympics.com). How good were your forecasts and prediction intervals?
  3. Type easter(ausbeer) and interpret what you see.

  4. An elasticity coefficient is the ratio of the percentage change in the forecast variable (\(y\)) to the percentage change in the predictor variable (\(x\)). Mathematically, the elasticity is defined as \((dy/dx)\times(x/y)\). Consider the log-log model, \[\log y=\beta_0+\beta_1 \log x + \varepsilon.\] Express \(y\) as a function of \(x\) and show that the coefficient \(\beta_1\) is the elasticity coefficient.
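A quick numerical sanity check of the claim can be sketched as follows. It ignores the error term, uses made-up values for \(\beta_0\), \(\beta_1\) and \(x\), and approximates \(dy/dx\) by a central difference, so it is only an illustration and not a substitute for the algebra.

```r
# Hypothetical coefficient values, chosen only for illustration.
beta0 <- 0.5
beta1 <- 1.7

# The log-log model without its error term: log(y) = beta0 + beta1 * log(x).
f <- function(x) exp(beta0 + beta1 * log(x))

# Central-difference approximation to dy/dx at an arbitrary point x0.
x0  <- 3
eps <- 1e-6
dydx <- (f(x0 + eps) - f(x0 - eps)) / (2 * eps)

# Elasticity as defined in the exercise: (dy/dx) * (x / y).
elasticity <- dydx * x0 / f(x0)
print(c(elasticity = elasticity, beta1 = beta1))
```

Repeating the calculation at other values of x0 should give the same elasticity up to numerical error.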

  5. The data set fancy concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the wharf at a beach resort town in Queensland, Australia. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff.

    1. Produce a time plot of the data and describe the patterns in the graph. Identify any unusual or unexpected fluctuations in the time series.
    2. Explain why it is necessary to take logarithms of these data before fitting a model.
    3. Use R to fit a regression model to the logarithms of these sales data with a linear trend, seasonal dummies and a “surfing festival” dummy variable.
    4. Plot the residuals against time and against the fitted values. Do these plots reveal any problems with the model?
    5. Produce boxplots of the residuals for each month. Do these reveal any problems with the model?
    6. What do the values of the coefficients tell you about each variable?
    7. What does the Breusch-Godfrey test tell you about your model?
    8. Regardless of your answers to the above questions, use your regression model to predict the monthly sales for 1994, 1995, and 1996. Produce prediction intervals for each of your forecasts.
    9. Transform your predictions and intervals to obtain predictions and intervals for the raw data.
    10. How could you improve these predictions by modifying the model?
  6. The gasoline series consists of weekly data for supplies of US finished motor gasoline product, from 2 February 1991 to 20 January 2017. The units are in “thousand barrels per day”. Consider only the data to the end of 2004.
    1. Fit a harmonic regression with trend to the data. Select the appropriate number of Fourier terms to include by minimizing the AICc or CV value.
    2. Check the residuals of the final model using the checkresiduals() function. Even though the residuals fail the correlation tests, the results are probably not severe enough to make much difference to the forecasts and forecast intervals. (Note that the correlations are relatively small, even though they are significant.)
    3. To forecast using harmonic regression, you will need to generate the future values of the Fourier terms. This can be done as follows.

      fc <- forecast(fit, newdata=data.frame(fourier(x, K, h)))

      where fit is the fitted model using tslm, x is the time series used to fit the model, K is the number of Fourier terms used in creating fit, and h is the forecast horizon required.

      Forecast the next year of data.
    4. Plot the forecasts along with the actual data for 2005. What do you find?
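The idea behind part 1, fitting the model for several values of K and keeping the one with the smallest AICc, can be sketched without any packages. The example below is not the gasoline analysis: it uses a simulated series, base R's lm() in place of tslm(), hand-built sine and cosine regressors in place of fourier(), and an AICc computed from AIC() with the usual small-sample correction. All names and the seasonal period of 52 are assumptions.

```r
# Simulated stand-in for a seasonal series (NOT the gasoline data).
set.seed(1)
m <- 52                       # assumed seasonal period
n <- 6 * m                    # six "years" of observations
t <- seq_len(n)
y <- 10 + 0.01 * t + 2 * sin(2 * pi * t / m) +
  1.5 * cos(4 * pi * t / m) + rnorm(n, sd = 0.3)

# Build K pairs of sine/cosine regressors by hand.
fourier_terms <- function(t, K, m) {
  cols <- lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / m), cos(2 * pi * k * t / m))
  })
  do.call(cbind, cols)
}

# AICc for a trend-plus-harmonics regression with K Fourier pairs.
aicc_for_K <- function(K) {
  fit <- lm(y ~ t + fourier_terms(t, K, m))
  p <- length(coef(fit)) + 1  # coefficients plus the error variance
  AIC(fit) + 2 * p * (p + 1) / (n - p - 1)
}

# Compare K = 1, ..., 6 and keep the minimiser.
aicc   <- sapply(1:6, aicc_for_K)
best_K <- which.min(aicc)
print(data.frame(K = 1:6, AICc = aicc))
print(best_K)
```

The same loop structure should carry over to the exercise by swapping in the real series and the package functions named above.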

  7. (For advanced readers following on from Section 5.7).

    Using matrix notation it was shown that if \(\bm{y}=\bm{X}\bm{\beta}+\bm{\varepsilon}\), where \(\bm{\varepsilon}\) has mean \(\bm{0}\) and variance matrix \(\sigma^2\bm{I}\), the estimated coefficients are given by \(\hat{\bm{\beta}}=(\bm{X}'\bm{X})^{-1}\bm{X}'\bm{y}\) and a forecast is given by \(\hat{y}=\bm{x}^*\hat{\bm{\beta}}=\bm{x}^*(\bm{X}'\bm{X})^{-1}\bm{X}'\bm{y}\), where \(\bm{x}^*\) is a row vector containing the values of the regressors for the forecast (in the same format as \(\bm{X}\)), and the forecast variance is given by \(\text{Var}(\hat{y})=\sigma^2 \left[1+\bm{x}^*(\bm{X}'\bm{X})^{-1}(\bm{x}^*)'\right].\)

    Consider the simple time trend model where \(y_t = \beta_0 + \beta_1t\). Using the following results, \[ \sum^{T}_{t=1}{t}=\frac{1}{2}T(T+1),\quad \sum^{T}_{t=1}{t^2}=\frac{1}{6}T(T+1)(2T+1) \] derive the following expressions:

    1. \(\displaystyle\bm{X}'\bm{X}=\frac{1}{6}\left[ \begin{array}{cc} 6T & 3T(T+1) \\ 3T(T+1) & T(T+1)(2T+1) \\ \end{array} \right]\)

    2. \(\displaystyle(\bm{X}'\bm{X})^{-1}=\frac{2}{T(T^2-1)}\left[ \begin{array}{cc} (T+1)(2T+1) & -3(T+1) \\ -3(T+1) & 6 \\ \end{array} \right]\)

    3. \(\displaystyle\hat{\beta}_0=\frac{2}{T(T-1)}\left[(2T+1)\sum^T_{t=1}y_t-3\sum^T_{t=1}ty_t \right]\)

      \(\displaystyle\hat{\beta}_1=\frac{6}{T(T^2-1)}\left[2\sum^T_{t=1}ty_t-(T+1)\sum^T_{t=1}y_t \right]\)

    4. \(\displaystyle\text{Var}(\hat{y}_{T+h})=\hat{\sigma}^2\left[1+\frac{2}{T(T-1)}\left(1-4T-6h+6\frac{(T+h)^2}{T+1}\right)\right]\)
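Before attempting the derivations, it can help to confirm the target expressions numerically. The sketch below does this in base R for one arbitrary sample size and horizon (n_obs = 10 and h = 3 are made-up values), taking \(\bm{x}^* = [1,\; T+h]\) as the regressor row for a forecast h steps ahead. It checks the matrix expressions and the bracketed variance factor only, and proves nothing in general.

```r
n_obs <- 10   # plays the role of T; hypothetical value
h     <- 3    # forecast horizon; hypothetical value

# Design matrix for y_t = beta0 + beta1 * t, with t = 1, ..., T.
X   <- cbind(1, seq_len(n_obs))
XtX <- crossprod(X)           # X'X computed directly

# Expression 1: closed form for X'X.
XtX_formula <- (1 / 6) * matrix(
  c(6 * n_obs,               3 * n_obs * (n_obs + 1),
    3 * n_obs * (n_obs + 1), n_obs * (n_obs + 1) * (2 * n_obs + 1)),
  nrow = 2, byrow = TRUE)

# Expression 2: closed form for the inverse of X'X.
inv_formula <- 2 / (n_obs * (n_obs^2 - 1)) * matrix(
  c((n_obs + 1) * (2 * n_obs + 1), -3 * (n_obs + 1),
    -3 * (n_obs + 1),               6),
  nrow = 2, byrow = TRUE)

# Expression 4: the factor multiplying sigma^2 in the forecast variance.
x_star      <- matrix(c(1, n_obs + h), nrow = 1)
var_direct  <- 1 + drop(x_star %*% solve(XtX) %*% t(x_star))
var_formula <- 1 + 2 / (n_obs * (n_obs - 1)) *
  (1 - 4 * n_obs - 6 * h + 6 * (n_obs + h)^2 / (n_obs + 1))

print(all.equal(XtX, XtX_formula))
print(all.equal(solve(XtX), inv_formula))
print(all.equal(var_direct, var_formula))
```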