# 2022 Australian National University | EMET3007/8012 Assignment 2

## Instructions:

This assignment is worth either 20% or 25% of the final grade, and is worth a total of 75 points. All working must be shown for all questions. For questions which ask you to write a program, you must provide the code you used. If you have found code and then modified it, then the original source must be cited. The assignment is due by 5pm Friday 1st of October (Friday of Week 8), using Turnitin on Wattle. Late submissions will only be accepted with prior written approval. Good luck.

1. [10 marks] In this exercise we will consider four different specifications for forecasting monthly Australian total employed persons. The dataset AUSEmploy2022.csv (available on Wattle) contains three columns: the first contains the date, the second contains total employed persons for that month (FRED data series LFEMTTTTAUM647N), and the third contains Australian GDP for that month.${}^1$ The data runs from January 1995 to January 2022.

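Let $M_{it}$ be a monthly dummy variable, equal to 1 if period $t$ falls in month $i$ of the year and 0 otherwise. Similarly, let $D_{it}$ be a quarterly dummy variable, equal to 1 if period $t$ falls in quarter $i$ of the year and 0 otherwise. The four specifications we consider are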
$$
\begin{aligned}
&S_1: y_t=a_0+a_1 t+\alpha_4 D_{4t}+\epsilon_t \\
&S_2: y_t=a_1 t+\sum_{i=1}^4 \alpha_i D_{it}+\epsilon_t \\
&S_3: y_t=a_0+a_1 t+\beta_{12} M_{12,t}+\epsilon_t \\
&S_4: y_t=a_1 t+\sum_{i=1}^{12} \beta_i M_{it}+\epsilon_t
\end{aligned}
$$
where $\mathbb{E} \epsilon_t=0$ for all $t$.
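A minimal sketch of how the four regressor matrices might be built and estimated by least squares, assuming periods are indexed $t = 1, \ldots, T$ and that period 1 is January 1995. The helper names `design_matrix` and `fit_ols` are hypothetical. Note that $S_2$ and $S_4$ have no intercept because a full set of seasonal dummies already sums to one in every period; adding $a_0$ as well would make the regressors perfectly collinear.

```python
import numpy as np

def design_matrix(T, spec, start_month=1):
    """Regressor matrix for S1-S4 with periods t = 1, ..., T.

    Assumes period 1 falls in calendar month `start_month` (1 = January),
    e.g. January 1995 for this dataset.
    """
    t = np.arange(1, T + 1, dtype=float)
    month = (np.arange(T) + start_month - 1) % 12 + 1   # calendar month, 1..12
    quarter = (month - 1) // 3 + 1                      # calendar quarter, 1..4
    M = np.column_stack([(month == i).astype(float) for i in range(1, 13)])
    D = np.column_stack([(quarter == i).astype(float) for i in range(1, 5)])
    ones = np.ones(T)
    if spec == "S1":   # intercept, trend, Q4 dummy
        return np.column_stack([ones, t, D[:, 3]])
    if spec == "S2":   # trend and all four quarterly dummies, no intercept
        return np.column_stack([t, D])
    if spec == "S3":   # intercept, trend, December dummy
        return np.column_stack([ones, t, M[:, 11]])
    if spec == "S4":   # trend and all twelve monthly dummies, no intercept
        return np.column_stack([t, M])
    raise ValueError(f"unknown specification: {spec!r}")

def fit_ols(X, y):
    """Least-squares coefficients for y = X b + e."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b
```

With `y` holding the employment column of the CSV as a float array, `fit_ols(design_matrix(len(y), "S4"), y)` would return $(a_1, \beta_1, \ldots, \beta_{12})$ in that order.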

a) For each specification, describe this specification in words.
b) For each specification, estimate the values of the parameters, and compute the MSE, $\mathrm{AIC}$, and BIC. If you make any changes to the csv file, please describe the changes you make. As always, you must include your code.
c) For each specification, compute the MSFE for the 1-step and 5-step ahead forecasts, with the out-of-sample forecasting exercise beginning at $T_0=50$.
d) For each specification, plot the out-of-sample forecasts and comment on the results.
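The in-sample statistics in b) and the out-of-sample exercise in c) could be sketched as follows. Two conventions here are assumptions rather than anything fixed by the question: AIC and BIC use the Gaussian-likelihood forms $T\log(\mathrm{MSE}) + 2k$ and $T\log(\mathrm{MSE}) + k\log T$ (other texts rescale these, for example by dividing by $T$, so check your course notes), and "beginning at $T_0 = 50$" is read as an expanding window whose first estimation sample is periods $1, \ldots, 50$. The function names are hypothetical.

```python
import numpy as np

def fit_stats(X, y):
    """In-sample MSE, AIC and BIC for the OLS fit of y on X.

    Assumed convention: AIC = T log(MSE) + 2k, BIC = T log(MSE) + k log(T),
    where k is the number of regressors (columns of X).
    """
    T, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    mse = np.mean((y - X @ b) ** 2)
    return mse, T * np.log(mse) + 2 * k, T * np.log(mse) + k * np.log(T)

def recursive_msfe(X, y, T0, h):
    """MSFE of h-step-ahead forecasts from an expanding estimation window.

    For each forecast origin t = T0, ..., T - h (1-based periods) the model
    is re-estimated on periods 1..t and used to forecast period t + h.
    Only valid when the regressors for period t + h are known at time t,
    as with a deterministic trend and seasonal dummies.
    """
    T = len(y)
    forecasts, actuals = [], []
    for t in range(T0, T - h + 1):
        b, *_ = np.linalg.lstsq(X[:t], y[:t], rcond=None)  # rows 0..t-1 = periods 1..t
        forecasts.append(X[t + h - 1] @ b)                 # row t+h-1 = period t+h
        actuals.append(y[t + h - 1])
    forecasts, actuals = np.array(forecasts), np.array(actuals)
    return np.mean((actuals - forecasts) ** 2), forecasts
```

The returned `forecasts` array lines up with periods $T_0 + h, \ldots, T$, which is convenient for the plots in d).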

2. [10 marks] Now add to Question 1 the additional assumption that $\epsilon_t \sim \mathcal{N}\left(0, \sigma^2\right)$. One estimator${}^2$ for $\sigma^2$ is
$$\hat{\sigma}^2=\frac{1}{T-k} \sum_{t=1}^T\left(y_t-\hat{y}_t\right)^2$$
where $\hat{y}_t$ is the estimated value of $y_t$ in the model and $k$ is the number of regressors in the specification.
a) For each specification $\left(S_1, \ldots, S_4\right)$, compute $\hat{\sigma}^2$.
b) For each specification, make a $95\%$ probability forecast for total employed persons in June 2021.
c) For each specification, compute the probability that total employed persons in June 2022 will be greater than $13.5$ million. According to the FRED series LFEMTTTTAUM647N, what was the actual employment level for that month?
d) Do you think the assumption that $\epsilon_t$ is iid is reasonable for this data series?
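Parts a) to c) can be sketched under the stated normality assumption, additionally assuming that parameter-estimation uncertainty is ignored, so the forecast of $y$ is treated as $\mathcal{N}(\hat{y}, \hat{\sigma}^2)$ with $\hat{y}$ the point forecast from the fitted specification. The threshold must be in the same units as the series, so check whether the CSV records persons or thousands of persons. All function names below are hypothetical.

```python
import math
import numpy as np

def sigma2_hat(X, y):
    """Residual variance estimate: sum of squared residuals / (T - k)."""
    T, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return resid @ resid / (T - k)

def normal_cdf(z):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def interval_forecast(y_hat, sigma2, z=1.959963984540054):
    """Central 95% interval y_hat +/- z * sigma (coefficients treated as known)."""
    half_width = z * math.sqrt(sigma2)
    return y_hat - half_width, y_hat + half_width

def prob_exceeds(y_hat, sigma2, threshold):
    """P(y > threshold) when y ~ N(y_hat, sigma2)."""
    return 1.0 - normal_cdf((threshold - y_hat) / math.sqrt(sigma2))
```

Dividing by $T - k$ rather than $T$ makes $\hat{\sigma}^2$ equal to the in-sample MSE scaled up by $T/(T-k)$, which matters most for $S_4$ with its 13 regressors.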

3. [10 marks] Here we investigate whether adding GDP${}^3$ as a predictor can improve our forecasts. Consider the following modified specifications:
$$
\begin{aligned}
&S_1^{\prime}: y_t=a_0+a_1 t+\alpha_4 D_{4t}+\gamma x_{t-h}+\epsilon_t \\
&S_2^{\prime}: y_t=a_1 t+\sum_{i=1}^4 \alpha_i D_{it}+\gamma x_{t-h}+\epsilon_t \\
&S_3^{\prime}: y_t=a_0+a_1 t+\beta_{12} M_{12,t}+\gamma x_{t-h}+\epsilon_t \\
&S_4^{\prime}: y_t=a_1 t+\sum_{i=1}^{12} \beta_i M_{it}+\gamma x_{t-h}+\epsilon_t
\end{aligned}
$$
where $\mathbb{E} \epsilon_t=0$ for all $t$, and $x_{t-h}$ is GDP at time $t-h$. For each specification, compute the MSFE for the 1-step and 5-step ahead forecasts, with the out-of-sample forecasting exercise beginning at $T_0=50$. For each specification, plot the out-of-sample forecasts and comment on the results.
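Because the regressor is $x_{t-h}$, the GDP value needed to forecast $y_{t+h}$ from origin $t$ is $x_t$, which is already observed, so the expanding-window exercise carries over once a lagged column is built. A sketch, assuming the lag $h$ is set equal to the forecast horizon and that the first $h$ periods (which have no lagged GDP) are dropped from every estimation sample; `X` holds the trend and dummy columns of the original specification, and the function name is hypothetical.

```python
import numpy as np

def recursive_msfe_with_lag(X, y, x, T0, h):
    """Expanding-window h-step MSFE for y_t = X_t b + gamma x_{t-h} + e_t.

    Forecast origins are t = T0, ..., T - h (1-based periods). At origin t
    the model is estimated on periods h+1..t, and the forecast of y_{t+h}
    uses x_t, which is known at time t.
    """
    T = len(y)
    Z = np.column_stack([X[h:], x[:T - h]])   # row r <-> period r + h + 1
    yz = y[h:]
    errors = []
    for t in range(T0, T - h + 1):
        n = t - h                              # periods h+1..t -> rows 0..n-1
        b, *_ = np.linalg.lstsq(Z[:n], yz[:n], rcond=None)
        row = n + h - 1                        # row for period t + h
        errors.append(yz[row] - Z[row] @ b)
    return np.mean(np.square(errors))
```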

4. [15 marks] Here we investigate whether Holt-Winters smoothing can improve our forecasts. Use a Holt-Winters smoothing method with seasonality to produce 1-step and 5-step ahead forecasts, and compute the MSFE for these forecasts. You should use smoothing parameters $\alpha=\beta=\gamma=0.3$ and start the out-of-sample forecasting exercise at $T_0=50$. Plot these out-of-sample forecasts and comment on the results.
Additionally, estimate the values of $\alpha$, $\beta$, and $\gamma$ that minimise the MSFE. Find the MSFE for these parameter values and compare it to the baseline $\alpha=\beta=\gamma=0.3$.
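A minimal sketch of one Holt-Winters variant, assuming additive seasonality with period $m = 12$ (a multiplicative version is also common) and an initialisation from the first two years of data, which is itself an assumption; other initialisations change the early forecasts. The parameter search is a coarse grid; a bounded numerical optimiser such as `scipy.optimize.minimize` over $[0, 1]^3$ is an alternative. All function names are hypothetical.

```python
import numpy as np

def holt_winters_forecasts(y, alpha, beta, gamma, m=12, h=1):
    """Additive Holt-Winters filter.

    Returns f with f[t] = h-step-ahead forecast of y[t + h] made at time t
    (0-based indices); entries are NaN where no forecast is produced.
    Requires len(y) >= 2 * m for the initialisation below.
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    # Assumed initialisation from the first two seasons.
    level = y[:m].mean()
    trend = (y[m:2 * m].mean() - y[:m].mean()) / m
    season = list(y[:m] - level)          # season[i]: seasonal index for period i
    k = (h - 1) // m + 1                  # seasons to step back for horizon h
    f = np.full(T, np.nan)
    for t in range(m, T):
        prev_level = level
        level = alpha * (y[t] - season[t - m]) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season.append(gamma * (y[t] - level) + (1 - gamma) * season[t - m])
        if t + h < T:
            f[t] = level + h * trend + season[t + h - k * m]
    return f

def hw_msfe(y, params, T0=50, h=1, m=12):
    """MSFE over forecast origins T0, ..., T - h (1-based periods); needs T0 > m."""
    y = np.asarray(y, dtype=float)
    f = holt_winters_forecasts(y, *params, m=m, h=h)
    origins = np.arange(T0 - 1, len(y) - h)   # 0-based indices of those origins
    return np.mean((y[origins + h] - f[origins]) ** 2)

def grid_search(y, T0=50, h=1, m=12, grid=np.linspace(0.05, 0.95, 19)):
    """Coarse search for (alpha, beta, gamma) minimising the MSFE."""
    best = (np.inf, None)
    for a in grid:
        for b in grid:
            for g in grid:
                err = hw_msfe(y, (a, b, g), T0, h, m)
                if err < best[0]:
                    best = (err, (a, b, g))
    return best
```

The baseline is then `hw_msfe(y, (0.3, 0.3, 0.3), T0=50, h=1)` (and `h=5`), to be compared with the first element returned by `grid_search`.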

5. [5 marks] Questions 1, 3, and 4 each provided alternative models for forecasting Australian total employment. Compare the efficacy of these forecasts. Your comparison should include a discussion of MSFE, but must also make qualitative observations (typically based on your graphs).

6. [10 marks] Develop another model, either based on material from class or otherwise, to forecast Australian total employment. Your new model should perform better (have a lower MSFE or MAFE) than all models from Questions 1, 3, and 4. As part of your response to this question you must provide:
a) a brief written explanation of what your model is doing,
b) a brief statement on why you think your new model will perform better,
c) any relevant equations or mathematics/statistics to describe the model,
d) the code to run the model, and
e) the MSFE and/or MAFE achieved by your model, and a brief discussion of how this compares to previous cases.
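MSFE averages squared forecast errors, while MAFE averages absolute errors, $\frac{1}{n}\sum \lvert y_{t+h} - \hat{y}_{t+h|t} \rvert$, over the same forecast origins. A small sketch computing both (the function name and the two forecasters are hypothetical) shows that the criteria can rank models differently: squaring penalises a single large miss far more than several small ones.

```python
import numpy as np

def forecast_errors(actual, forecast):
    """Return (MSFE, MAFE) for aligned arrays of outcomes and forecasts."""
    e = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return np.mean(e ** 2), np.mean(np.abs(e))

# Two hypothetical forecasters of the same four outcomes.
actual = np.zeros(4)
msfe_a, mafe_a = forecast_errors(actual, [0.0, 0.0, 0.0, 4.0])  # one large miss: 4.0, 1.0
msfe_b, mafe_b = forecast_errors(actual, [1.5, 1.5, 1.5, 1.5])  # steady misses: 2.25, 1.5
```

Forecaster A wins on MAFE but loses on MSFE, so it is worth stating which criterion a comparison uses.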

7. [15 marks] Consider the ARX(1) model
$$y_t=\mu+a t+\rho y_{t-1}+\epsilon_t$$
where the errors follow an $\mathrm{AR}(2)$ process
$$\epsilon_t=\phi_1 \epsilon_{t-1}+\phi_2 \epsilon_{t-2}+u_t, \quad \mathbf{u} \sim \mathcal{N}\left(0, \sigma^2 I\right)$$
for $t=1, \ldots, T$, with $\epsilon_{-1}=\epsilon_0=0$. Suppose $\phi_1$ and $\phi_2$ are known. Find (analytically) the maximum likelihood estimators for $\mu$, $a$, $\rho$, and $\sigma^2$.

Hint: First write $y$ and $\epsilon$ in vector/matrix form. You may wish to use different looking forms for each. Find the distribution of $\epsilon$ and $y$. Then apply some appropriate calculus. You may want to let $H=I-\phi_1 L-\phi_2 L^2$, where $I$ is the $T \times T$ identity matrix, and $L$ is the lag matrix.
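One route through the hint, sketched under the assumption that $y_0$ is observed and treated as fixed: stack the model as $\mathbf{y} = X\boldsymbol{\theta} + \boldsymbol{\epsilon}$, where $X$ has rows $(1, t, y_{t-1})$ and $\boldsymbol{\theta} = (\mu, a, \rho)'$, and stack the error recursion as $H\boldsymbol{\epsilon} = \mathbf{u}$ (the initial conditions $\epsilon_{-1} = \epsilon_0 = 0$ are exactly what make the first two rows of $H$ correct). The map from $\mathbf{y}$ to $\mathbf{u}$ has Jacobian $H(I - \rho L)$, which is lower triangular with unit diagonal, so its determinant is one and the log-likelihood is
$$\ell(\boldsymbol{\theta}, \sigma^2) = -\frac{T}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(\mathbf{y} - X\boldsymbol{\theta})'H'H(\mathbf{y} - X\boldsymbol{\theta}).$$
Setting the derivatives with respect to $\boldsymbol{\theta}$ and $\sigma^2$ to zero gives
$$\hat{\boldsymbol{\theta}} = (X'H'HX)^{-1}X'H'H\mathbf{y}, \qquad \hat{\sigma}^2 = \frac{1}{T}(\mathbf{y} - X\hat{\boldsymbol{\theta}})'H'H(\mathbf{y} - X\hat{\boldsymbol{\theta}}),$$
that is, OLS of $H\mathbf{y}$ on $HX$. The code below is a numerical check of these formulas on simulated data; the function names and parameter values are illustrative only.

```python
import numpy as np

def arx_ar2_mle(y, y0, phi1, phi2):
    """Closed-form ML estimates of (mu, a, rho) and sigma^2 for
    y_t = mu + a t + rho y_{t-1} + eps_t with AR(2) errors and known phi1, phi2.
    Conditions on the (assumed observed) initial value y0."""
    T = len(y)
    # H = I - phi1 L - phi2 L^2, where L has ones on the first subdiagonal,
    # so L^2 has ones on the second subdiagonal.
    H = np.eye(T) - phi1 * np.eye(T, k=-1) - phi2 * np.eye(T, k=-2)
    X = np.column_stack([np.ones(T), np.arange(1, T + 1), np.r_[y0, y[:-1]]])
    Hy, HX = H @ y, H @ X
    theta = np.linalg.solve(HX.T @ HX, HX.T @ Hy)   # (mu, a, rho)
    u = Hy - HX @ theta                              # estimated innovations
    return theta, u @ u / T

def simulate(T, mu, a, rho, phi1, phi2, sigma, y0, rng):
    """Simulate the model with eps_{-1} = eps_0 = 0."""
    y, eps = np.empty(T), np.zeros(T + 2)   # eps[j] holds epsilon_{j-1}
    y_prev = y0
    for t in range(1, T + 1):
        eps[t + 1] = phi1 * eps[t] + phi2 * eps[t - 1] + sigma * rng.standard_normal()
        y[t - 1] = mu + a * t + rho * y_prev + eps[t + 1]
        y_prev = y[t - 1]
    return y

rng = np.random.default_rng(0)
y = simulate(1500, mu=1.0, a=0.01, rho=0.5, phi1=0.5, phi2=0.2,
             sigma=1.0, y0=0.0, rng=rng)
theta_hat, s2_hat = arx_ar2_mle(y, y0=0.0, phi1=0.5, phi2=0.2)
```

With $\phi_1 = \phi_2 = 0$ the matrix $H$ is the identity and the estimator collapses to ordinary least squares of $y_t$ on $(1, t, y_{t-1})$.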