The Headline Forecast – regression prediction model.
Posted by Possum Comitatus on November 16, 2007
I’ve finally built the model to forecast the ALP TPP result. This gets a little stats-heavy, so I’ll try to walk anyone who finds it hard going through it as best I can, and I’ll answer any questions you have in the comments.
To build the forecast I used the monthly average of Newspolls going back to 1996, when Howard was elected. I use the monthly average because it dampens a lot of the noise in the individual polls and gives us a time-consistent series of data that can be used for long-term analysis.
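A minimal sketch of the monthly averaging step in Python, assuming each poll is stored as a (date, value) pair. The function name and the sample readings are made up for illustration; they are not real Newspoll figures.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_average(polls):
    """Collapse individual poll readings into one mean value per calendar month.

    polls: iterable of (date, value) pairs.
    Returns a dict mapping (year, month) -> mean value, in chronological order.
    """
    buckets = defaultdict(list)
    for day, value in polls:
        buckets[(day.year, day.month)].append(value)
    return {key: mean(values) for key, values in sorted(buckets.items())}


# Made-up readings, purely to show the mechanics.
polls = [
    (date(2007, 9, 2), 55.0),
    (date(2007, 9, 16), 59.0),
    (date(2007, 10, 7), 56.0),
]
print(monthly_average(polls))  # {(2007, 9): 57.0, (2007, 10): 56.0}
```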
I don’t trust Newspoll’s preference allocations, so I constructed my own from the preference distributions at each election and let the preference flows adapt over time between elections. The two party preferred vote from polls straight after an election used nearly all of the preference distribution from the previous election; polls from halfway between elections used a preference allocation based half on the previous election and half on the next; and polls just before an election had nearly all of their preferences distributed as they were at that impending election. For the 2007 election I’ve simply used the 2004 preferences (which may slightly underestimate the ALP TPP, but not by very much).
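One way to express that sliding allocation is a linear blend weighted by how far through the term a poll falls. This is a sketch under my own assumptions: the weight is the fraction of days elapsed between the two elections, the Coalition’s primary contributes nothing to the ALP TPP, and the party names, flows and primaries below are hypothetical.

```python
from datetime import date


def term_weight(poll_day, prev_election, next_election):
    """Fraction of the term elapsed: 0 at the previous election, 1 at the next."""
    w = (poll_day - prev_election).days / (next_election - prev_election).days
    return min(max(w, 0.0), 1.0)


def blended_flows(w, prev_flows, next_flows):
    """Linearly interpolate each minor party's preference flow to the ALP."""
    return {party: (1 - w) * prev_flows[party] + w * next_flows[party]
            for party in prev_flows}


def alp_tpp(alp_primary, minor_primaries, flows):
    """ALP primary plus the share of each minor party's primary that flows to the ALP."""
    return alp_primary + sum(share * flows[party]
                             for party, share in minor_primaries.items())


# Hypothetical flows and primaries (percent), purely for illustration.
prev_flows = {"Greens": 0.80, "Other": 0.40}
next_flows = {"Greens": 0.90, "Other": 0.60}
w = term_weight(date(2003, 1, 1), date(2002, 1, 1), date(2004, 1, 1))  # halfway: 0.5
flows = blended_flows(w, prev_flows, next_flows)  # Greens 0.85, Other 0.5
print(round(alp_tpp(40.0, {"Greens": 10.0, "Other": 8.0}, flows), 2))  # 52.5
```

For the 2007 polls, where the "next" election’s flows are unknown, the post’s approach amounts to passing the 2004 flows as both `prev_flows` and `next_flows`.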
The model itself is a regression model built specifically to forecast one month and only one month ahead.
The model is a little unorthodox, because using polling data in a model is unorthodox to begin with, but the important thing here is that it works, even if it suffers a little econometric impurity in the process.
The variables I’ve used are split into two types.
Firstly, dummy variables, which take a value of either zero or one. When we regress the ALP vote against them, they let us measure how the level of the ALP TPP vote changed as a function of specific time periods that represent events. You can get the gist of how they play out here:
The Dummy variables I’ve used are:
Dummyhhmoon – which is a dummy variable representing the Howard “honeymoon period” in 1996, as well as the 2 months after every election. It has a value of one for the first 12 months of Howard’s government and for the two months after every election other than 1996. At all other times its value is zero.
Dummylatham – which has a value of 1 for those months Latham was leader and a value of zero for all other periods.
Dummyrudd – which has a value of 1 for the months Rudd has been leader and a value of zero for all other periods.
Dummyworkchoices – which has had a value of 1 since November 2005, when Workchoices was in Parliament and the union campaign against it revved up, and a value of zero before then.
DummyElection – which has a value of 1 for the month an election is held and a value of zero at all other times. I use this as an interaction dummy variable, so I can use long-term changes in satisfaction ratings to emulate special election campaign effects.
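A minimal sketch of how these dummies could be built as monthly 0/1 series, with months held as (year, month) pairs. The month boundaries are my assumptions, not the post’s actual coding: Howard’s first 12 months counted from March 1996; elections in October 1998, November 2001 and October 2004; "two months after" taken as the two calendar months following the election month; Latham from December 2003 to January 2005; Rudd from December 2006; and the series ending October 2007. All function names are hypothetical.

```python
def add_months(ym, k):
    """Shift a (year, month) pair forward by k months."""
    year, month = ym
    idx = year * 12 + (month - 1) + k
    return (idx // 12, idx % 12 + 1)


def month_range(start, end):
    """All (year, month) pairs from start to end inclusive."""
    out, ym = [], start
    while ym <= end:
        out.append(ym)
        ym = add_months(ym, 1)
    return out


def span_dummy(months, start, end):
    """1 for months inside [start, end], otherwise 0."""
    return [1 if start <= ym <= end else 0 for ym in months]


def honeymoon_dummy(months, govt_start, later_elections):
    """1 for the government's first 12 months and the 2 months after each later election."""
    on = set(month_range(govt_start, add_months(govt_start, 11)))
    for election in later_elections:
        on.update([add_months(election, 1), add_months(election, 2)])
    return [1 if ym in on else 0 for ym in months]


def election_dummy(months, elections):
    """1 in the month an election is held, otherwise 0."""
    election_months = set(elections)
    return [1 if ym in election_months else 0 for ym in months]


# Assumed sample period and dates (see lead-in).
months = month_range((1996, 3), (2007, 10))
elections = [(1998, 10), (2001, 11), (2004, 10)]
hhmoon = honeymoon_dummy(months, (1996, 3), elections)
latham = span_dummy(months, (2003, 12), (2005, 1))
rudd = span_dummy(months, (2006, 12), (2007, 10))
workchoices = span_dummy(months, (2005, 11), (2007, 10))
election = election_dummy(months, elections)
print(len(months), sum(hhmoon), sum(election))  # 140 18 3
```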
Secondly, the other variables I use in the model are:
ACNALPTPP(-1) – which is the previous month’s value of the ACNielsen two-party preferred vote for the ALP. Using ACN effectively anchors the forecast to the less volatile ACN series, while still letting me use the Newspoll estimates and its qualitative data in a consistent way, without running into too many “house effect” issues that may be occurring in the Newspoll weighting.
PMDISAT(-1) – which is the previous month’s average of the Prime Minister’s dissatisfaction rating, using Newspoll data.
OPPRIMARY(-1) – which is the Opposition’s primary vote in the previous month, using Newspoll data. This lets the forecast ALP TPP vote adapt to the size of the ALP primary vote.
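The "(-1)" suffix is the usual notation for a one-period lag: the value used in month t is the series’ value in month t−1. A minimal sketch of that shift in Python, using a hypothetical `lag` helper and made-up numbers. The first month has no earlier value, so it becomes `None` and would be dropped before the regression is run.

```python
def lag(series, k=1):
    """Shift a monthly series back k months (k >= 0).

    Entry t of the result is series[t - k]; the first k entries have no
    earlier data and are None.
    """
    n = len(series)
    k = min(k, n)
    return [None] * k + list(series[: n - k])


acn_alp_tpp = [50.0, 51.5, 53.0]  # made-up monthly values
print(lag(acn_alp_tpp))           # [None, 50.0, 51.5]
```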
Then there are these two, which are probably the two most important variables in the model and fill very specific roles.
This represents the difference between last month’s PM dissatisfaction rating and last year’s PM dissatisfaction rating, but it is only modelled during the month of an election.
So it effectively modifies the model’s forecast only in months when an election is on, and does so on the basis of the size of the long-term change in the PM’s dissatisfaction rating.
Similarly, our other complicated variable is:
This represents the recent medium-term change in the Opposition leader’s satisfaction rating. It’s the difference between last month’s satisfaction rating and the satisfaction rating of 3 months ago, but again it is only modelled during the month of an election.
It effectively modifies the model’s forecast only in months when an election is on, and does so on the basis of the size of the medium-term change in the Opposition leader’s satisfaction rating.
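Both of these election-month terms have the same shape: the election dummy multiplied by the change in a rating between two lags, so they are zero in every non-election month. A minimal sketch with a hypothetical `election_change` helper. Reading "last year" as 12 months before the current month and "3 months ago" as 3 months before the current month is my assumption; the post doesn’t pin down the exact lags. The series below is toy data.

```python
def election_change(series, election, recent_lag, base_lag):
    """Interaction term: election[t] * (series[t - recent_lag] - series[t - base_lag]).

    Assumes base_lag >= recent_lag. Months without enough history are None.
    """
    out = []
    for t in range(len(series)):
        if t - base_lag < 0:
            out.append(None)
        else:
            change = series[t - recent_lag] - series[t - base_lag]
            out.append(election[t] * change)
    return out


# Toy data: 14 months of a rating, with an election in the final month.
rating = list(range(30, 44))
election = [0] * 13 + [1]

pm_term = election_change(rating, election, recent_lag=1, base_lag=12)  # long-term PM change
opp_term = election_change(rating, election, recent_lag=1, base_lag=3)  # medium-term Opp. leader change
print(pm_term[13], opp_term[13])  # 11 2
```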
Together, these two variables simulate the process of voters coming to a conclusion about who they will vote for in the month of the election, using long-term changes in satisfaction and dissatisfaction with each party and its leader. They allow “it’s time” factors, along with “he has certainly improved over the last year”, “he’s getting worse as time goes on” and “he wasn’t what I thought he was like” type factors, to be accounted for in the way they influence voter movement at an election, but through an error-correction-type mechanism.
So the Election Forecasting Model is:
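Going only by the variables described above, a generic linear form of the model would look roughly like the following. The β coefficients are placeholders, the intercept is assumed, DummyElection is assumed to enter only through the two interaction terms, the lags in those terms (t−12 for "last year", t−3 for "3 months ago") are my reading, and OPPSAT is a hypothetical name for the Opposition leader’s satisfaction rating.

```latex
\begin{aligned}
\mathrm{ALPTPP}_t = \beta_0
 &+ \beta_1\,\mathrm{ACNALPTPP}_{t-1} + \beta_2\,\mathrm{PMDISAT}_{t-1} + \beta_3\,\mathrm{OPPRIMARY}_{t-1} \\
 &+ \beta_4\,\mathrm{Dummyhhmoon}_t + \beta_5\,\mathrm{Dummylatham}_t
  + \beta_6\,\mathrm{Dummyrudd}_t + \beta_7\,\mathrm{Dummyworkchoices}_t \\
 &+ \beta_8\,\mathrm{DummyElection}_t\,\bigl(\mathrm{PMDISAT}_{t-1} - \mathrm{PMDISAT}_{t-12}\bigr) \\
 &+ \beta_9\,\mathrm{DummyElection}_t\,\bigl(\mathrm{OPPSAT}_{t-1} - \mathrm{OPPSAT}_{t-3}\bigr)
  + \varepsilon_t
\end{aligned}
```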
We’ll use ordinary least squares regression to do the number crunching, which turns out as:
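A minimal sketch of the OLS step in Python with NumPy. It fits synthetic data with known coefficients rather than the actual poll series, and reports classical (non-robust) t-statistics, the usual basis for calling a variable statistically significant. The `ols` function is my own illustration, not the software used for the post.

```python
import numpy as np


def ols(y, X):
    """OLS with an intercept; returns (coefficients, classical t-statistics)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)           # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # classical covariance matrix
    return coef, coef / np.sqrt(np.diag(cov))


# Synthetic data only: true coefficients are [1.0, 0.5, -2.0] plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + 0.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
coef, tstats = ols(y, X)
print(np.round(coef, 2), np.round(tstats, 1))
```

With this little noise the estimates land very close to the true values and every t-statistic is far above 2 in absolute value.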
What is important here is that all of these variables are statistically significant. The model explains about 76% of the variation in the Newspoll estimate of the ALP TPP vote since 1997, but it’s built to be more accurate at election time than at other times, via those two long-looking variables.
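"Explains about 76% of the variation" is the in-sample R² of roughly 0.76: one minus the ratio of the residual sum of squares to the total sum of squares around the mean. A minimal sketch of that formula, with toy numbers:

```python
def r_squared(actual, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


print(r_squared([1, 2, 3], [1, 2, 4]))  # 0.5
```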
On to more of the forecast stats:
Those are mainly for the stats people; they show that the model does its forecasting job extremely well, with very little overall error.
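Assuming those forecast stats are the usual evaluation measures (root mean squared error, mean absolute error, mean absolute percentage error and Theil’s inequality coefficient), here is a minimal sketch of how each is computed from actual and forecast series. The function name and test values are hypothetical.

```python
from math import sqrt


def forecast_stats(actual, forecast):
    """Standard forecast-evaluation measures for paired actual/forecast values."""
    n = len(actual)
    errors = [f - a for a, f in zip(actual, forecast)]
    rmse = sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    # Theil's U (inequality coefficient): 0 is a perfect fit, 1 is the worst case.
    theil_u = rmse / (sqrt(sum(f * f for f in forecast) / n)
                      + sqrt(sum(a * a for a in actual) / n))
    return {"rmse": rmse, "mae": mae, "mape": mape, "theil_u": theil_u}


print(forecast_stats([50.0, 52.0], [51.0, 52.0]))
```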
The forecasts this model produces don’t exhibit much of the polling overshoot that Newspoll experiences when a new leader comes along, or when shocks like Tampa and S11 hit the system. But it still tracks the changes in the TPP vote for the ALP, as we can see in the following graphic:
The blue line represents the model’s forecast for each period, while the red line shows the actual Newspoll TPP vote for the ALP that the model is attempting to predict. The model misses the troughs and peaks of most of the big, volatile movements in the ALP vote because the underlying dynamics of satisfaction ratings, the primary vote level and, importantly, the slow-moving ACNielsen figure from the previous month don’t support the overactive Newspoll during these periods.
So how good is the model using previous elections?
In 1998 the model predicted an ALP TPP of 50.82, whereas the actual result was 50.91.
In 2001 the model predicted an ALP TPP of 49.15, whereas the actual result was 49.07.
In 2004 the model predicted an ALP TPP of 47.23, whereas the actual result was 47.20.
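The arithmetic on those three back-tests, using only the figures quoted above: each miss is under a tenth of a percentage point, and the mean absolute error works out to about 0.07 points.

```python
# (model prediction, actual result) for the ALP TPP at each election, from the figures above.
results = {
    1998: (50.82, 50.91),
    2001: (49.15, 49.07),
    2004: (47.23, 47.20),
}

errors = {year: round(pred - actual, 2) for year, (pred, actual) in results.items()}
mae = sum(abs(e) for e in errors.values()) / len(errors)
print(errors)          # {1998: -0.09, 2001: 0.08, 2004: 0.03}
print(round(mae, 3))   # 0.067
```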
It’s actually more accurate at election time than in ordinary periods, because those two complicated little variables were designed to simulate the process of voters coming to a decision during the campaign. 2001 and 2004 were very different elections, with movements going in opposite directions during the campaign period, but the model estimated both results fairly accurately by any measure.
Rather than building error-correction components into the model for every period, I took the approach that they were only needed for the specific periods when elections occur. It’s probably also worth mentioning that the model predicts each month’s vote based on last month’s figures.
So what is the forecast for the election?
An ALP two party preferred result of 55.15%
How that will split between the States will come (hopefully) by Monday.