Possums Pollytics

Politics, elections and piffle plinking

Posts Tagged ‘Polling’

The Headline Forecast – regression prediction model.

Posted by Possum Comitatus on November 16, 2007

I’ve finally built the model to forecast the ALP TPP result. This gets a little stats heavy, so I’ll do my best to walk through it for those who might find it hard going, and I’ll answer any questions you have in the comments.

What I used to build the forecast is the monthly average of Newspolls going back to 1996 when Howard was elected. The reason I use the monthly average is that it dampens a lot of the noise in the individual polls, and gives us a time consistent series of data that can be used for long term analysis.
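As a rough sketch of the averaging step, individual polls can be grouped by calendar month and averaged. The readings below are made-up placeholders rather than real Newspoll figures, and `monthly_average` is a hypothetical helper name used only for illustration.

```python
from collections import defaultdict
from statistics import mean

def monthly_average(polls):
    """Average poll readings by (year, month).

    polls: iterable of ((year, month, day), value) pairs.
    Returns a dict mapping (year, month) -> mean value for that month.
    """
    by_month = defaultdict(list)
    for (year, month, _day), value in polls:
        by_month[(year, month)].append(value)
    return {key: mean(values) for key, values in sorted(by_month.items())}

# Placeholder readings for illustration only (not real Newspoll data).
example_polls = [
    ((2007, 9, 7), 55.0),
    ((2007, 9, 21), 57.0),
    ((2007, 10, 5), 56.0),
    ((2007, 10, 19), 54.0),
    ((2007, 10, 26), 55.0),
]
averages = monthly_average(example_polls)
```

Each month then contributes a single observation to the series, however many polls were taken in it.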

I don’t trust the preference allocations for Newspoll, so I constructed my own from the preference distributions at each election, letting the preference flows adapt over time between elections. The two party preferred vote from polls straight after an election uses nearly all of the preference distribution from the previous election; polls from halfway between elections use an allocation based on half the previous election and half the next; and polls just before an election have nearly all of their preferences distributed as they were at that impending election. For the 2007 election preference distribution I’ve simply used the 2004 preferences (which may slightly underestimate the ALP TPP, but not by very much).
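One way to picture the sliding preference allocation is as a linear blend between the flows at the two surrounding elections. This is only a sketch: the linear weighting, the single "minor party" bucket, the function names and every number below are assumptions made for illustration, not the actual calculation.

```python
def blended_flow(months_since_prev, months_between, flow_prev, flow_next):
    """Blend two elections' preference flows to the ALP.

    The weight on the next election's flow rises from 0 (just after the
    previous election) to 1 (at the next election). Linear weighting is
    an assumption; the exact scheme is not spelled out in the post.
    """
    w = months_since_prev / months_between
    return (1 - w) * flow_prev + w * flow_next

def alp_tpp(alp_primary, minor_primary, flow_to_alp):
    """ALP two party preferred = ALP primary + its share of minor-party votes."""
    return alp_primary + minor_primary * flow_to_alp

# Placeholder numbers for illustration only.
flow = blended_flow(months_since_prev=18, months_between=36,
                    flow_prev=0.60, flow_next=0.70)
tpp = alp_tpp(alp_primary=40.0, minor_primary=15.0, flow_to_alp=flow)
```

Halfway between elections the blended flow sits midway between the two election flows, which matches the "half the previous election and half the next" description.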

The model itself is a regression model built specifically to forecast one month and only one month ahead.

The model is a little unorthodox because using polling data in a model is a little unorthodox to begin with, but the important thing here is that it works – even if it suffers a little econometric impurity in the process.

The variables I’ve used are split into two types.

Firstly, Dummy Variables – which are variables that have a value of zero or a value of one. When we regress the ALP vote against them, they let us measure how the level of the ALP TPP vote changed as a function of specific periods of time that represent events. You can get the gist of how they play out here:

The Dummy variables I’ve used are:

Dummyhhmoon – which is a dummy variable representing the Howard “honeymoon period” in 1996 as well as 2 months after every election. It has a value of one for the first 12 months of Howard’s government as well as for the two months after every election other than 1996. At all other times its value is zero.

Dummylatham – which has a value of 1 for those months Latham was leader and a value of zero for all other periods.

Dummyrudd – which has a value of 1 for the months Rudd has been leader and a value of zero for all other periods.

Dummyworkchoices – which has a value of 1 from November 2005 onward, when Workchoices was in Parliament and the union campaign against it revved up, and a value of zero before then.

DummyElection – which has a value of 1 for the month an election is on and a value of zero at all other periods. I use this as an interactive dummy variable so I can emulate special election campaign effects with long term satisfaction rating changes.
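The dummy definitions above can be sketched as month-by-month 0/1 series. Several things here are assumptions rather than anything stated in the post: the sample window (March 1996 to October 2007), how boundary months are treated, the way "two months after" is counted, and the leadership windows (Latham from December 2003 to January 2005, Rudd from December 2006), which are calendar dates rather than the author's own coding.

```python
def next_month(ym):
    """Return the (year, month) tuple following ym."""
    year, month = ym
    return (year + 1, 1) if month == 12 else (year, month + 1)

def month_range(start, end):
    """List every (year, month) from start to end inclusive."""
    out, current = [], start
    while current <= end:
        out.append(current)
        current = next_month(current)
    return out

def dummy(months, active):
    """Return 1 for months in the `active` set and 0 otherwise."""
    return [1 if m in active else 0 for m in months]

# Assumed sample window: March 1996 to October 2007.
months = month_range((1996, 3), (2007, 10))
last = months[-1]

# Election months after 1996 (October 1998, November 2001, October 2004).
elections = [(1998, 10), (2001, 11), (2004, 10)]

# Honeymoon: first 12 months of the Howard government, plus the two
# months following each later election (month counting is an assumption).
honeymoon = set(month_range((1996, 3), (1997, 2)))
for e in elections:
    first = next_month(e)
    honeymoon.update([first, next_month(first)])

dummies = {
    "Dummyhhmoon": dummy(months, honeymoon),
    "Dummylatham": dummy(months, set(month_range((2003, 12), (2005, 1)))),
    "Dummyrudd": dummy(months, set(month_range((2006, 12), last))),
    "Dummyworkchoices": dummy(months, set(month_range((2005, 11), last))),
    "DummyElection": dummy(months, set(elections)),
}
```

For the forecast month itself (November 2007), DummyElection would be set to 1 in the same way.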

Secondly, the other type of variables I use in the model are:

ACNALPTPP(-1) – which is the previous month’s value of the ACNielsen two-party preferred vote for the ALP. By using ACN, I can effectively anchor the forecast to the less volatile ACN series, while still using the Newspoll estimates and its qualitative data estimates in a consistent way without running into too many “house effect” issues that may be occurring in the Newspoll weighting.

PMDISAT(-1) – which is the previous month’s average of the Prime Minister’s dissatisfaction rating using Newspoll data.

OPPRIMARY(-1) – which is the Opposition’s primary vote in the previous month using Newspoll data. This lets the forecast ALP TPP vote adapt to the size of the ALP primary vote.

Then these two, which are probably the two most important variables in the model and fill very specific roles.

(PMDISAT(-1)-PMDISAT(-12))*DUMMYELECTION

What this represents is the difference between last month’s PM dissatisfaction rating and the PM dissatisfaction rating of 12 months ago, but it is only modelled during the month of an election.

So what it effectively does is modify the forecast of the model only in months that an election is on, and does so on the basis of the size of the long term change in the PM’s dissatisfaction rating.

Similarly, our other complicated variable is:

(OPSAT(-1)-OPSAT(-3))*DUMMYELECTION

What this represents is the recent medium term change in the Opposition leader’s satisfaction rating. It’s the difference between last month’s satisfaction rating and the satisfaction rating of 3 months ago – but it is only modelled during the month of an election.

What it effectively does is modify the forecast of the model only in months that an election is on, and does so on the basis of the size of the medium term change in the Opposition leader’s satisfaction rating.

What these two variables do is simulate the process of voters coming to a conclusion about who they will vote for in the month of the election, using long term changes in satisfaction and dissatisfaction with each party and its leader. It allows for “it’s time” factors and “he has certainly improved over the last year” and “he’s getting worse as time goes on” and “he wasn’t what I thought he was like” type factors to be accounted for in terms of the way they influence voter movement in an election, but through an error correction type mechanism.
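To make the two interaction terms concrete, here is a minimal sketch of how a lagged difference multiplied by the election dummy behaves. The ratings are placeholder numbers rather than Newspoll data, and `election_interaction` is a hypothetical helper name.

```python
def election_interaction(series, election_dummy, short_lag, long_lag):
    """Compute (x[t-short_lag] - x[t-long_lag]) * election_dummy[t] per month.

    Months without enough history for the long lag get None.
    """
    out = []
    for t in range(len(series)):
        if t - long_lag < 0:
            out.append(None)
        else:
            change = series[t - short_lag] - series[t - long_lag]
            out.append(change * election_dummy[t])
    return out

# Placeholder ratings for illustration only (14 months, not real data).
pmdisat = [40 + i for i in range(14)]
opsat = [60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 52, 55, 57]
dummy_election = [0] * 13 + [1]   # election in the final month

# (PMDISAT(-1)-PMDISAT(-12))*DUMMYELECTION
pm_term = election_interaction(pmdisat, dummy_election, 1, 12)
# (OPSAT(-1)-OPSAT(-3))*DUMMYELECTION
op_term = election_interaction(opsat, dummy_election, 1, 3)
```

Both terms are zero in every non-election month, so they only shift the forecast in the month an election is held.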

So the Election Forecasting Model is:

forecastequation2.jpg
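The equation image is not reproduced here. Going only by the variable definitions above, the specification would presumably look something like the following. The intercept, the numbered coefficients and the error term are my own notation, and whether DummyElection also enters on its own, outside the two interaction terms, is not stated, so treat this as a reading of the text rather than the original equation.

```latex
\begin{aligned}
\mathrm{ALPTPP}_t ={} & \alpha
  + \beta_1\,\mathrm{Dummyhhmoon}_t
  + \beta_2\,\mathrm{Dummylatham}_t
  + \beta_3\,\mathrm{Dummyrudd}_t
  + \beta_4\,\mathrm{Dummyworkchoices}_t \\
 & + \beta_5\,\mathrm{ACNALPTPP}_{t-1}
  + \beta_6\,\mathrm{PMDISAT}_{t-1}
  + \beta_7\,\mathrm{OPPRIMARY}_{t-1} \\
 & + \beta_8\,(\mathrm{PMDISAT}_{t-1}-\mathrm{PMDISAT}_{t-12})\,\mathrm{DummyElection}_t \\
 & + \beta_9\,(\mathrm{OPSAT}_{t-1}-\mathrm{OPSAT}_{t-3})\,\mathrm{DummyElection}_t
  + \varepsilon_t
\end{aligned}
```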

And we’ll use ordinary least squares regression to do the number crunching which turns out as:

forecastoutput11.jpg

What is important here is that all of these variables are statistically significant. This model explains about 76% of the variation in the Newspoll estimate of the ALP TPP vote since 1997, but it’s built with the aim of being more accurate for the election date than it is at other times via those two long looking variables.
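For readers who want to see what the least squares step and the "variation explained" figure involve, here is a small self-contained sketch. It is not the software or data used for the output above: the regressors are cut down to an intercept, one lagged poll value and one dummy, and the data is synthetic, generated from known coefficients so the fit can be checked.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows (include a leading 1.0 for the intercept);
    y is a list of observed values. Returns the coefficient list b.
    """
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def r_squared(X, y, b):
    """Share of the variation in y explained by the fitted values."""
    fitted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Synthetic rows: [intercept, lagged poll value, dummy]. Not real data.
X = [[1.0, 48.0, 0.0], [1.0, 50.0, 0.0], [1.0, 52.0, 1.0],
     [1.0, 49.0, 1.0], [1.0, 51.0, 0.0], [1.0, 53.0, 1.0]]
# Generate y from known coefficients so the recovered fit can be checked.
y = [10.0 + 0.8 * x + 2.0 * d for _, x, d in X]

coefficients = ols(X, y)
fit_quality = r_squared(X, y, coefficients)
```

Because the synthetic data has no noise, the fit recovers the generating coefficients and explains essentially all of the variation; real polling data will not be that tidy.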

Onto more of the forecast stats:

forecastalptpp31.jpg

That’s mainly for the stats people; it shows that the model does its forecast job extremely well, with very little overall error.

The forecasts this model produces don’t exhibit a lot of the polling overshoot that Newspoll experiences when a new leader comes along, or a Tampa and S11 shocks the system. But it still tracks the changes in the TPP vote for the ALP as we can see with the following graphic:

forecastgraph11.jpg

The blue line represents the forecasts the model produces for each period, whereas the red line shows the actual Newspoll TPP vote for the ALP that the model is attempting to predict. The model misses the troughs and peaks of most of the big volatile movements in the ALP vote because the underlying dynamics of satisfaction ratings, primary vote level and, importantly, the slow moving ACNielsen in the previous month don’t support the overactive Newspoll during these periods.

So how good is the model using previous elections?

In 1998 the model predicted an ALP TPP of 50.82 whereas the actual result was 50.91.

In 2001 the model predicted an ALP TPP of 49.15 whereas the actual result was 49.07.

In 2004 the model predicted an ALP TPP of 47.23 whereas the actual result was 47.20.
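The size of those misses is easy to tabulate. The sketch below uses only the three predicted and actual figures quoted above; the variable names are mine.

```python
# (predicted, actual) ALP TPP at each election, as quoted above.
backtest = {
    1998: (50.82, 50.91),
    2001: (49.15, 49.07),
    2004: (47.23, 47.20),
}

# Signed error in percentage points (positive = model too high).
errors = {year: round(pred - actual, 2)
          for year, (pred, actual) in backtest.items()}

# Mean absolute error across the three elections.
mean_abs_error = round(sum(abs(e) for e in errors.values()) / len(errors), 3)
```

That gives errors of -0.09, 0.08 and 0.03 points, or about 0.07 of a point on average.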

It’s actually more accurate at election times than ordinary periods because those two little complicated variables were arrived at to simulate the processes involved in voters coming to a decision in the campaign. 2001 and 2004 were very different elections with movements going in opposite directions during the campaign period, but the model estimated both results fairly accurately by any measure.

Rather than building error correction components into the model for every period, I took the approach of doing it only for the specific periods when elections occur. It’s probably also worth mentioning that the model predicts each month’s vote based on last month’s figures.

So what is the forecast for the election?

An ALP two party preferred result of 55.15%

How that will split between the States will come (hopefully) by Monday.

Posted in Election Forecasting, Leading Indicators, Polling, Pseph, Voting behaviour | Tagged: , , | 91 Comments »

Nice Newspoll, Shame about the Rates.

Posted by Possum Comitatus on November 6, 2007

crikey1.jpg

This was me in Crikey today HERE. It’s free so you can have a squiz over there or keep reading.

When your luck is up, you just can’t take a trick.

Today we have the best Newspoll for the government since November, even if it’s only due to a bit of minor party preference noise, but as far as the image goes – a good poll is a good poll is a good poll.

Ordinarily, the popular media outlets would have a couple of stories splashed around hailing a comeback, the not so popular media outlets would probably be going hysterical and the nightly news would lead with the story; Laurie Oakes telling Nine viewers in tones of gravitas that the election is now a competition.

But this is no ordinary week.

Today’s papers are all about the ponies, tonight’s news will be about them as well, Wednesday’s papers will be all about the ponies and interest rates, Wednesday night’s news will probably be all about an interest rate rise, Thursday’s papers will certainly be – and then the media attention and narrative turns to the first polls after the rate rise (assuming there is one).

Where does the best poll for the government in 12 months fit into that cycle? Well, it doesn’t. Not as far as normal people are concerned. But the particularly nasty piece of bad luck in this sequence of unfortunate events for the government is what is likely to come next.

The headline two-party preferred result of Newspoll has been bouncing around an awful lot lately – it’s almost become the great oscillator.

alptppchange1.jpg

This week’s Newspoll figures slightly undervalue the preference flows the ALP receives from the minor parties, so the next Newspoll will more likely than not fix that up. These slight rounding problems and the sampling volatility of the minor parties all come out in the wash over a few polls.

When you combine that with the ALP primary looking rock solid at 47/48, it’s almost expected that in the next poll or two, the two party preferred headline figure will show the ALP increasing its lead – simply as a result of the high ALP primary vote combining with this minor party sampling error and rounding issues.

But should that happen, the headlines will undoubtedly scream “Interest Rate Backlash!” as some new 55/45 poll shows the ALP gaining a two point lead from the previous poll, the best poll the government had enjoyed for 12 months, but one which no-one paid attention to because the ponies were on.

Somehow, the gods look to have conspired against the government to the point where their best Newspoll result in a year will likely be completely ignored in terms of any beneficial media coverage that matters, but instead will create the platform for a very dominant media narrative that launches against them if rates rise tomorrow and the next Newspoll moves toward the ALP simply as a result of polling noise. And with it goes the second last week of the campaign.

Howard must be pulling out with absolute frustration those few remaining hairs he has left on his noggin.

He just cannot take a trick.

Posted in Crikey, Polling | Tagged: , , | 19 Comments »

SA and the Census Data

Posted by Possum Comitatus on November 5, 2007

Continuing on with our series of combining the Crosby Textor Oztrack 33 swings to the demographics of individual electorates by State, today we’ll have a look at little old South Australia.

So first up, we’d better have a look at what Crosby Textor had to say about the swings in SA:

oztrack33sa.jpg

The big demographic swingers in South Oz apart from parents were part-time workers, the 18-24 age group and the 35-49 age group.

So let’s have a look at how these play out seat by seat, and we’ll use the following:

“18-24” is the number of 18-24 year olds in the seat as a proportion of all people in the electorate aged 18 and over.

“35-49” is the number of 35-49 year olds in the seat as a proportion of all people in the electorate aged 18 and over.

“PTW” is the number of part-time workers as a proportion of all people in the electorate aged 18 and over.

“HA” is the percentage of the median household income that would be required to make the median home loan repayment for the electorate. This gives us a measure on housing affordability.

“LwUb” measures the number of Lower White/Upper Blue collar workers in the electorate as a proportion of all people in the electorate aged 18 and over. This is the demographic that is having the largest negative experience of Workchoices.

SLA1+ and SLA5+ are the proportions of the electorate that lived in a different statistical local area 1 year before the 2006 Census and 5 years before the 2006 Census. This gives us a bit of an idea of the overall population change of each electorate.

Again, we will just look at the Coalition held seats, and the data is based on the current electoral redistribution.

The current margins of the seat are:
Barker (19.9%), Boothby (5.4%), Grey (13.8%), Kingston (0.1%), Makin (0.9%), Mayo (13.6%), Sturt (6.8%), Wakefield (0.7%)

Seat        18-24  35-49    PTW     HA   LwUb  SLA1+  SLA5+
Barker       9.77  29.54  18.44  27.21  24.49   7.11  20.47
Boothby     11.97  26.09  20.33  29.38  24.49  10.65  28.88
Grey         9.4   28.73  16.99  25.88  22.64   7.1   19.94
Kingston    12.49  29.38  20.26  27.76  31.84   9.88  27.07
Makin       12.47  29.14  19.36  25.46  33.47   8.97  26.08
Mayo         9.81  30.06  22.12  27.11  26.62   8.67  27.63
Sturt       12     26.48  18.76  30.07  24.57  10.79  28.59
Wakefield   12.7   30.3   17.02  29.22  26.53  10.24  28.6
SA Average  11.92  28.29  18.64  28.71  26.31  10.13  27.31

What strikes me about the Coalition held seats in SA is the remarkable blandness of them all. They have the same rough proportions for everything.

The young population is all within roughly 3 percentage points, the 35-49s within about 4, part-time workers within about 5 and housing affordability within 5. With Lower White/Upper Blue we finally get a bit of variation at 11 points, but this wasn’t designated as a key demographic in South Australia by CT. Everything seems to be very uniform.

So let’s rank them now in terms of which seats would be more likely to swing based on known demographic issues. What we’ll do is take the difference between each measure and the State average. That way, those seats with a higher proportion of swinging groups will have a positive rating, and those with a smaller proportion of swinging groups will have a negative rating. Then we’ll add all the ratings (which are merely the individual differences from the State averages) together and see which seat would be expected to swing the most.

Seat       PT diff  HA diff  18-24 diff  LwUb diff  35-49 diff   Total
Kingston      1.62    -0.95        0.57       5.53        1.09    7.86
Makin         0.72    -3.25        0.55       7.16        0.85    6.03
Wakefield    -1.62     0.51        0.78       0.22        2.01    1.9
Mayo          3.48    -1.6        -2.11       0.31        1.77    1.85
Boothby       1.69     0.67        0.05      -1.82       -2.2    -1.61
Sturt         0.12     1.36        0.08      -1.74       -1.81   -1.99
Barker       -0.2     -1.5        -2.15      -1.82        1.25   -4.42
Grey         -1.65    -2.83       -2.52      -3.67        0.44  -10.23
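The ranking can be reproduced directly from the seat data. This is a sketch of the arithmetic described above (each measure minus the State average, then summed) using the five measures that appear in the ranking table; the dictionary layout and names are mine.

```python
# Seat measures from the table above (five measures used in the ranking).
seats = {
    "Barker":    {"18-24": 9.77,  "35-49": 29.54, "PTW": 18.44, "HA": 27.21, "LwUb": 24.49},
    "Boothby":   {"18-24": 11.97, "35-49": 26.09, "PTW": 20.33, "HA": 29.38, "LwUb": 24.49},
    "Grey":      {"18-24": 9.4,   "35-49": 28.73, "PTW": 16.99, "HA": 25.88, "LwUb": 22.64},
    "Kingston":  {"18-24": 12.49, "35-49": 29.38, "PTW": 20.26, "HA": 27.76, "LwUb": 31.84},
    "Makin":     {"18-24": 12.47, "35-49": 29.14, "PTW": 19.36, "HA": 25.46, "LwUb": 33.47},
    "Mayo":      {"18-24": 9.81,  "35-49": 30.06, "PTW": 22.12, "HA": 27.11, "LwUb": 26.62},
    "Sturt":     {"18-24": 12.0,  "35-49": 26.48, "PTW": 18.76, "HA": 30.07, "LwUb": 24.57},
    "Wakefield": {"18-24": 12.7,  "35-49": 30.3,  "PTW": 17.02, "HA": 29.22, "LwUb": 26.53},
}
state_avg = {"18-24": 11.92, "35-49": 28.29, "PTW": 18.64, "HA": 28.71, "LwUb": 26.31}

def swing_score(profile, average):
    """Sum of each measure's difference from the State average."""
    return round(sum(profile[k] - average[k] for k in average), 2)

scores = {seat: swing_score(profile, state_avg) for seat, profile in seats.items()}

# Seats ordered from most to least likely to swing on these measures.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Note that every measure gets equal weight in the total, so Lower White/Upper Blue, the one measure with real variation, does most of the work in separating Kingston and Makin from the rest.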

This suggests that Kingston should swing the most based on demographics, while Grey should swing the least based on these demographics. However, this flies in the face of what we’ve heard about in places like Grey, where a large swing seems to be on.

Unlike QLD, where the swinging demographics of the CT research explained a lot of what we’ve been seeing and hearing about on the ground, in SA they explain very little. Either local issues are a large player, or the swing in SA is very uniform.

If it’s uniform, and with a 12.4% swing on at the moment according to the last big Newspoll, the member for Barker might be the only one breathing easy (although I’d imagine Dolly Downer is as safe as houses as well).
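Taken literally, a uniform swing can be checked against the margins listed earlier. The sketch below simply subtracts the 12.4 point swing from each margin; it assumes the swing applies identically in every seat, which is exactly the assumption in question.

```python
# Coalition margins from the list above, in percentage points.
margins = {
    "Barker": 19.9, "Boothby": 5.4, "Grey": 13.8, "Kingston": 0.1,
    "Makin": 0.9, "Mayo": 13.6, "Sturt": 6.8, "Wakefield": 0.7,
}
swing = 12.4

# Margin left after a uniform swing; zero or below means the seat falls.
remaining = {seat: round(margin - swing, 1) for seat, margin in margins.items()}
held = sorted(seat for seat, left in remaining.items() if left > 0)
lost = sorted(seat for seat, left in remaining.items() if left <= 0)
```

On that arithmetic Barker, Grey and Mayo hold, but Grey and Mayo (Downer’s seat) are left with only 1.4 and 1.2 points respectively, while Barker keeps 7.5.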

Posted in Polling | Tagged: , , , , | 14 Comments »