[Vision2020] Anatomy of a Polling Disaster

Thu Oct 30 11:22:15 PDT 2008

I found the following, as well as the linked WSJ article, fascinating reads,
and I think others might find them interesting as well.

Since I know some folks seem to be having trouble with HTML-formatted posts,
I'm sending this in plain text.  Please note that both articles contain great
links, which you can reach by visiting the URLs below.

First, here's the link to the WSJ Numbers Guy's article:
http://blogs.wsj.com/numbersguy/mccain-and-the-youngest-voters-441/

Here's the second:
http://www.fivethirtyeight.com/2008/10/anatomy-of-polling-disaster.html

Thursday, October 30, 2008
Anatomy of a Polling Disaster 

Carl Bialik of the Wall Street Journal has the scoop on the bizarre
internals in that IBD/TIPP poll which, as we noted last week, found John
McCain with a substantial lead among young voters. I speculated that
this result could only be possible if IBD/TIPP were radically undersampling
young voters, and indeed that seems to have been the case:

"Raghavan Mayur, president of TechnoMetrica, told me he was equally
surprised by the results, saying the widespread perception that Obama is
leading by a large margin in that group "is my perception, too." He blamed
the result on a small sample size. Each daily tracking poll includes about
1,000 interviews spread over the prior five days; each day a new set of
survey respondents is added and the oldest set is discarded.

Ideally, Mayur would like to have 75 of all those respondents fall into the
youngest age range. Some pollsters would have preferred more; this age group
makes up 13% of the adult population, though its voting rate historically
has been lower than average. His sample fell far short even of his lower
goal, typically including just 25 to 30 respondents from age 18 to 24 -
meaning just five or six new interviews with these young voters were being
conducted each day. "We are not able to get to speak to as many as we would
like to in that group," he said.

He blamed that on several factors. For one thing, nearly one-third of adults
in that age range lack landline phones, and Mayur's pollsters don't dial
cellphones. (He points out that when calling cellphones, the chance that the
person who picks up lacks a landline and is in the relevant age range is
quite low.) Furthermore, among those who do live in households with
landlines, young people may be away at school or in the military, Mayur
said.

This small sample size at first didn't trouble Mayur, as Obama led among
these voters in the first three tracking polls. But when the results started
to break McCain's way as suddenly and dramatically as they did, Mayur began
to question his own methodology. On the day McCain's lead widened in this
group to 52 points, Mayur added a footnote to the 18-to-24-year-old group:
"Age 18-24 has much fluctuation due to small sample size." He says he didn't
add a similar one to the Jewish subgroup, with just half the sample size as
the young voters, because the Jews in his sample consistently stated a
preference for Obama, as he expected."
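
To put the "small sample size" in perspective, here is a rough sketch of the
sampling error on a subgroup of that size. It assumes a simple random sample
and the textbook 95% margin-of-error formula for a proportion, which is not
necessarily how IBD/TIPP computes its own error bars. The function name is
made up for illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a share p, assuming a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Sample sizes mentioned in the article: the 25-30 young respondents actually
# reached, the pollster's target of 75, and the full 1,000-interview sample.
for n in (25, 30, 75, 1000):
    moe_points = 100 * margin_of_error(n)
    print(f"n = {n:4d}: +/- {moe_points:.1f} points on each candidate's share")
```

With 25 to 30 respondents, the error on each candidate's share comes out at
roughly 18 to 20 points, and the error on the lead between them is larger
still, so wild swings in this subgroup are not surprising.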

Now, read those paragraphs carefully, because there are several problems
with this pollster's process:

1. 75 young voters out of a 1,000-person panel is an awfully low target. In
2004, about 93 out of every 1,000 voters were age 18-24, according to
statistics compiled by the Census Bureau. Now, we can debate about whether
that number is going to go up this year (youth turnout increased by about 50
percent as a share of the Democratic primary electorate), but it's certainly
not going to go down.

2. That notwithstanding, their target may be a moot point, since it doesn't
appear that the pollster felt any compulsion to weight for age-based
demographics in the first place. "This small sample size at first didn't trouble Mayur"
... well, it ought to have troubled him, because if only 3 percent of your
sample consists of 18-24 year-olds, when that fraction should be closer to 9
percent even assuming no increase in youth voter support, you're going to
significantly understate Barack Obama's margins. A superior pollster would
have flagged this problem long before it became manifest to the entire
world.

3. Lastly, Mayur appears to have "resolved" the problem by relying on
non-random sampling techniques. Now, I don't want to be too critical of this
decision, because Mayur has been kind enough to disclose his process. I'm
sure that he isn't the first pollster to take a few shortcuts in the process
of creating his sausage, and I'm sure that he won't be the last. Still, this
would seem to violate one of the most basic premises of survey research,
which is that of the random sample ... in "resolving" his young-voters
problem, Mayur could very easily have introduced a whole host of others, the
effects of which may be harder to detect.
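
The arithmetic behind point 2 can be sketched as a two-group weighted average.
The 3 percent and 93-per-1,000 shares come from the text above. The two
margins are HYPOTHETICAL placeholders chosen only to show the mechanics, not
figures from any poll, and the function name is invented for this example.

```python
def topline_margin(young_share, young_margin, older_margin):
    """Overall margin as a share-weighted average of two age groups' margins."""
    return young_share * young_margin + (1 - young_share) * older_margin

# HYPOTHETICAL margins (Obama minus McCain, in points), for illustration only.
YOUNG_MARGIN = 30.0   # assumed lead among 18-24 year-olds
OLDER_MARGIN = 5.0    # assumed lead among everyone else

# About 3% of the sample is 18-24 when left unweighted.
unweighted = topline_margin(0.03, YOUNG_MARGIN, OLDER_MARGIN)
# Reweighted to the 2004 Census share of 93 per 1,000 voters.
weighted = topline_margin(0.093, YOUNG_MARGIN, OLDER_MARGIN)

print(f"18-24 at 3.0% of sample: Obama +{unweighted:.2f}")
print(f"18-24 at 9.3% of sample: Obama +{weighted:.2f}")
print(f"Understatement from not weighting: {weighted - unweighted:.2f} points")
```

Under these made-up margins, leaving the youngest group at 3 percent of the
sample shaves about a point and a half off the topline lead. The wider the
gap between young and older voters, the larger the understatement.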

The point of all of this is that just because some pollster puts some
numbers together in a PDF doesn't mean that they have any particular idea
what they're doing. This pollster apparently made a name for itself
because it had forecast the results of the 2004 popular vote accurately.
Notwithstanding that one result isn't anywhere near enough information to
conclude that a pollster is strong, the 2004 election was perhaps the
easiest one in history to forecast. The electorate was highly partisanized
with few undecideds, and both bases turned out in roughly equal numbers; it
wasn't just IBD/TIPP that got it right -- almost everyone did. This election
is considerably more difficult to poll, and it's exposing the weaker
pollsters.

-- Nate Silver at 12:58 PM