Saturday, February 02, 2008

Margins of Error in Iraq

Let me start by saying that finding out what is actually happening in Iraq is very difficult, and that actually gathering that information requires admirable courage.

That said, I think that those who try to make sense of what’s going on there need to display at least a minimal level of introspection, humility, and recognition of how limited our information is.

In September 2007, the British polling firm ORB issued a survey from Iraq that suggested that more than a million Iraqi citizens had died violently since the invasion. The ORB website used a less neutral term: “murdered”.

ORB’s core competency seems to be the familiar western opinion survey by random phone interview. That isn’t actually very relevant to doing a cluster sample mortality survey in a war zone. And the survey work itself was done by an Iraqi firm, IIACSS, that didn’t exist before 2003, founded by an Iraqi with apparently only limited formal training in survey methodology.

This may not be as much of an issue in surveying public opinion. Similar opinions may be broadly spread through either the Iraqi population as a whole or regional subsets. If so, exactly where you sample may not affect results too seriously.

Violent mortality, on the other hand, can be lumpy. It’s pretty clear that some parts of Iraq are a lot more violent than others, and even when you get down to the local neighborhood, some blocks or even houses can be a lot luckier than others, or the reverse. Mortality surveys are very sensitive to even small sampling errors.

ORB doesn’t seem to have provided a lot of added value in terms of oversight and quality control. They originally reported that the survey was based on “a nationally representative sample.” Later, they admitted that the original survey was “undertaken in primarily urban locations”. Given that about a third of Iraqis live in rural areas, this is a significant omission, and ORB failed to disclose the choice when the results were first published. They should have. When they did a follow-up survey and sampled more rural areas, they reduced their initial estimate of violent deaths by about 200,000.

Their latest press release on the study indicates that it covered "112 unique sampling points". That is, it was a cluster sample of the sort used by other researchers in Iraq. It wasn’t the 2,163 independent observations you’d get if you did that number of random phone interviews. ORB calculated their margin of error as though it was that number of independent observations, not the much larger margin of error for a cluster sample of that number of households. This not only grossly understates the level of uncertainty in the estimate, but makes you wonder how well they understand this sort of work.
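To give a rough sense of how much clustering can matter, here is a minimal sketch using the standard Kish approximation, in which the design effect is 1 + (m - 1) * rho, with m the average number of households per cluster and rho the intra-cluster correlation. The 2,163 households and 112 sampling points are ORB's figures. The proportion of households reporting a death (0.2) and the intra-cluster correlation (0.1) are placeholder assumptions of mine, not numbers from ORB, so treat the output as an illustration of the mechanism rather than a corrected margin of error for their survey.

```python
import math


def srs_margin_of_error(p, n, z=1.96):
    """Margin of error for a proportion, treating all n observations as independent."""
    return z * math.sqrt(p * (1.0 - p) / n)


def kish_design_effect(avg_cluster_size, icc):
    """Kish approximation: variance inflation caused by sampling in clusters."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def cluster_margin_of_error(p, n, n_clusters, icc, z=1.96):
    """Simple-random-sample margin of error widened by the square root of the design effect."""
    deff = kish_design_effect(n / n_clusters, icc)
    return srs_margin_of_error(p, n, z) * math.sqrt(deff)


N_HOUSEHOLDS = 2163   # from the ORB release
N_CLUSTERS = 112      # "unique sampling points" from the ORB release
ASSUMED_P = 0.2       # hypothetical share of households reporting a death
ASSUMED_ICC = 0.1     # hypothetical intra-cluster correlation

srs = srs_margin_of_error(ASSUMED_P, N_HOUSEHOLDS)
clustered = cluster_margin_of_error(ASSUMED_P, N_HOUSEHOLDS, N_CLUSTERS, ASSUMED_ICC)

print(f"Treated as independent interviews: +/- {srs:.2%}")
print(f"Treated as a cluster sample:       +/- {clustered:.2%}")
```

The lumpier the violence, the higher the intra-cluster correlation is likely to be, and the wider the gap between the two figures becomes. Setting the correlation to zero makes the two margins identical, which is in effect the assumption behind ORB's published number.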

And even when theoretical sampling error is correctly calculated, that doesn’t include other sources of uncertainty. Researchers make subjective decisions on when to skip areas because of security or other issues, and when to do follow-up interviews of clusters missed for these or other reasons, and every such decision moves the sample further from being truly random. Survey teams may curbstone, or invent responses, particularly when the risks of carrying out the survey are as real as they are in Iraq. There may be errors in tabulating the data.

The survey was originally published with a glaring error in Baghdad’s religious composition undetected. This does not speak well for ORB’s diligence in checking for possible error, bias or fraud.

Collecting the number, age and sex of household members is a powerful tool for checking the plausibility of the sample in a mortality survey. It is unfortunate that neither the ORB survey nor Burnham et al. did so, since the absence of this data limits the credibility of the information gathered at considerable personal risk by the survey teams.

Follow-up visits by supervisors are another powerful check of survey accuracy. I think it’s important for those publishing survey results in Iraq to disclose if this was done, as well as the number of sampling points and what factors, if any, were used in weighting the raw data.
