News headlines are filled with statistics: “Study shows 50 percent of marriages end in divorce” … “Research reveals women are better drivers than men” … “Findings show that intelligent people are more attractive.”

The scientific mindset has so captured our society that one can often prove a point simply by saying, “the research says.” But how reliable are statistics?

That depends on many factors. Statisticians know that using statistics to prove a point is fraught with difficulties, because newspapers and other sources are not reporting objective data but the researchers’ subjective interpretations of the data. Because of that element of subjectivity, another researcher might look at the same data and draw a different conclusion.

A case in point: in the 2004 presidential election, some prognosticators, relying on early exit polls, predicted a landslide victory for John Kerry. Other observers saw the same data, recognized that it was inconsistent with other polling, and so were much more cautious about predicting a Kerry win.

The election example illustrates a key point in survey research: the people surveyed must be drawn as a “random sample” if the results are to give a true picture. Apparently, the early exit polls were not based on random samples but on samples consisting mostly of Kerry supporters.

Here’s another example of a non-random sample producing a misleading statistic. In 1987, Shere Hite published a book on women’s relationships with men in which she reported that 84 percent of American women were unhappy in those relationships. But the statistic was meaningless given the sample Hite used. She mailed out 100,000 lengthy questionnaires, and only 4.5 percent of the women returned them. This led other researchers to suspect that her sample was not random but was overweighted with unhappy women who had the time and energy to complete the survey. Any time participants are allowed to “self-select” into a survey, as they did here, the sample loses credibility.
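To make the self-selection problem concrete, here is a minimal simulation sketch in Python. Every number in it is invented purely for illustration: suppose 30 percent of women in the population are unhappy, but unhappy women are five times more likely to mail back a long questionnaire than happy women.

```python
import random

random.seed(1)

# Hypothetical population: assume 30% of women are truly "unhappy"
# (a made-up rate, used only to illustrate the bias).
TRUE_UNHAPPY_RATE = 0.30
population = [random.random() < TRUE_UNHAPPY_RATE for _ in range(100_000)]

# A proper random sample: every woman is equally likely to be included.
random_sample = random.sample(population, 4_500)

# A self-selected sample: assume unhappy women are five times more likely
# to return the long questionnaire (both response rates are assumptions).
RESPONSE_IF_UNHAPPY = 0.10
RESPONSE_IF_HAPPY = 0.02
self_selected = [unhappy for unhappy in population
                 if random.random() < (RESPONSE_IF_UNHAPPY if unhappy
                                       else RESPONSE_IF_HAPPY)]

def pct_unhappy(sample):
    return 100 * sum(sample) / len(sample)

print(f"True rate:     {pct_unhappy(population):.1f}%")
print(f"Random sample: {pct_unhappy(random_sample):.1f}%")
print(f"Self-selected: {pct_unhappy(self_selected):.1f}%")
```

On a typical run, the random sample lands within a fraction of a point of the true 30 percent, while the self-selected sample reports roughly 68 percent unhappy, without anyone falsifying a single answer.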

Errors can creep into statistics in other ways. Sometimes a statistic combines data from unrelated groups and produces a misleading generalization. For example, some years ago a researcher reported that children whose mothers were employed had higher IQs than children whose mothers stayed at home. But, upon closer scrutiny, the statistic didn’t hold up. When the data were broken down by marital status, it turned out that in intact homes there was no difference in IQ between children of employed mothers and children of stay-at-home mothers. The difference showed up only in the divorced sample, where children with working mothers had higher IQs than children whose moms stayed at home.
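A short sketch, again with invented counts and means that merely echo the pattern described above, shows how pooling subgroups can manufacture an overall gap that the largest subgroup does not show:

```python
# Toy data (entirely made up): (marital status, mother employed?) ->
# (number of children, mean IQ of those children).
groups = {
    ("intact",   True):  (400, 105.0),
    ("intact",   False): (600, 105.0),  # no gap in intact homes
    ("divorced", True):  (300, 112.0),
    ("divorced", False): (100, 103.0),  # the gap lives entirely here
}

def pooled_mean_iq(employed):
    """Weighted mean IQ across marital-status groups, pooled by employment."""
    rows = [(n, iq) for (_, emp), (n, iq) in groups.items() if emp == employed]
    total_children = sum(n for n, _ in rows)
    return sum(n * iq for n, iq in rows) / total_children

print(f"Employed mothers, pooled:     {pooled_mean_iq(True):.1f}")   # ~108.0
print(f"Stay-at-home mothers, pooled: {pooled_mean_iq(False):.1f}")  # ~104.7
```

The pooled comparison shows a gap of more than three IQ points, yet that gap comes entirely from the smaller divorced subgroup; a reader given only the pooled figure would never suspect it.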

Here’s another example of a misleading conclusion. In recent years, some investigators have reported negative effects in children who are spanked often and have called for a ban on spanking. But other researchers, like Bob Larzelere, have questioned these studies. Larzelere reports that studies of other forms of discipline, such as “timeouts,” find equally negative effects. So the problem doesn’t seem to lie in (nonabusive) spanking itself, but in something else.

Another important factor in evaluating statistics is the “effect size.” In layman’s terms, this simply means “how much of a difference does the issue really make?” Let’s go back to the example of working mothers and their children’s IQs. The mothers who were employed (both married and unmarried) had children with an average IQ of 108.2. The mothers who were not employed (both married and unmarried) had children with an average IQ of 104.6. The difference, then, was 3.6 IQ points. While the children of employed mothers did have a higher average IQ, it wasn’t much higher. In fact, when it comes to IQ, a difference of 3.6 points is rather trivial, certainly not enough to draw any firm conclusions.
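For readers who want the standard calculation: statisticians usually standardize a raw difference by the spread of the scores, giving an effect size known as Cohen’s d. IQ scales are conventionally normed to a standard deviation of 15; the sketch below assumes that figure, since the study’s actual standard deviations aren’t given here.

```python
# A minimal effect-size calculation (Cohen's d = raw difference / SD).
mean_employed = 108.2   # group means from the study described above
mean_stay_home = 104.6
iq_sd = 15.0            # assumed: the conventional IQ-scale standard deviation

raw_difference = mean_employed - mean_stay_home
cohens_d = raw_difference / iq_sd

print(f"Raw difference: {raw_difference:.1f} IQ points")
print(f"Cohen's d:      {cohens_d:.2f}")
```

By Cohen’s common rule of thumb (roughly 0.2 is small, 0.5 medium, 0.8 large), a d of about 0.24 sits at the bottom of the “small” range, which matches the verdict above that the difference is trivial.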

As these examples show, we need to dig deeper when we read an article announcing, “The research says…” Given the power of research to influence public opinion, it behooves us to approach such reports with a degree of healthy skepticism.