In 1995, Yale University Professor Robert Abelson (1928-2005) wrote an interesting and engaging treatise on a topic that, on the face of it, seems obvious. He presents this in his text, “Statistics As Principled Argument”. The book is a quick and easy read and I would recommend it to anyone, whether or not they are into statistics. The concepts are presented in a way accessible to most readers.

The book begins, appropriately enough, with Abelson’s Laws which are:

  1. Chance is lumpy.
  2. Overconfidence abhors uncertainty.
  3. Never flout a convention just once.
  4. Don’t talk Greek if you don’t know the English translation.
  5. If you have nothing to say, don’t say anything.
  6. There is no free lunch.
  7. You can’t see the dust if you don’t move the couch.
  8. Criticism is the mother of methodology.

The meaning of each of the above becomes clear as the text proceeds through the 9 chapters that follow.

Abelson’s overall thesis is simple: statistics have little or no meaning if they are not part of a well-formulated and cogent argument. Data, manifesting as statistics of various kinds, have several properties that determine their persuasive force as the basis for an argument.

Abelson sums things up, as follows:

A research story can be interesting and theoretically coherent, but still not be persuasive—if the data provide only weak support for the rhetoric of the case. On the other hand, a lot of high-quality rhetoric can be squandered by a poor narrative—for example, if the research is so dull that no one cares which way the results come out. Thus rhetoric and narrative combine multiplicatively, as it were, in the service of persuasive arguments based on data analysis. If either component is weak, the product is weak. The argument is strong only when it has the MAGIC properties of forceful rhetoric and effective narrative. In making his or her best case, the investigator must combine the skills of an honest lawyer, a good detective, and a good storyteller.

The acronym MAGIC stands for magnitude, articulation, generality, interestingness, and credibility. Without going into detail, these are defined as follows:

  • Magnitude – in essence, ‘how big is the effect’ with larger effects being more compelling than smaller ones.
  • Articulation – relates to how specific the evidence is where a precise statement is more compelling than an imprecise one.
  • Generality – conversely, how generally does the evidence apply? Claims that interest a more general audience tend to be more compelling.
  • Interestingness – for obvious reasons, interesting effects are those that “have the potential, through empirical analysis, to change what people believe about an important issue”. Clearly, more interesting effects will be more compelling than less interesting ones. At the same time, surprising or unexpected effects are also more compelling.
  • Credibility – Finally, credible claims are more compelling than incredible ones. The old saying that ‘an extraordinary claim requires extraordinary evidence’ comes to mind. The researcher must show that the claims made are credible. In particular, new evidence that contradicts some previously established belief will be less credible.

None of this should be surprising or unexpected, but Abelson put it together in a coherent package.

Beyond the above, he also explained various characteristics of data and statistics in his eight “rules”.

Chance is lumpy.

Abelson’s observation here isn’t novel, but it’s entirely valid. It refers to the fact that ‘random’ processes or sequences often display seemingly non-random behavior in the short term. This, of course, has implications for any testing protocol that looks at shorter-term behavior (see also the next point).

Abelson summarized the effect nicely:

“People generally fail to appreciate that occasional long runs of one or the other outcome are a natural feature of random sequences.”

Abelson, 1995, p. 21
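Abelson’s point is easy to check with a short simulation. The sketch below (in Python; the code and numbers are mine, not from the book) measures the longest run of identical outcomes in sequences of 100 fair coin flips:

```python
import random

def longest_run(seq):
    """Length of the longest run of identical consecutive outcomes."""
    best = cur = 1
    for prev, nxt in zip(seq, seq[1:]):
        cur = cur + 1 if nxt == prev else 1
        best = max(best, cur)
    return best

random.seed(42)  # fixed seed so the sketch is reproducible
trials = [longest_run([random.randint(0, 1) for _ in range(100)])
          for _ in range(10_000)]
avg = sum(trials) / len(trials)
print(f"average longest run in 100 fair coin flips: {avg:.1f}")
```

Across many simulated sequences, the longest run of heads or tails averages around seven, far “lumpier” than the near-alternating pattern most people intuitively expect.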

Overconfidence abhors uncertainty.

Because most people expect chance to be more regular than its actual lumpiness implies, many of them (including researchers) tend to underestimate the extent to which measurements can vary from one sample to another. They assign greater certitude to their results rather than considering or acknowledging that chance may be the cause of the observed effect.

“Psychologically, people are prone to prefer false certitude to the daunting recognition of chance variability.”

Abelson, 1995, p. 27

To address this, researchers need to 1) compute confidence intervals around estimated values, particularly when sample sizes are small, and 2) for studies aimed at making comparisons, use a sample size large enough to differentiate the signal of a real difference from the noise of sampling error.
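The first remedy can be illustrated with a minimal sketch (Python, normal-approximation interval; my example, not Abelson’s) showing how the width of a confidence interval shrinks as the sample grows:

```python
import math
import random

def mean_ci(sample, z=1.96):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    n = len(sample)
    m = sum(sample) / n
    s2 = sum((x - m) ** 2 for x in sample) / (n - 1)  # sample variance
    half = z * math.sqrt(s2 / n)
    return m - half, m + half

random.seed(1)  # fixed seed for reproducibility
widths = {}
for n in (10, 1000):
    sample = [random.gauss(0, 1) for _ in range(n)]
    lo, hi = mean_ci(sample)
    widths[n] = hi - lo
    print(f"n = {n:4d}: CI width = {widths[n]:.2f}")
```

The interval width falls roughly as 1/√n, so a small sample’s estimate carries far more uncertainty than a bare point value suggests.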

Never flout a convention just once.

The best example of this would be the adoption (in a study) of an unconventional test criterion — say, the use of p=0.1, instead of p=0.05. This might be done for any number of reasons but, for many people, it would be seen as flouting a convention (with the convention being to use p=0.05).

Abelson’s rule simply says stick with the convention throughout the study; do not switch between p=0.05 and p=0.10 (or whatever) just to get the findings you like and to reject those that you find inconvenient.
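The cost of opportunistic threshold-switching can be made concrete with a simulation (a sketch I am adding; it is not from the book). Under a true null hypothesis, p-values are uniformly distributed, so a researcher who falls back to p < 0.10 whenever p < 0.05 fails is effectively operating at the looser error rate:

```python
import random

random.seed(0)  # fixed seed for reproducibility
N = 100_000
# Under the null hypothesis, p-values are uniform on [0, 1].
p_values = [random.random() for _ in range(N)]

strict = sum(p < 0.05 for p in p_values) / N  # convention applied consistently
cherry = sum(p < 0.10 for p in p_values) / N  # loosened whenever convenient
print(f"false-positive rate, consistent p < 0.05:      {strict:.3f}")
print(f"false-positive rate, opportunistic switching: {cherry:.3f}")
```

Switching thresholds only when it rescues a result doubles the false-positive rate, which is exactly why a convention, once flouted, must be flouted consistently.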

Don’t talk Greek if you don’t know the English translation.

This one is pretty obvious even in a non-statistical sense. If you don’t know what some phrase or term means (in Greek or otherwise), then you shouldn’t be using that phrase. That’s particularly true when the phrase has some deeper meaning or interpretation.

From a statistical point-of-view, Abelson talks about this using MANOVA as an example saying,

“The output tables from a MANOVA are replete with omnibus tests, and unless the investigator is sophisticated enough to penetrate beyond this level, the results remain unarticulated blobs.”

Abelson, 1995, p. 128

The bottom line is simple — people should not use complex analytical methods (“talk Greek”) unless they know how to dig into the practical details (“know the English translation”). Otherwise, the results will be limited or meaningless. In reality, often one doesn’t need to “talk Greek” in the first place.

If you have nothing to say, don’t say anything.

This is a simple admonition to accept an outcome that goes nowhere. If a study produces no statistically significant outcomes (by chance, poor study design, or whatever), then there were no significant results.

When that happens, do not torture yourself or re-work the data trying to make something ‘meaningful’ out of it. Just move on to the next study.2

There is no free lunch.

Here Abelson was talking about generalizability of research findings.

“This is simply the way the research life is. One does not deserve a general result by wishing it.”

Abelson, 1995, p. 142

Basically, most studies are limited in what they tell us about the world. The results of any single experiment or study cannot (or should not) be generalized beyond the bounds of the specific context established by its sampling strategy, independent variables, and dependent measures.

Very often, researchers overgeneralize their results, giving others (e.g., the public) a distorted and incorrect interpretation.

You can’t see the dust if you don’t move the couch.

This rule also relates to generalizability, but it focuses specifically on generalizing from one context to another when you have a lot of potential variables to consider. The best way to figure out what a given variable does is to control it (i.e., vary it or hold it constant). However, in any given study, it can be difficult to manipulate more than a few variables—moving the metaphorical couch to see what happens. As Abelson put it,

“The only sure way to have knowledge of a context variable is to vary it.”

Abelson, 1995, p. 155

Criticism is the mother of methodology.

Criticism is a fact of life. It is especially so in science where progress is made through improvements to study design and execution. Every study is subject to criticism—some valid and some not. But, if a plausible criticism arises, then researchers can plan another study to address that criticism. This is, in a very real sense, the essence of ongoing research.

Criticism of one study should lead to a new one that addresses any methodological concerns with the first, thus moving science forward.

“As research cumulates under pressure from the exchange of counterarguments, previous theoretical generalizations will be supported, modified, or abandoned, and new generalizations may emerge. … Thus, principled statistical argument is not only unavoidable, it is fundamental.”

Abelson, 1995, p. 198

Routledge link:  Statistics As Principled Argument – 1st Edition – Robert P. Abelson

Google books:  Statistics As Principled Argument – Google Books


  2. The interpretation of such results may not be straightforward. Sometimes those results, even though insignificant, may be useful and worthy of publication.
