From Davis Balestracci -- "Every theory is correct in its own world..."

Published: Mon, 02/16/15

[~1000 words:  take 4 to 6 minutes to read over a break or lunch]

"...but the problem is that the theory may not make contact with this world." -- W. Edwards Deming

Hi, Folks,
As statistical methods become more and more embedded in everyday organizational quality improvements, I find that a key concept is often woefully misunderstood, if it is even taught at all.  Deming distinguished between two types of statistical study, which he called "enumerative" and "analytic."

The key connection for quality improvement is how statistics relates to reality, which lays the foundation for a theory of using statistics. Whether you realize it or not, virtually all college courses are taught from a population-based, a.k.a. enumerative, perspective. In a real-world environment, this perspective becomes questionable at best because everyday processes are usually not static populations. The question becomes: what knowledge beyond probability theory is needed to form a basis for action in the real world?

Think of population-based statistics as studying a static pond, and of a designed study as going even further to create a custom-made pond like a swimming pool – a sanitized version of a pond, much easier to study and sample because “nuisance” (everyday) variation has been reduced.

Beyond design of the actual study circumstances, the statistical data processes now come into play:  (1) measurement definition, (2) appropriate data collection so that (3) any statistical analysis is appropriate,  and (4) correct interpretation of the analysis results. 

In a research study, the variation in each of these processes should be (and usually is) tightly controlled to make the application of enumerative methods valid. But what about the application of the study's result?

Ignore variation...or study it?

Inevitably, as results from a study are applied, this

[Figure not reproduced]

has now become:

[Figure not reproduced]

Not only that, but uncontrolled variation manifests in the four statistical data processes as well:

[Figure not reproduced]

What was easy in a “swimming pool” environment now becomes much more complicated -- the real world is more like whitewater rapids. "Random sample" has an entirely different meaning in a minimally controlled, semi-chaotic environment. A good example is an everyday medical environment with patients flowing in and out. Except in rare cases, you cannot take repeated samples from exactly the same population. Analytic statistics are very concerned with where and how one should sample.

For example, we may take a group of patients who attend a particular clinic and suffer from arthritis.  But the resulting sample is not necessarily a random sample of the patients who will be treated in the future at that same clinic. Still less is it a random sample of the patients who will be treated in any other clinic. In fact the patients who will be treated in the future will depend on choices that we and others have not yet made. And those choices will depend on the results of any study we are doing, and on studies by other people that may be carried out in the future.

And there is the additional issue of how the impact of variation on a theoretical result in this particular environment compares with what could happen in yet another environment (below), i.e., benchmarking:

[Figure not reproduced]

The late David Kerridge, one of the world's leading Deming thinkers, wrote:

"Suppose that we compare two antibiotics in the treatment of some infection. We conclude that one did better in our tests. How does that help us? 

"Suppose that all our testing was done in one hospital in New York in 1993. But we may want to use the antibiotic in Africa in 1997. It is quite possible that the best antibiotic in New York is not the same as the best in a refugee camp in Zaire. In New York the strains of bacteria may be different: and the problems of transport and storage really are different. If the antibiotic is freshly made and stored in efficient refrigerators, it may be excellent. It may not work at all if transported to a camp with poor storage facilities. 

"And even if the same antibiotic works in both places, how long will it go on working? This will depend on how carefully it is used, and how quickly resistant strains of bacteria build up.

"This may seem an extreme case, and it is. But in every application of statistics we have to decide how far we can trust results obtained at one time and under one set of circumstances as a guide to what will happen at some other time under new circumstances."

Traditional statistical methods IGNORE everyday variation

Statistical theory, as it is stated in most textbooks (enumerative), simply analyzes what would happen if we took repeated, strictly random samples, from the same population, under circumstances in which nothing changes with time.  Enumerative analyses or studies either naively assume no possible influence of outside variation or have the luxury of tightly controlling it as part of a study's design.  Unfortunately the potential influence of outside variation continues to be ignored even after the study when it’s time to actually apply the result.  Analytic statistics' purpose is to anticipate and formally study the manifestations of such outside, uncontrolled variation.
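To make the contrast concrete, here is a minimal simulation sketch. Everything in it is hypothetical and chosen only for illustration: the process mean of 100, the standard deviation of 10, the sample size of 50, and the shift of 5 units between the time of the study and the time the result is applied. It computes a textbook 95% interval from each simulated study and then checks how often that interval still contains the process mean when the result is put to use.

```python
import random
import statistics


def ci_95(sample):
    """Textbook (enumerative) 95% interval for a mean, normal approximation."""
    center = statistics.mean(sample)
    std_err = statistics.stdev(sample) / len(sample) ** 0.5
    return center - 1.96 * std_err, center + 1.96 * std_err


def coverage(shift_at_application, trials=2000, n=50, seed=1):
    """Fraction of simulated studies whose interval contains the process
    mean that holds when the result is applied.

    The study samples a process with mean 100 and SD 10 (hypothetical
    values). By the time the result is applied, the process mean has
    moved by `shift_at_application`.
    """
    rng = random.Random(seed)
    mean_at_application = 100.0 + shift_at_application
    hits = 0
    for _ in range(trials):
        study = [rng.gauss(100.0, 10.0) for _ in range(n)]
        low, high = ci_95(study)
        if low <= mean_at_application <= high:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("static pond (no shift):  ", coverage(0.0))
    print("whitewater (shift of 5): ", coverage(5.0))
```

With no shift, the interval does what the textbook promises and covers the mean in roughly 95 percent of the simulated studies. With even a modest shift, coverage collapses, although nothing about the study itself was done incorrectly. The arithmetic was never the problem; the assumption that nothing changes with time was.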

Seeking more information through the analytic statistical approach inherently improves the situation because when you understand -- rather than ignore -- the sources of uncertainty, you understand how to reduce it. As I hope you now realize, analytic statistics are totally different from the clinical trial mindset in which most physicians have been trained, where "tight control" is an understatement.

A good example of this stark contrast is the case of hospital-acquired infections. Let's say that a well-designed enumerative study has produced a statistically significant result on how to eliminate them, i.e., apply the result and you shouldn't have them. With enumerative thinking, the post-application tendency would then be to treat any occurrence of an infection (undesirable variation) as a special cause. This is why people are drowning in root cause analyses. Such an analysis would be helpful in the case of an outbreak.

But, in terms of everyday work, one usually has to take the view that the environment might be "perfectly designed" to have such infections, in which case a common cause strategy would be warranted.  It is only by "plotting the dots" – the basis of analytic statistics – that you will be able to distinguish between the two.
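One common way to "plot the dots" is an individuals (XmR) control chart, sketched below. This is only an illustration, not necessarily the exact method I would use with your data: the monthly infection counts are invented, the function names are hypothetical, and the limits use the conventional multiplier of 2.66 applied to the average moving range.

```python
import statistics


def xmr_limits(values):
    """Return (lower, center, upper) limits for an individuals chart.

    Limits are center +/- 2.66 * (average moving range), the conventional
    constant for an individuals chart built on the average moving range.
    """
    center = statistics.mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    half_width = 2.66 * statistics.mean(moving_ranges)
    return center - half_width, center, center + half_width


def special_cause_points(values, baseline):
    """Indexes of `values` that fall outside limits computed from `baseline`."""
    lower, _, upper = xmr_limits(baseline)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical monthly infection counts, invented for this example.
baseline = [4, 6, 5, 3, 7, 5, 4, 6, 5, 4, 6, 5]
new_months = [5, 7, 3, 15, 6]

if __name__ == "__main__":
    print("limits:", xmr_limits(baseline))
    print("months signalling a special cause:",
          special_cause_points(new_months, baseline))
```

In this made-up data, the months with 3, 5, 6, or 7 infections all fall inside the limits, so they reflect common cause: the process is "perfectly designed" to produce them, and a root cause analysis of any single one would be wasted effort. Only the month with 15 falls outside the limits and deserves a special cause investigation.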

Until next time...

Kind regards,

P.S. Feel like you're waiting for Godot?
I do!  Is the Pareto principle at work -- the last 20 percent of a book's logistics / minutiae taking 80 percent of the time?  It certainly feels like it -- and I'm just as frustrated as many of you!  My publisher recently blindsided me with a request for some major last-minute minutiae, which was promptly taken care of.  So, "When, Davis, when?"   My guess is 4 to 6 more weeks.

Hang in there and pre-order the new edition of Data Sanity.

If you'd like more information, please contact me to ask for the full Preface, Introduction, and chapter summaries of this revised edition.  As always, I welcome contact from my readers for just about any other reason as well.  I love corresponding and answering questions.

And do please keep me in mind if you need a plenary speaker for an internal or professional conference, a leadership retreat using the 10 scenarios from Chapter 2 of Data Sanity and leadership skills of Chapters 3 and 4,  a staff retreat to get you "unstuck" in current improvement efforts, or some mentoring to help you "quantum leap" to a new level of eye-opening results.
