From Davis Balestracci -- Some Final Thoughts on DOE...for Everyone

Published: Mon, 06/20/16

It depends!

 
Hi, Folks,

Client A came to me for a consultation and told me up front his manager would allow him to run only 12 experiments.  I asked for his objective.  When I informed him that it would take over 300 experiments to test his objective, he replied, “All right, I’ll run 20.”  Sigh... 

No: he needed either to redefine his objective or not run the experiment at all. I never saw him again.

Client B came to me with what he felt was a clearly defined objective. He thought he just needed a 10-minute consult for a design template recommendation.  It actually took three consults with me, totaling 2-1/2 hours, because I asked questions similar to those used for planning last newsletter's experiment.

During the first two consults, he would often say, "Oh...I didn't think of that. I'll need to check it out." He eventually ran the experiment, came to me with the data, and asked, "Could you have the analysis next week?"  I asked him to sit down and was able to finish the analysis (including contour plots) in about 20 minutes.

It's all in the planning.

To review:  If your objective is to establish effects, any good design needs answers to three questions as part of the planning:

  • What risk are you willing to take in declaring an effect significant when it isn't?  (Usually, 5%)

  • What is the threshold minimum difference you must detect to take the action you want?

  • If this difference exists, how badly do you want to detect it?

If you don't formally consider these questions, your design will answer them by default.  As you've seen in the past few newsletters, this can result in some eye-popping sample sizes.
 

Power:  the probability of successful detection of your desired difference if it exists

 
For all the previously calculated sample sizes, I assumed the usual desired significance level of 0.05 for testing the effects. Many people blindly use this as their only statistical criterion without formally asking the other questions. They naively run a design and t-test the results to declare them significant or not, but they have no idea what minimum effect their design was implicitly designed to detect!

So how did I obtain those sample sizes (formula below)?  Once again using the hypothetical client conversation from the tar scenario, I made an assumption I didn't tell you about that answers the question, "How badly do I want to detect my desired effect?"  If you want to detect a one percent difference, it will take 680 experiments to have a 90 percent chance of detecting it if it exists.  The more relaxed sample size of 500 gives you an 80 percent chance of detecting your desired difference.  The same is true for the 170 and 130 experiments required, respectively, to detect a two percent difference.

This concept is called power -- a design's ability to detect your desired threshold difference if it exists. Or, as I heard a statistics professor say tongue-in-cheek to a PhD graduate student planning an overly ambitious experiment with an inappropriately small sample size:  "Power is the probability you will get a thesis."  Things subsequently simplified quite a bit!

Let's look at some designs people might use, naively expecting to detect a one percent difference in the tar scenario:

  • Running 4 experiments would have a 5.5 percent chance of detecting the one percent difference if it existed

  • Running 8 experiments would have a 6.5 percent chance

  • Running 16 experiments would have a 7.5 percent chance

As you see, one can work backwards to obtain the power of a design. It is not unusual for clients to be very surprised (and disappointed) at the answers.
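
For readers who want to check numbers like these themselves, here is a minimal Python sketch. It uses a simple two-sided normal approximation -- a shortcut on my part, not necessarily the exact calculation behind the figures above -- together with the ratio R = 0.25 (desired effect divided by SDprocess) that the 680- and 500-run figures imply, so it lands near, though not exactly on, the percentages quoted.

from math import erf, sqrt

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def approx_power(n_runs, r):
    """Approximate two-sided power at a 5% significance level.

    n_runs -- total number of experiments in the two-level design
    r      -- desired effect expressed as a multiple of SDprocess

    Normal approximation: an estimated effect has standard error
    2*SDprocess/sqrt(n_runs), so the standardized effect is r*sqrt(n_runs)/2.
    """
    z_crit = 1.96                       # two-sided 5% critical value
    z_effect = r * sqrt(n_runs) / 2.0
    return normal_cdf(z_effect - z_crit) + normal_cdf(-z_effect - z_crit)

# R = 0.25 is the ratio implied by the tar-scenario sample sizes (an assumption here)
for n in (4, 8, 16, 500, 680):
    print(f"{n:4d} runs -> power of roughly {100 * approx_power(n, 0.25):.1f}%")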

In consulting with the client who could initially afford only 12 experiments but agreed to run 20, my sample size of 300 experiments resulted from his original objective of wanting to detect a minimum effect of approximately (0.325 x SDprocess) with 90 percent power.

(SDprocess = standard deviation of process being studied, of which he had a historical estimate)

With his proposed 20 experiments, he would be able to detect a minimum effect of approximately (1.6 x SDprocess); or, to turn things around, the power to detect his desired threshold difference of (0.325 x SDprocess) would be approximately 9.3 percent.


If you have a relatively good estimate of your process standard deviation (SDprocess):

  •   A  2 x 2 unreplicated factorial design can detect a ((2.8 to 3.3) x SDprocess) difference  (80% and 90% power, respectively)

  • A  2 x 2 x 2 unreplicated factorial can detect a ((2.0 to 2.3) x SDprocess) difference

  • 16 experiments can detect a ((1.4 to 1.6) x SDprocess) difference

For those of you interested in the sample size calculation, let R = ratio of (desired effect / SDprocess):

For 80% power:  N total = (5.6 / R)**2       [5.6 divided by your ratio R, with the result squared]

For 90% power:  N total = (6.5 / R)**2
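
Turning that formula around also gives the smallest effect a given number of runs can detect: R = 5.6 / sqrt(N total) at 80% power, or 6.5 / sqrt(N total) at 90% -- which is where the multiples in the bullet list above come from. Here is a minimal Python sketch of both directions; the R values of 0.25 and 0.5 are the ones implied by the tar-scenario numbers.

from math import ceil, sqrt

K_80 = 5.6   # constant for 80% power (5% significance level)
K_90 = 6.5   # constant for 90% power

def total_runs(r, k):
    """Total experiments needed to detect an effect of (r x SDprocess)."""
    return ceil((k / r) ** 2)

def detectable_effect(n_runs, k):
    """Smallest effect (as a multiple of SDprocess) detectable with n_runs."""
    return k / sqrt(n_runs)

# Tar scenario: R = 0.25 for a one percent difference, R = 0.5 for two percent
print(total_runs(0.25, K_90), total_runs(0.25, K_80))   # roughly 680 and 500
print(total_runs(0.50, K_90), total_runs(0.50, K_80))   # roughly 170 and 130

# Unreplicated factorials of 4, 8, and 16 runs -- compare with the
# (2.8 to 3.3), (2.0 to 2.3), and (1.4 to 1.6) multiples listed above
for n in (4, 8, 16):
    print(f"{n:3d} runs: {detectable_effect(n, K_80):.2f} to {detectable_effect(n, K_90):.2f} x SDprocess")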

My summary of Some Pretty Good DOE Rules

 
From my experience with factorial designs:

  • 16 experiments is a "pretty good" (and relatively affordable) number. I rarely ran 32.

  • If you're going to run 16 experiments, you may as well study 5 variables if you can (see the sketch just below this list).

  • If you have only three variables, think about a Box-Behnken design if appropriate
    (it would not be appropriate for the factorial example in my last newsletter because all three variables were "Yes or No.")


  • If you have more than five variables and can afford only 16 experiments: even if you have only six variables, you may as well study eight.

[For more information, see R. D. Moen, T. W. Nolan, and L. P. Provost, Quality Improvement Through Planned Experimentation.]
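
To make the "five variables in 16 experiments" rule concrete, here is a sketch of one conventional way to lay such a design out: run a full 2**4 factorial in the first four variables and generate the fifth column as the product of the other four (E = ABCD). It is one standard construction, not the only one.

from itertools import product

# 16-run, five-variable half-fraction (2 to the 5-1):
# a full two-level factorial in A-D, with E = A*B*C*D.
# With this generator, main effects and two-factor interactions
# are not confounded with one another.
print(" run   A   B   C   D   E")
for i, (a, b, c, d) in enumerate(product((-1, 1), repeat=4), start=1):
    e = a * b * c * d
    print(f"{i:4d} " + " ".join(f"{x:3d}" for x in (a, b, c, d, e)))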

This strategy also allows one to:

  • screen out variables that don't seem to be important, which often gets the reaction, "Wait a minute -- I know that variable is important!"

    • A variable may indeed be important, but in this case, all "insignificant" means is that it doesn't exert an effect or interaction in the specific range studied -- which was chosen for a reason (objective). 

    • Set the insignificant variable outside the studied range and don't be surprised at what happens!

  • make important decisions regarding non-continuous discrete variables before continuing to get a contour plot (e.g., catalyst A vs. catalyst B)
     
Since last newsletter's design was all discrete variables (e.g., Yes or No: Did the patient keep a food diary?), this contour plot option isn't applicable.

When variables are screened out, a design can usually be easily augmented to get the remaining variables' interactions and, if a contour plot is desired, augmented even further to yield the quadratic equation to plot.  Notice:  this strategy is sequential and many times builds upon experiments already run. No data are wasted.
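
As an illustration of that sequential idea -- and only a sketch of one common option -- suppose two continuous variables survive the screening. The factorial runs already made in those two variables can be augmented with axial ("star") points and a few center points to form a face-centered central composite design, which supports fitting the quadratic equation behind a contour plot.

from itertools import product

# The 2 x 2 factorial runs already in hand (coded levels -1 and +1)
factorial_runs = list(product((-1, 1), repeat=2))

# Augmentation: axial ("star") points plus replicated center points.
# Together with the factorial runs, these support a full quadratic model.
axial_runs = [(-1, 0), (1, 0), (0, -1), (0, 1)]
center_runs = [(0, 0)] * 3     # replicates give a pure-error estimate

augmented_design = factorial_runs + axial_runs + center_runs
for x1, x2 in augmented_design:
    print(f"x1 = {x1:3d}, x2 = {x2:3d}")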

To summarize, the act of designing an experiment is composed of four parts:

  • Deciding what you need to find out or demonstrate,

  • Estimating the amount of data required,

  • Anticipating what the resulting data will be like, and

  • Anticipating what you will actually do with the finished data.

Hendrix has more ways to "mess up," but they relate to using regression analysis on planned data...or unplanned data.  Regression is one of the most abused statistical tools, and I'll talk about that in a future newsletter. 

Thanks for indulging me in this DOE tangent of the last six newsletters.  I hope you learned a few things, especially statistical trainers and belts.

Back to improvement next time.

Because of the July 4th U.S. holiday coming up, I am going to skip the next newsletter cycle and, depending on how my summer is going, probably be back on 18 July.

Kind regards,
Davis
 
===================================================================================
The philosophy of Data Sanity will certainly help you improve the quality of your PLANs to test any theories -- or it could even show that you don't need to run an experiment at all!
===================================================================================
Data Sanity: A Quantum Leap to Unprecedented Results is a unique book that synthesizes the sane use of data, culture change, and leadership principles to create a road map for excellence.

Click here for ordering information [Note:  an e-edition is available] or here for a copy of its Preface and chapter summaries (fill out the form on the page).

[UK and other international readers who want a hard copy:  ordering through U.S. Amazon is your best bet]

Listen to a 10-minute podcast or watch a 10-minute video interview at the bottom of my home page where I talk about data sanity: www.davisdatasanity.com .

Please know that I always have time for you and am never too busy to answer a question, discuss opportunities for a leadership or staff retreat, webinar, mentoring, or public speaking -- or just about any other reason!  Don't think twice about e-mailing or phoning me.

=========================================================
Was this forwarded to you?  Would you like to sign up?
=========================================================
If so, please visit my web site -- www.davisdatasanity.com -- and fill out the box in the left margin on the home page, then click on the link in the confirmation e-mail you will immediately receive.