They come from answering three questions. The first is answered pretty much by default: What risk are you willing to take of declaring an effect significant when it
isn't? Usually, 5 percent.
The others are (2) the size of the effect you want to detect and (3) the probability with which you want to detect it. Initially these concepts can be very confusing (at least they were for me!), but I hope they will become clearer when I address this further next time. Answering these three questions determines (4), the required sample size.
Actually, answering any three of the questions above automatically answers the fourth.
If you have a specific
sample size in mind and plan on using p = 0.05 for significance (two of the questions answered), you can work backwards to calculate either of the other two:
- what effect you can reasonably detect (2), given your specific answer to (3) (the desired probability of detecting it), or
- the probability of detecting a desired effect (3), given your specific answer to (2) (the desired effect to detect).
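For those who like to see the arithmetic, here is a minimal sketch of that "answer three, get the fourth" trade-off, using Python's statsmodels power routines for a two-sample t-test (the choice of test, the effect sizes, and the sample sizes below are my illustrative assumptions, not recommendations from the column):

```python
# Sketch: specify any three of (1) alpha, (2) effect size, (3) power,
# (4) sample size, and solve for the fourth.
from statsmodels.stats.power import TTestIndPower

power_calc = TTestIndPower()

# Given (1) alpha, (2) effect size, (3) power -> solve for (4) sample size per group
n_needed = power_calc.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"n per group to detect d = 0.5 with 80% power: {n_needed:.1f}")

# Given (1) alpha, (4) sample size, (3) power -> solve for (2) detectable effect
detectable = power_calc.solve_power(nobs1=30, alpha=0.05, power=0.80)
print(f"Effect detectable with n = 30 per group at 80% power: d = {detectable:.2f}")

# Given (1) alpha, (4) sample size, (2) effect size -> solve for (3) power
achieved = power_calc.solve_power(effect_size=0.5, nobs1=30, alpha=0.05)
print(f"Power to detect d = 0.5 with n = 30 per group: {achieved:.2f}")
```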
Important point: If all you do is default to p = 0.05 for detecting effects, your design will, by default, answer questions (2) to (4) for you (as in the calculated sample sizes above) -- which could waste your hard work unless you reconsider your objectives.
And for those of you who are wedded to the "rapid cycle PDSA" methodology: have I brought up some things that might
need consideration during your PLANning? (Hint: Yes).
Here is an additional mess-up, which, unlike Mess-up #1 above, isn't necessarily guaranteed -- if you carefully PLAN:
Balestracci's Mess-up #2: Vague planning on a proposed vague solution to a vague problem usually results in vague data, on which vague analyses are performed -- yielding vague results.
Many times, I see data collection
addressed as an afterthought, usually ad hoc. Collecting poorly designed data (or not even collecting data at all!) virtually guarantees that non-trivial human variation seeps in -- an open door for introducing the toxic and very human "constant repetition of anecdotal perceptions."
(CRA...)
More discussion of sample size next time.
Kind regards,