Stat 2000: Important Change in your course concerning Lesson 4 of my book

Published: Wed, 02/15/17

Please note that the midterm exam this term will cover Lessons 1 to 5 in my Basic Stats 2 book.

I am so happy to see that they have returned to the traditional approach to solving 2-Sample problems as discussed in Lesson 4 of my book. If you look at the formula sheet I give you on page 1 of my book, the Standard Error formula in the first line has an insanely complicated formula to compute the degrees of freedom.  I also show this same formula on page 191 at the bottom in Lesson 4.

The formula I show is for what I refer to as the generalized method. When you have a two-sample problem (two sample sizes n1 and n2, two sample means x-bar1 and x-bar2, and two sample standard deviations s1 and s2), you first have to decide whether to assume the population variances are equal.

You use the Rule of Thumb to decide what assumption you will make. If the ratio of the two sample standard deviations is less than 2 (always dividing the bigger by the smaller), you should assume the two population variances are equal and use the pooled two-sample method. Everything is the same in your course as in my book for this method. These are the formulas given in Line 2 of my formula sheet on page 1 of my book and also shown as #2 on your formula sheet this term.

When the Rule of Thumb ratio is less than 2, we assume the variances are equal and use the pooled method. We compute the pooled sample variance and use that to compute the Standard Error of the difference between the two sample means. The pooled method has df = n1 + n2 - 2, as noted on your formula sheet in Line 2.
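
If you want to check the Rule of Thumb and the pooled calculation on a computer, here is a minimal Python sketch. The function names and the sample numbers are made up for illustration, and sending a ratio of exactly 2 to the conservative method is an assumption, since the rule above only covers "less than 2" and "bigger than 2".

```python
import math

def sd_ratio(s1, s2):
    """Rule of Thumb ratio: larger sample sd divided by the smaller."""
    return max(s1, s2) / min(s1, s2)

def choose_method(s1, s2):
    # Assumption: a ratio of exactly 2 goes to the conservative method.
    return "pooled" if sd_ratio(s1, s2) < 2 else "conservative"

def pooled_se(n1, n2, s1, s2):
    """Return (standard error, df) for the pooled two-sample method."""
    df = n1 + n2 - 2
    # Pooled sample variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return se, df

# Hypothetical data, not taken from the book.
n1, s1 = 10, 3.0
n2, s2 = 15, 2.0
print(choose_method(s1, s2))      # ratio is 1.5, so the pooled method applies
print(pooled_se(n1, n2, s1, s2))  # standard error and df = 23
```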

The change to the course this year (which is a return to the traditional methods) is that they are no longer interested in the generalized method (Line 1 on my formula sheet). That method has too complicated a formula for degrees of freedom. It is the true method when the population variances are not equal, and is the method that computer software would use, but it is too much work to do by hand. Even in the past few years when they did teach this approach, they knew it was too much work to do by hand, and students were never put in a position to use that df formula on an exam.

Instead, they are now using the conservative method. When the Rule of Thumb suggests it is inappropriate to assume the variances are equal (the ratio of the sample standard deviations is bigger than 2), we use the conservative approach. We compute the Standard Error for the difference between the two means using the formula outlined on Line 1 of your formula sheet this term. This is exactly the same as the standard error formula I give you on Line 1 of my formula sheet. The change is that, rather than use the horrible formula in my version of Line 1 to compute the best estimate of the degrees of freedom, we merely use df = min{n1-1, n2-1}. That is to say, df is the smaller of n1-1 and n2-1. You simply subtract 1 from each of the sample sizes and use the smaller answer as your degrees of freedom.

For example, if you have decided to use the conservative method and n1=10 and n2=15, then df= 9 since that is the smaller of 9 and 14 (n1-1 and n2-1). If n1=20 and n2=13, then df= 12 since that is the smaller of 19 and 12 (n1-1 and n2-1). If n1=19 and n2=19, df= 18 since n1-1 and n2-1 are both 18.
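
The same three examples can be run as a short Python sketch. The function name `conservative_df` is just an illustrative label, not something from the course or the book.

```python
def conservative_df(n1, n2):
    """Conservative degrees of freedom: the smaller of n1-1 and n2-1."""
    return min(n1 - 1, n2 - 1)

# The three sample-size pairs from the paragraph above.
for n1, n2 in [(10, 15), (20, 13), (19, 19)]:
    print(n1, n2, conservative_df(n1, n2))
# prints df = 9, 12 and 18 respectively
```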

This usually gives us degrees of freedom a little smaller than the truth, but the payoff is that it takes mere seconds to establish the degrees of freedom.  This is called the conservative method because we are playing it safe. We don't know the true degrees of freedom because we have avoided the insane df formula, and have used slightly lower degrees of freedom instead.  Lower degrees of freedom mean a larger critical value t*, which favours the null hypothesis by making it even more difficult to reject Ho.  And that is always our philosophy in statistics: assume the null hypothesis is true, and only reject it if there is significant evidence that the alternative is true.
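
To see how the conservative df compares with what software would report, here is a rough Python sketch. It assumes the complicated formula on Line 1 of my formula sheet is the one usually called the Welch-Satterthwaite approximation (the formula itself is not reproduced in this note), and the sample numbers are hypothetical.

```python
import math

def unpooled_se(n1, n2, s1, s2):
    """Standard error of x-bar1 - x-bar2 without assuming equal variances."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def welch_df(n1, n2, s1, s2):
    """Welch-Satterthwaite df (assumed to be the 'generalized' df formula)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

def conservative_df(n1, n2):
    """Hand-calculation shortcut: the smaller of n1-1 and n2-1."""
    return min(n1 - 1, n2 - 1)

# Hypothetical data with sd ratio 9/3 = 3, so pooling is not appropriate.
n1, s1 = 12, 9.0
n2, s2 = 20, 3.0
print(unpooled_se(n1, n2, s1, s2))  # same SE for both methods
print(welch_df(n1, n2, s1, s2))     # what software would use
print(conservative_df(n1, n2))      # what you use by hand
```

The Welch-Satterthwaite df always lands between min{n1-1, n2-1} and n1 + n2 - 2, which is why the conservative choice never overstates the degrees of freedom.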
Changes to make in Lesson 4 of my book
I start illustrating the two-sample method as of question 6 in Lesson 4.
  • Question 6 is unchanged since that is a pooled method.
  • Question 7 is NOT a pooled method, so we would now use the conservative method.
    • The df is the smaller of 24-1 and 24-1. Since both are the same, df= 23.
    • Since 0.05 is in the upper tail and df=23, the critical value is now t*= 1.714.
    • The formula for the Standard Error and test statistic is still the same, so t= 9.2952.
    • The decision is still the same.
    • The P-value is still the same even though the df has changed, because 9.2952 is still off the end of the bell curve.
    • The confidence interval has changed since we are now using df= 23 and t*= 2.069 for 95% confidence.  The margin of error is now 1.3355 and the confidence interval is now (4.6645, 7.3355).
  • Question 8 is unchanged since that is a pooled method.
  • Question 9 is unchanged.  The question does call for the conservative method, but the standard error formula is the same in the conservative and generalized methods. It is just the degrees of freedom that differ.
  • Question 10 is unchanged since that is a matched pairs method.
  • Question 11 is unchanged since that is a pooled method.
  • Question 12 is unchanged since that is a generalized method. If we were doing this question by hand, we would use the conservative method and the df= 20 (since both n1 and n2 are 21), but computer software can give us a more accurate df using the more complex formula that you no longer care about. They will avoid this can of worms by never giving you a computer printout for this kind of question. 
  • Question 13 is unchanged since that is a pooled method.
  • Question 14 is unchanged since that is a pooled method.
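
The revised Question 7 numbers above can be checked against each other with a short Python sketch. The difference in means and the standard error below are back-calculated from the quoted interval and test statistic, not copied from the book, so treat them as assumptions.

```python
# Values quoted in the Question 7 bullets above.
t_stat = 9.2952                # test statistic (unchanged)
t_star_95 = 2.069              # t* for 95% confidence with df = 23
lower, upper = 4.6645, 7.3355  # revised confidence interval

# Conservative df: the smaller of n1-1 and n2-1 with n1 = n2 = 24.
df = min(24 - 1, 24 - 1)

# Back-calculated (assumption): the midpoint of the interval is the
# difference in sample means, and SE = difference / test statistic.
diff = (lower + upper) / 2
se = diff / t_stat

# Margin of error = t* times the standard error.
margin = t_star_95 * se
print(df, round(margin, 4))
print(round(diff - margin, 4), round(diff + margin, 4))
```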