From Davis Balestracci -- A Final Farewell to the 2015 Baseball Season

Published: Mon, 11/30/15

"His enthusiasm is to be commended and his knowledge is outstanding!"  "I can't wait to get back to my practice and begin using process-oriented thinking.  For me, it makes so much more sense than six sigma."  "I hope to see Mr. Balestracci at future conferences. This was very helpful information."

– Feedback from an audience of CEOs, CFOs, and CMOs attending a recent all-day seminar

[~1350 words, but designed to be thoughtfully entertaining...perhaps even a bit challenging -- take 6 to 10 minutes to read over a break or lunch]

The Baseball World -- the epitome of explaining anything as special cause

Hi, Folks,
Today, I’m going to apply some simple statistical thinking to baseball (my favorite sport), similar to what I did in my 14 April newsletter (and my 9 June and 23 June newsletters applying it to golf).

I want anyone to be able to enjoy this, so I’ll mark any technical statistical details as optional reading.  For those of you interested only in the interpretations,  I'll have the “bottom line” conclusions, many of which I think will surprise you. 

Important point:  maybe you can’t do the math and don’t even want to do the math, but you need to realize the vital necessity of understanding these types of analyses and having access to someone who can do them.  In many similar daily situations you encounter, anything else would be data INsanity.

With this innate tendency of baseball to explain common cause as special, opportunistic data torturing abounds.  The following article on my favorite team, the Boston Red Sox, from the 6 November Boston Globe had just one too many red flags:

I smiled as I read it and couldn’t resist the urge to dig deeper to find the data to test his statements.  There is a wonderful site with just about any baseball statistic you could find if you searched long enough.  I went to its section on bullpens, which had three pages worth of stats.

I figured that if they went through the trouble to compile them, they must somehow contribute to bullpen performance.

1.    “…After all, the Sox bullpen was by many measures the worst in the majors last year, with the absence of Koji Uehara and late struggles of Junichi Tazawa leaving the team bereft of the sort of strikeout-per-inning arms that have become a staple of the game.”

I took my best guess and used 9 of the compiled stats:  (1) ERA (lower is better), (2) blown save percentage (lower is better), (3) walks per 9 innings (minus intentional walks – lower is better), (4) strikeouts per nine innings (higher is better), (5) batting average against (lower is better), (6) something called OPS (sum of on base percentage and slugging percentage) (lower is better), (7) home runs given up per 9 innings (lower is better), (8) steal success rate (lower is better), and (9) pitches per plate appearance (lower is better).

I got the 30-team rankings for each measure – 1 to 30 (best to worst) – and summed them for an analysis.  Any score between 9 and 270 (lower is better) is possible, with the calculated average being (9 x 15.5) = 139.5.  But one needs to determine how much common cause there is around that average.
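For readers who like to see the mechanics, here is a minimal Python sketch of the rank-sum scoring just described. The three teams and their values are invented purely for illustration; only the scoring rules (rank 1 = best, then sum the ranks over the measures) come from the text above, and ties are not handled.

```python
def rank_best_to_worst(values, lower_is_better=True):
    """Return ranks (1 = best) for a dict of team -> value. Assumes no ties."""
    ordered = sorted(values, key=values.get, reverse=not lower_is_better)
    return {team: position for position, team in enumerate(ordered, start=1)}


def rank_sums(measures):
    """Sum each team's rank over all measures.

    `measures` is a list of (values_by_team, lower_is_better) pairs.
    """
    totals = {}
    for values, lower_is_better in measures:
        for team, rank in rank_best_to_worst(values, lower_is_better).items():
            totals[team] = totals.get(team, 0) + rank
    return totals


def score_range(n_teams, n_measures):
    """Smallest, largest, and average possible rank sum."""
    average_rank = (n_teams + 1) / 2
    return n_measures, n_measures * n_teams, n_measures * average_rank


# Hypothetical three-team example with two measures (values are invented).
era = {"A": 3.10, "B": 4.20, "C": 3.70}       # lower is better
k_per_9 = {"A": 9.5, "B": 7.8, "C": 8.9}      # higher is better
totals = rank_sums([(era, True), (k_per_9, False)])
print(totals)

# With 30 teams and 9 measures this reproduces the 9, 270, and 139.5 above.
print(score_range(30, 9))
```

The rank sum deliberately throws away how far apart the teams are on each measure, which is what makes a nonparametric analysis of the summed ranks a natural next step.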

Optional technicalities:  I did a nonparametric analysis of variance on the individual rankings, from which I obtained the variation needed for an analysis of means (ANOM) on the sum of the 9 ranks for each team.  Here is the result (explanation below – please take the time to understand it and its implications):

Very important for everyone:  This ANOM type of analysis to expose special causes is woefully underutilized in most improvement work.  Everyone needs to have a basic grasp of it.  W. Edwards Deming often used this technique, invented by Ellis Ott, and was emphatic that any points between the two common cause limits (in this case, 71.7 and 207.3) could not be ranked.  This is a concept that initially is very difficult to wrap one's arms around (including me in the early '90s!).  The 25 teams (or 26 depending on how one interprets Baltimore (#3)) between those two limits are indistinguishable from each other…and the overall average.

Some might think that Boston (#4 on horizontal axis) is below average because its rank sum score (197) is greater than the average of 139.5.  However, to be truly below average requires a score greater than 207.  Boston is not a special cause -- based on this snapshot of data.

Bottom line:  
  • “Below average” bullpens:  Atlanta (#2), Colorado (#9), Detroit (#10)
  • “Above average” bullpens:  Pittsburgh (#22) and maybe Baltimore (#3)
As Deming would say, these 30 teams form a “system,” and individual teams are either inside the system (common cause) or outside the system (special cause in either direction).

And then there's the other half of the quote above trying to explain an alleged difference in strikeout rates during the absences of Uehara and Tazawa.  

About those strikeout rates -- p-chart ANOM

Optional technicalities:  The two graphs below are a p-chart ANOM (p=proportion / percentage) of bullpen performances.  The first compares the Boston bullpen's 2014 and 2015 strikeout rates (Strikeouts / Total outs).  I put in even less conservative criteria (5% and 1% significance limits) as well as the standard “3.” 

Bottom line:  No difference in Boston bullpen's 2014 and 2015 strikeout rates (the data lie between even the narrowest decision limits).
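Optional technicalities for the curious:  a minimal Python sketch of how limits of this kind can be computed. It uses plain p-chart limits around the pooled proportion with a multiplier z (3 by default). The exact multipliers behind the 5% and 1% ANOM limits are not given here, so z is left as a parameter, and the counts below are invented, not the actual Boston numbers.

```python
import math


def p_chart_limits(events, trials, z=3.0):
    """Compare each group's proportion with limits around the pooled proportion."""
    p_bar = sum(events) / sum(trials)
    rows = []
    for x, n in zip(events, trials):
        # Binomial standard error for a group of size n, scaled by z.
        half_width = z * math.sqrt(p_bar * (1 - p_bar) / n)
        p = x / n
        rows.append({
            "p": p,
            "lower": max(0.0, p_bar - half_width),
            "upper": min(1.0, p_bar + half_width),
            "inside": p_bar - half_width <= p <= p_bar + half_width,
        })
    return p_bar, rows


# Invented counts: bullpen strikeouts and total outs for two seasons.
strikeouts = [420, 400]
total_outs = [1500, 1480]
p_bar, rows = p_chart_limits(strikeouts, total_outs)
for row in rows:
    print(round(row["p"], 3), round(row["lower"], 3), round(row["upper"], 3), row["inside"])
```

A smaller z gives narrower limits, which is the sense in which the 5% and 1% limits are less conservative than the standard ones.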

This second graph compares major league baseball's bullpens' total strikeout rates for 2014 and 2015 by combining the rates of all 30 bullpens (same criteria as above):

Bottom line:  similarly, the 2014 and 2015 data both lie within the narrowest limits – no year-to-year difference.

I wouldn’t be a bit surprised if someone has said,  “The Red Sox followed the overall trend of the major league bullpen strikeout rate for the 2015 season by being down slightly from 2014.”  [Sorry, just not true]

Using the same p-chart ANOM technique, how does Boston compare to the other 29 bullpens in terms of its individual strikeout rate for each year? 

Bottom line: 
  • Boston (#4) is between the limits both years, so it was average for both seasons – no difference.
  • I didn’t realize how strikeout dominant the NY Yankees (#19) were in both 2014 and 2015.  
  • Detroit (#10) and Minnesota (#17) were truly below average in 2015 (Minnesota in 2014 as well).

2. “Red Sox relievers finished with a 4.24 ERA last year. The league average bullpen had a 3.71 mark.  What are the odds of bridging that divide of 0.53 earned runs per nine innings? Excellent, actually.”

I needed to get an estimate of the standard deviation to be able to perform an ANOM on the 2015 individual team bullpen ERAs.

Optional technicalities:  I analyzed ERA using the variation from the combined 2014 / 2015 data.  An initial analysis of variance (ANOVA) showed no difference either by year or league.  It also identified five outliers in terms of the difference between 2014 and 2015.

Bottom line:  Any difference between 2014 and 2015 that is greater than ~1.1 is considered significant.

This occurred for five bullpens:

Team          2014    2015    Diff

Atlanta       3.31    4.69    +1.38
Houston       4.80    3.27    -1.53
Oakland       2.91    4.63    +1.72
San Diego     2.73    4.02    +1.29
Seattle       2.59    4.15    +1.56

[Boston       3.33    4.24    +0.91 -- common cause]
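The screening rule behind this table is simple enough to sketch in a few lines of Python. The ERAs are the ones listed above, with Boston's 2015 figure taken as the 4.24 quoted from the article; the ~1.1 cutoff is the approximate value from the analysis, and the function name is just for illustration.

```python
THRESHOLD = 1.1  # approximate common-cause limit on a year-to-year difference

# team: (2014 bullpen ERA, 2015 bullpen ERA)
bullpen_era = {
    "Atlanta": (3.31, 4.69),
    "Houston": (4.80, 3.27),
    "Oakland": (2.91, 4.63),
    "San Diego": (2.73, 4.02),
    "Seattle": (2.59, 4.15),
    "Boston": (3.33, 4.24),  # 2015 value as quoted in the article
}


def classify_change(era_2014, era_2015, threshold=THRESHOLD):
    """Return the year-to-year difference and whether it exceeds the cutoff."""
    diff = round(era_2015 - era_2014, 2)
    label = "special cause" if abs(diff) > threshold else "common cause"
    return diff, label


results = {team: classify_change(*eras) for team, eras in bullpen_era.items()}
for team, (diff, label) in results.items():
    print(f"{team:<10} {diff:+.2f}  {label}")
```

Note that the rule looks at the size of the change in either direction, so Houston's improvement is flagged in the same way as the four declines.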

Optional technicalities (because I like to come at data from several angles to see whether conclusions converge):
  • Since the initial ANOVA showed no difference by either year or league, I also did a simpler, statistical process control type analysis using only the individual year-to-year ranges of each team (absolute value of the individual team 2014-2015 difference). 
  • Using both the median and average of these ranges to detect outliers until they pretty much concurred, the conclusions matched those of the more formal ANOVA both in terms of outlier criteria (difference > 1.1)  and approximate standard deviation (~0.29).
  • I also used a nonparametric boxplot analysis of the actual individual team year-to-year differences (not absolute value) and it determined that an outlying range was greater than ~1.1 – 1.2 .
Bottom line:  three different techniques concluded (1) a difference greater than ~1.1 was significant and (2) a good enough estimate of the standard deviation is 0.29.
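Optional technicalities:  a minimal Python sketch of the range-based screening idea, assuming the usual control chart constants for two-point ranges (1.128 and 3.268 for the average range, 0.954 and 3.865 for the median range). The list of ranges is invented, and the function is an illustration of the approach rather than the exact procedure I ran.

```python
from statistics import mean, median

D2, D4 = 1.128, 3.268                  # constants for the average two-point range
D2_MEDIAN, D4_MEDIAN = 0.954, 3.865    # constants for the median two-point range


def screen_ranges(ranges, use_median=False):
    """Drop ranges above the upper limit, recompute, and repeat until stable."""
    kept = list(ranges)
    while True:
        center = median(kept) if use_median else mean(kept)
        limit = (D4_MEDIAN if use_median else D4) * center
        remaining = [r for r in kept if r <= limit]
        if len(remaining) == len(kept):
            break
        kept = remaining
    sigma = center / (D2_MEDIAN if use_median else D2)
    return limit, sigma, kept


# Invented absolute year-to-year differences for seven teams.
ranges = [0.2, 0.3, 0.4, 0.1, 0.5, 0.3, 1.6]
limit, sigma, kept = screen_ranges(ranges)
print(round(limit, 2), round(sigma, 2), kept)

# Consistency check on the two reported figures: a standard deviation of 0.29
# implies an upper range limit of about 3.268 * 1.128 * 0.29.
implied_limit = D4 * D2 * 0.29
print(round(implied_limit, 2))
```

Under these constants, a standard deviation of 0.29 implies an upper range limit of roughly 1.07, which lines up with the ~1.1 cutoff.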

This standard deviation of 0.29 was then used for the ANOM comparing the 2015 ERAs of the 30 teams:

Bottom line:
  • Special cause “high” ERAs: Atlanta (#2), Colorado (#9), Oakland (#20)
  • Special cause “low” ERAs: Kansas City (#12), Pittsburgh (#22), St. Louis (#26)
  • Boston (4.24):  above the average (3.71), but not a special cause – no different from the 23 other teams between the limits (or 3.71 for that matter).
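Optional technicalities:  a rough Python sketch of this kind of comparison for the six teams whose 2015 ERAs appear in this newsletter. The exact ANOM decision limits are not stated in the text, so the sketch assumes a plain band of 3 standard deviations around the average as a stand-in; treat the limits it prints as an approximation, not the limits on the graph.

```python
GRAND_MEAN = 3.71   # league-average bullpen ERA quoted in the article
SIGMA = 0.29        # standard deviation estimated from the 2014 / 2015 data
Z = 3.0             # assumption: simple 3-sigma band, not the exact ANOM factor

lower = GRAND_MEAN - Z * SIGMA
upper = GRAND_MEAN + Z * SIGMA

# 2015 bullpen ERAs mentioned in this newsletter (not all 30 teams).
era_2015 = {
    "Atlanta": 4.69,
    "Oakland": 4.63,
    "Boston": 4.24,
    "Seattle": 4.15,
    "San Diego": 4.02,
    "Houston": 3.27,
}


def classify_era(era):
    """Label an ERA relative to the assumed common-cause band."""
    if era > upper:
        return "special cause (high)"
    if era < lower:
        return "special cause (low)"
    return "common cause"


labels = {team: classify_era(era) for team, era in era_2015.items()}
print(round(lower, 2), round(upper, 2))
for team, label in labels.items():
    print(f"{team:<10} {era_2015[team]:.2f}  {label}")
```

Even with this simplified band, Atlanta and Oakland land above the upper limit while Boston stays inside it, in line with the conclusions above.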
Bottom line from both analyses:  

Actually, he came to the right conclusion in terms of the odds of “bridging that divide of 0.53” being excellent  –  but for the wrong reasons.  Based on the 2014 / 2015 data analysis and its calculated common cause of 0.29:
  • There was no divide:  Boston’s 4.24 was statistically indistinguishable from 3.71!  
  • Boston could easily go from 4.24 to as low as 3.14 (difference of 1.1)  just due to common cause, which wouldn’t necessarily indicate improvement.  
  • But he’s right:  the odds are indeed excellent…just due to chance.
He then states "the hallmark of bullpens is their inconsistency."  Once again, true, but for the wrong reasons.  I hope I have shown that variation can be routinely "consistently inconsistent" within a predictable, but humanly unacceptable, range. Rather than accept this, he next went on a “fishing expedition” into bullpen stats to "explain" it further.  More about that next time as well as a surprising conclusion from yet another, totally different ANOM method analyzing the 2015 ERAs.

Until then...

Kind regards,

P.S. A unique feature of Data Sanity is the depth of coverage of the vital Analysis of Means technique in Chapter 7
Feedback like that obtained at the beginning of today's newsletter has me firmly convinced that a  one to two day leadership retreat  with safe dialogue using the content of Chapters 1 to 4 & 9 is key to getting cultures "unstuck" in their quest for excellence.

Do you need a plenary speaker for an internal or professional conference  or some mentoring to help you "quantum leap" to a new level of eye-opening effectiveness?

As always, I welcome contact from my readers with comments or to answer any questions.

Was this forwarded to you?  Would you like to sign up?
If so, please visit my web site and fill out the box in the left margin on the home page, then click on the link in the confirmation e-mail you will immediately receive.

Want a concise summary of Data Sanity in my own words?
Listen to my 10-minute podcast. Go to the bottom left of this .