When I first started looking carefully at baseball statistics, I learned the basics that statisticians use when examining the game. What stood out to me most was the idea that whether a ball put into play off a pitcher falls for a hit depends largely on luck. Pitchers have a very small effect on the number of balls in play that fall for hits. The league-average batting average on balls in play (BABIP) works out to right around .300. That seemed like a very significant fact to me, and it spurred me on to do more research.
My main thought here (looking at this from a fantasy baseball manager's point of view) was to use this information to find pitchers whose bad luck over the last few years had deflated their value to the point where they were bargains. To do this I would need one formula to project ERA and another to project WHIP. I could then take a pitcher's 3-year averages in the component numbers (BB, K, HR and IP) and look for the guys who showed a large gap between actual and projected numbers in both ERA and WHIP, to identify the overvalued and undervalued pitchers.
I found a great ERA projection tool by Voros McCracken called Defense Independent Component ERA or DICE for short. This was exactly the type of number which I had scoured the internet to find. I took his formula and ran it against the 2003-2005 seasons (thanks to Doug’s Stats for making those available). DICE did an outstanding job, not just of accurately projecting ERA, but also in finding pitchers that either benefited or suffered at the hands of lady luck.
Unfortunately I could not find a good formula to project WHIP anywhere on the internet. So it was time to put on the thinking cap and make one up myself. I’ll start by simply introducing the formula and then explain the logic behind it:
Projected Hits = 0.4 * (3 * IP - K) + HR
Projected WHIP = (Projected Hits + BB) / IP
The reasoning behind this formula is that 30% of balls in play fall for hits on average. The 3 * IP - K part gives us the number of outs recorded on balls in play (total outs minus strikeouts). Since about 30% of batted balls in play fall for hits and about 70% are recorded as outs, we expect roughly 3 hits in play for every 7 outs in play. That 3/7 ratio is about 0.43, but when I fit the formula to the 2003-2005 data I found that it worked best with a slightly lower constant, very close to 0.4. It's amazing how good a predictor this is for most pitchers, and how good an idea it gives you of which pitchers have had very good or very bad luck in a given year.
I would assume that, even though most sabermetrics people say pitchers do not control hits on balls in play, pitchers do have a slight ability to influence this number. When I analyzed the numbers, it seemed that VERY bad pitchers were overvalued by the formula, leading me to believe that while the league average on balls in play might be .300, the average for a reasonably good pitcher (a pitcher with value in fantasy leagues) might be closer to .286 or so, which would explain the 0.4 ratio of hits in play to outs in play (.286 / .714 is about 0.4). Further study would be necessary to confirm this, but for the time being I'm fairly comfortable with the 0.4 coefficient. Since the logic behind the formula is very simple, changes to the coefficient don't invalidate the formula; they merely help us focus in more detail on the exact number of hits we should expect from the pitcher in question.
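The arithmetic above can be sketched in a few lines of Python. This is only a minimal illustration, not a finished tool: the function names are made up for this example, the stat line is hypothetical, and it assumes IP is entered as true decimal innings (200 1/3 innings as 200.333, not the box-score shorthand 200.1).

```python
def projected_hits(ip, k, hr, coeff=0.4):
    """Projected hits allowed: coeff * (outs on balls in play) + home runs."""
    outs_in_play = 3 * ip - k  # total outs minus strikeouts
    return coeff * outs_in_play + hr


def projected_whip(ip, k, hr, bb, coeff=0.4):
    """Projected WHIP: (projected hits + walks) / innings pitched."""
    return (projected_hits(ip, k, hr, coeff) + bb) / ip


# Hypothetical pitcher (illustrative numbers, not real data):
# 200 IP, 180 K, 20 HR, 50 BB, and 175 hits actually allowed.
ip, k, hr, bb, actual_hits = 200, 180, 20, 50, 175

hits = projected_hits(ip, k, hr)      # 0.4 * (600 - 180) + 20 = 188
whip = projected_whip(ip, k, hr, bb)  # (188 + 50) / 200 = 1.19
actual_whip = (actual_hits + bb) / ip

print(f"projected hits: {hits:.1f}")  # prints 188.0
print(f"projected WHIP: {whip:.3f}")  # prints 1.190
# A negative gap means fewer hits than projected (possible good luck);
# a positive gap means more hits than projected (possible bad luck).
print(f"actual minus projected WHIP: {actual_whip - whip:+.3f}")
```

Running the same comparison over a list of pitchers and sorting by the gap is one way to surface the candidates described above.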
Since the original release of the formula to the general public, I've refined my thoughts on the coefficient as well. I do not believe that the .300 BABIP expectation is completely accurate for typical fantasy pitchers (i.e. pitchers that aren't borderline major leaguers). In other words, I think the structure of the formula is sound and that it is our BABIP expectation that is a tiny bit off from what really happens with these pitchers. To make the formula more exact I would write it as follows:
Projected Hits = (BABIP / (1 - BABIP)) * (3 * IP - K) + HR
By expressing the formula in this way, we take things out of the realm of guesswork and make the logic explicit. Assuming a correct projection of BABIP, IP, K, and HR, this formula is not conjecture; it follows directly from what BABIP means. It also lets us treat BABIP as a variable, which could be very useful in examining the effect that a change in park or defense might have on a pitcher. If hard numbers show that pitching in Coors Field in front of outfielders with poor range could lead to a BABIP of .310 instead of our .286 assumption, we can easily adjust our projection accordingly. I would caution against straying too far from the .286 assumption that leads to the 0.4 coefficient, though, as the numbers were VERY accurate over the course of the 3 seasons I analyzed.
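To see how the BABIP version behaves, here is a rough Python sketch under the same assumptions as the formula. The function names and the stat line are hypothetical, and the .310 figure is just the Coors Field example from the paragraph above, not a measured value.

```python
def hits_per_out_in_play(babip):
    """Convert a BABIP into expected hits per out recorded on balls in play."""
    if not 0 <= babip < 1:
        raise ValueError("BABIP must be at least 0 and less than 1")
    return babip / (1 - babip)


def projected_hits(ip, k, hr, babip=0.286):
    """Projected hits: BABIP / (1 - BABIP) * (3 * IP - K) + HR."""
    return hits_per_out_in_play(babip) * (3 * ip - k) + hr


def projected_whip(ip, k, hr, bb, babip=0.286):
    """Projected WHIP: (projected hits + walks) / innings pitched."""
    return (projected_hits(ip, k, hr, babip) + bb) / ip


# How the coefficient moves with the BABIP assumption.
for babip in (0.286, 0.300, 0.310):
    print(f"BABIP {babip:.3f} -> coefficient {hits_per_out_in_play(babip):.3f}")

# Hypothetical pitcher: 200 IP, 180 K, 20 HR, 50 BB.
neutral = projected_whip(200, 180, 20, 50)             # about 1.191
hitter_park = projected_whip(200, 180, 20, 50, 0.310)  # about 1.293
print(f"WHIP at .286: {neutral:.3f}, at .310: {hitter_park:.3f}")
```

For this made-up pitcher, a .024 change in assumed BABIP moves the projected WHIP by about a tenth of a point, which is a good reason to be careful about adjusting it.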
Bob Hoyng is one of a growing number of fantasy experts who write for the Cafe. You can catch up with Bob and all the other authors in the Cafe's forums.