2007 Hitter Projection Roundup
by Nate Silver
Disclaimer: there are as many ways to evaluate projections as there are to create them. This is a SQuiD (Semi Quick-n-Dirty) method that involves looking at some basic descriptive statistics.
I was able to get access to eight projection systems, each of which is either publicly available or one I subscribe to. These were PECOTA, Sean Smith’s CHONE, Dan Szymborski’s ZiPS, Tango’s Marcel, and the projections from The Hardball Times (THT), ESPN Fantasy, RotoWire, and RotoTimes.
Finally, my favorite metric, which is based on determining which systems give us the best information. Specifically, what I’m doing here is throwing all the forecasts into a regression analysis and determining which ones contribute the most to the forecast bundle. A system’s coefficient is basically a combination of how accurate that system is and how unique it is.
System      Coeff   t-score
PECOTA      +.508    3.46**
ZiPS        +.413    2.16**
ESPN        +.285    2.49**
Marcel      +.237    1.22
THT         -.018   -0.11
CHONE       -.033   -0.16
RotoWire    -.171   -1.05
RotoTimes   -.320   -1.86
(Constant   +.067    1.15)
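For readers who want to try this kind of analysis themselves, here is a minimal sketch of regressing actual results on several competing forecasts at once. It is not the code or data behind the table above: the three systems (A, B, C), the noise levels, and the sample size are all made up for illustration, and the `ols` helper is a hypothetical plain-Python least-squares routine rather than anything from a statistics package. System C is built as a noisy copy of System A, so it should earn a coefficient near zero even though it is reasonably accurate on its own.

```python
import random

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(forecasts, actual):
    """Regress actual results on every forecast at once.

    forecasts: dict mapping system name -> list of projected values.
    Returns a dict mapping name -> (coefficient, t-score), plus "Constant".
    """
    names = ["Constant"] + list(forecasts)
    n = len(actual)
    X = [[1.0] + [forecasts[k][i] for k in forecasts] for i in range(n)]
    Xt = transpose(X)
    xtx_inv = invert(matmul(Xt, X))
    beta = [row[0] for row in matmul(xtx_inv, matmul(Xt, [[y] for y in actual]))]
    resid = [y - sum(b * x for b, x in zip(beta, row)) for y, row in zip(actual, X)]
    sigma2 = sum(e * e for e in resid) / (n - len(names))  # residual variance
    out = {}
    for j, name in enumerate(names):
        se = (sigma2 * xtx_inv[j][j]) ** 0.5  # standard error of coefficient j
        out[name] = (beta[j], beta[j] / se)
    return out

# Synthetic data only: none of these numbers come from a real projection system.
rng = random.Random(2007)
talent = [rng.gauss(0.750, 0.080) for _ in range(1000)]
actual = [t + rng.gauss(0, 0.060) for t in talent]   # talent plus luck
sys_a = [t + rng.gauss(0, 0.030) for t in talent]    # accurate system
sys_b = [t + rng.gauss(0, 0.050) for t in talent]    # noisier, but independent
sys_c = [a + rng.gauss(0, 0.030) for a in sys_a]     # noisy copy of System A

results = ols({"A": sys_a, "B": sys_b, "C": sys_c}, actual)
for name, (coef, t) in results.items():
    print(f"{name:<9}{coef:+.3f}  {t:6.2f}")
```

Because all the forecasts enter the regression together, a system only gets credit for information the others do not already supply, which is why accuracy and uniqueness both matter here.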
The three systems that give you the most positive information are PECOTA, ZiPS, and (somewhat surprisingly) ESPN, in that order. In other words, if you had our projections and some of the other projections, the ideal blend would be 5 parts PECOTA, 4 parts ZiPS, and 3 parts ESPN. You could also add in 2 parts of Marcel without hurting yourself. The other projection systems don’t really tell you anything … they might be perfectly fine systems, but they don’t give you any unique information. (Actually, you could almost do better by adding in a NEGATIVE weight for the RotoTimes projections, but that result is not statistically significant.)
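The "parts" arithmetic above can be sketched in a few lines. The coefficients come straight from the table; everything else is an assumption for illustration: rescaling the four positive coefficients so they sum to one, ignoring the constant term, and the projected values for the hypothetical player (this section does not say which statistic was being forecast).

```python
# Coefficients for the four systems with positive weights, taken from the table.
coeffs = {"PECOTA": 0.508, "ZiPS": 0.413, "ESPN": 0.285, "Marcel": 0.237}

# "Parts" as quoted in the text: each coefficient times ten, rounded.
parts = {name: round(c * 10) for name, c in coeffs.items()}

# Assumption: rescale the coefficients so the blend weights sum to one.
total = sum(coeffs.values())
weights = {name: c / total for name, c in coeffs.items()}

def blend(projections, weights):
    """Weighted average of one player's projections across systems."""
    return sum(weights[name] * projections[name] for name in weights)

# Hypothetical player and values, purely for illustration.
hypothetical_player = {"PECOTA": 0.820, "ZiPS": 0.800, "ESPN": 0.790, "Marcel": 0.775}

print(parts)
print({name: round(w, 3) for name, w in weights.items()})
print(round(blend(hypothetical_player, weights), 3))
```

Dropping Marcel from `coeffs` gives the three-system version of the blend; the remaining weights are rescaled the same way.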
So, another good year from PECOTA, certainly a good year from ZiPS — Dan does excellent work. I think we can call those two co-champs, but several of the other systems weren’t far behind. We’ll repeat this exercise for pitchers at some point within the next week or two.
I understand that we want a forecasting system to be "unique" so that we get the best chance of identifying true breakout and collapse players, but shouldn't we measure only on accuracy?