Strategy - February 8, 2012



Calculating Player Values I

By Jason Swartley

You’ve gone through the motions of assessing players for the upcoming season, and now a spreadsheet full of projected stats sits before you. So what exactly do you do with it? Do you just eyeball it, sort it by which players seem to be the best and let that lead you to rotisserie greatness? Today we’ll look at how to turn that list of projections into player rankings and, ultimately, auction values should you so desire. We’ll be using GiantsFan14’s recently published projections as our example. A quick note before we begin: as with most things in life, there is more than one way to do this. I’ll be covering the basics of one possible way to compute values. Hopefully this will be enough to get you pointed in the right direction, and once you get started you will see how you might tweak some of these concepts to suit your individual needs.

First let us consider (if you never have before) the inherent difficulty of turning projected stats into a quantifiable ranking or value. How do you compare, for instance, a player who hits 21 home runs to a more or less equivalent but speedier player who hits only 7 home runs but chips in a dozen steals? Which is more valuable to your team, and by how much? The numbers mean completely different things depending on the category, which makes them difficult to compare. Thirty steals is great production, but scoring only 30 runs is a major red flag. And how do you compare any of that to batting average, which is just a percentage? I plotted a frequency histogram of GF’s projections for all five hitting stats, and even had to cheat to get them all to show up on one graph: I multiplied batting average by 100, so you’ll see it tightly grouped in the 24-30 range.

[Figure: Raw Distribution, a frequency histogram of the five projected hitting stats]

As you can see, aside from the similar curves for runs and RBIs, these five stats really have no relation to each other and have vastly different distributions. If only there were a way to put these stats on a common scale so we could compare apples to apples. Well, thanks to the power of statistics (and Excel), there is.

Let’s revisit your high school statistics class and discuss a couple of basic concepts.  The first, mean (or average), is pretty intuitive.  Running some quick computations on GF’s stats (thanks, Excel!) we see the following mean values:

BA:  .272
HR: 14.49
R: 60.98
RBI: 57.89
SB: 9.91

Great. So now we know that, within this player pool, scoring 60.98 runs is equivalent in value to hitting 14.49 home runs or stealing 9.91 bases: each is exactly average. But this knowledge is of limited use, as no player hits exactly 14.49 home runs, and we still don’t know how to compare stats that differ from the mean. That’s where we need the standard deviation. It sounds complicated, and if you’ve ever tried computing it by hand you know it can be labor-intensive, so again, thanks, Excel, for having this as a built-in function. Standard deviation is a measure of the spread of a set of data. For instance, look at the runs plot, which has an average around 60 but is very flat and stretches from end to end. Now compare that to batting average, which has an average around 27 (remember, we multiplied it by 100 for the graph) and displays as a tall, narrow curve. Those are two very different distributions of those 300+ data points. Standard deviation is a way to represent the width of that curve numerically. An easy way to think about it is that, for roughly bell-shaped data, about two thirds of all values will fall within one standard deviation of the mean. For instance, I quickly computed standard deviations on each scoring statistic in GF’s projections and saw these results:

HR: 9.31
R: 23
RBI:  25.45
SB: 10.44

So this lets us know that about two thirds of all players will score 60.98 +/- 23 runs, or roughly 38 to 84. And this is tremendously valuable to know. We now know that 9 home runs above average is pretty darn good (nearly a full standard deviation), but 9 RBIs above average is nothing special (about a third of one). Now we can run what are called Z-scores on our data. A Z-score lets us look at a data point, such as how many runs a player will score, as it relates to the rest of the data points. We compare to the mean, but we don’t count the difference in terms of runs (say, 15.02 runs above average for a player projected to score 76); instead we count in terms of the standard deviation (15.02/23 = .65 standard deviations above average). Why? Well, every scoring category has a mean and a standard deviation. Plotting Z-scores for them all lets us speak of them in the same language. You could Google this, but the formula for the Z-score of a data point x is (x - mean)/stdev, where the mean and standard deviation are taken over every player in that category. Now look at this graph of Z-scores for each scoring stat.

[Figure: Normalized, the Z-score distributions for each scoring stat]
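If you would rather script this than build it in Excel, here is a minimal Python sketch of the same Z-score calculation. The five run totals are made-up numbers for illustration, not GF’s projections, and the `z_scores` helper is a hypothetical name of my own. It uses the sample standard deviation, which I am assuming matches what Excel’s STDEV function returns.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return the Z-score of each value: (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (assumed to match Excel's STDEV)
    return [(x - mu) / sigma for x in values]


# Made-up projected run totals for five hypothetical players (not GF's data).
runs = [95, 76, 61, 48, 30]
runs_z = z_scores(runs)

# The article's own example, using the published mean (60.98) and stdev (23) for runs:
z_76_runs = (76 - 60.98) / 23  # about 0.65 standard deviations above average
```

Run the same function over each category’s column and every stat ends up on the same scale, centered on zero.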

The average of every curve is now zero, and the relative distribution is roughly the same. Except for that one tall stolen base curve with its peak pushed to the left (statisticians would call that right-skewed, since the long tail runs to the right). For now let’s leave that as an exercise to the reader. From here, it’s a piece of cake. Add up each player’s Z-scores, and that gives you his total value across all five categories. Note that you could also assign different weights to the different categories if you felt one was more valuable, more predictable, and so on. Now you’re on your way to rotisserie domination.
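As a rough sketch of that summing step, the snippet below adds up per-category Z-scores with optional weights. The two players, their Z-scores, and the `total_value` helper are all invented for illustration; with every weight left at 1.0 it is just the plain sum described above.

```python
# Hypothetical per-category Z-scores for two made-up players.
players = {
    "Slugger": {"HR": 1.8, "R": 0.9, "RBI": 1.5, "SB": -0.7, "BA": 0.2},
    "Speedster": {"HR": -0.6, "R": 1.1, "RBI": -0.3, "SB": 2.4, "BA": 0.5},
}

# Equal weights reproduce the plain sum; change a weight to emphasize a category.
weights = {"HR": 1.0, "R": 1.0, "RBI": 1.0, "SB": 1.0, "BA": 1.0}


def total_value(z_by_category, weights):
    """Weighted sum of a player's Z-scores across all categories."""
    return sum(weights[cat] * z for cat, z in z_by_category.items())


totals = {name: total_value(z, weights) for name, z in players.items()}

# Player names sorted from most to least valuable.
ranking = sorted(totals, key=totals.get, reverse=True)
```

Bumping the SB weight to 1.5, say, would be one way to express a belief that steals are scarcer in your league; whether that is a good idea is a judgment call this sketch does not settle.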

But I know what you’re saying. “Hold it there, Rocko! I didn’t see you compute a standard deviation for batting average!” Well, good catch. Let’s talk about that. You see, you can’t just take a guy’s batting average at face value. The average across GF’s player universe is .272, so someone who hits .280 will help your team, right? Yes, but how much he helps depends on how many at-bats he gets. A platoon player just won’t have the impact, good or bad, that a full-time leadoff hitter would. And unfortunately it’s not as simple as multiplying batting average by at-bats; that just gets us the number of hits, which, without the at-bats alongside it, doesn’t tell us enough.

I like to think of it this way. Imagine you’ve got a roster full of average, .272-hitting players. Here’s where we would need to know more about your league if we were to customize this for you. Let’s assume for now that this is a standard 12-team league with a common roster of nine hitters and seven pitchers, and suppose eight of your nine hitters are average. The league average for at-bats is 447, meaning your eight average hitters would amass 3576 of them. In that time they would collect 973 hits (3576*.272). Now you add your last hitter. Let’s suppose it’s Miguel Cabrera, whom GiantsFan14 has projected to get 190 hits in 578 at-bats. Your team totals are now 1163 hits in 4154 at-bats, for a team average of .280. So Cabrera alone has raised your team average by .008! I like to subtract back out the .272 so I’m left with just the net effect on team BA. Do this for every player, run a Z-score on this new column in your spreadsheet and you have your final piece.
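Here is the Cabrera arithmetic as a small Python sketch, in case you want to fill that column with a script instead of a spreadsheet formula. The constants come straight from the numbers above; the function name `team_ba_impact` is my own invention, and the sketch skips the rounding of the eight hitters’ total to 973 hits, so the last digits may differ slightly from the hand calculation.

```python
LEAGUE_BA = 0.272   # mean batting average across the player pool
LEAGUE_AB = 447     # league-average at-bats
OTHER_HITTERS = 8   # average hitters filling the rest of a nine-hitter lineup


def team_ba_impact(hits, at_bats):
    """Net change in team BA when this player joins eight average hitters."""
    base_ab = OTHER_HITTERS * LEAGUE_AB   # 3576 at-bats
    base_hits = base_ab * LEAGUE_BA       # about 973 hits
    team_ba = (base_hits + hits) / (base_ab + at_bats)
    return team_ba - LEAGUE_BA            # subtract the .272 back out


# Miguel Cabrera's projection from the article: 190 hits in 578 at-bats.
cabrera_impact = team_ba_impact(hits=190, at_bats=578)  # roughly +.008
```

Compute this value for every hitter, then take the Z-score of the resulting column just as you did for the other four categories.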

That’s a good stopping point for now.  In a future article we’ll discuss position scarcity, and look at how to use what we just did to compute custom auction values.

 
Jason posts regularly here at the cafe under the user name TheRock. He is a fan of many fantasy sports, particularly baseball. He routinely DOMINATES his Yahoo! public league.
 



5 Responses to “Calculating Player Values I”

  1. Merlin401 says:

    Regarding the batting average problem, what I do is just find the average at bats for all the players you are projecting and multiply the ratios (Player AB / Avg AB) times their BA Z-score which individually weights everyone’s BA z-score contribution. With Excel this seems like the easiest way to go about it. Do the same with ERA, WHIP, etc

  2. Broncmet724 says:

    I subtract the mean BA from the player’s BA and multiply times the player’s ABs ((Player’s BA – mean BA) * Players AB) since players with more ABs will help, or hurt, your team’s BA more. Then take the mean of this column (should be zero or really close) and the standard deviation of this column. Then calculate your Z-score of this.

  3. Tavish says:

    Any chance you could provide some of your player pool mean and std_dev totals in the next article? Not sure if you are planning on breaking them down by position (seems like that is what you were previewing) or just using a general calculation. Either would be nice (although by position would be even nicer!).

  4. Tavish says:

    Bah! Nevermind, they are right there in this article. :)

  5. gparker1515 says:

    @Merlin401: That’s what I was just thinking about doing. That seems a lot simpler, but does it work?
