rookies and cream wrote: Yeah, but does it matter that we're not talking about multiple samples? I was speaking in terms of one sample (all players in MLB in one year), not teams in a fantasy league. We also discussed whether SBs would be normally distributed over the history of baseball. However, even if you consider each year as an individual sample, they would not be independent of one another due to player overlap. Does the CLT even apply to what we are discussing?

Two things. First, there's no problem using z-scores with non-normal distributions. The z-score is a widely applicable way to standardize any distribution. The difference with a non-normal distribution is that you cannot interpret the z-score using the standard normal tables.
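
To make that first point concrete, here is a minimal sketch. The steals data are simulated from an exponential distribution purely as an assumed stand-in for a right-skewed category; none of it is real player data. It shows that standardizing still gives mean 0 and s.d. 1, while the share of players below z = -1 is nowhere near what the normal table says.

```python
import random
from statistics import NormalDist, mean, pstdev

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical right-skewed "stolen base" totals (assumed, not real data):
# most players near zero, a few very high.
steals = [random.expovariate(1 / 8) for _ in range(10_000)]

mu, sigma = mean(steals), pstdev(steals)
z_scores = [(x - mu) / sigma for x in steals]

# Standardizing works regardless of shape: mean 0, s.d. 1.
z_mean, z_sd = mean(z_scores), pstdev(z_scores)

# But the normal table misreads the lower tail of a skewed category.
share_below_minus_one = sum(z < -1 for z in z_scores) / len(z_scores)
normal_table_value = NormalDist().cdf(-1)  # what the table says for z = -1

print(f"z mean {z_mean:.3f}, z s.d. {z_sd:.3f}")
print(f"share below z=-1: {share_below_minus_one:.3f} vs table {normal_table_value:.3f}")
```

So the z-score is still a perfectly good ruler here; it just can't be converted to a percentile with the normal table.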

Second, in essence, we are talking about multiple samples. I'm competing against a bunch of guys and we are each drawing a sample of players. We get measured in terms of total production in each category (or our average production, in the case of rate stats). Those totals or averages are going to be approximately normally distributed (if anyone has played in a keeper league for several years, take a look at the results for several years and see if they look normal). So, each player can be assessed in terms of their standard score.
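
A quick simulation illustrates the idea. This is only a sketch under loud assumptions: the player pool is simulated, the 14-hitter roster size is made up, and teams are drawn at random, which real drafts are not. It compares the skewness of individual steals with the skewness of team totals.

```python
import random
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the sketch is repeatable

def skewness(values):
    """Population skewness: the average cubed z-score."""
    mu, sigma = mean(values), pstdev(values)
    return mean([((v - mu) / sigma) ** 3 for v in values])

# Hypothetical player pool with heavily right-skewed steals (assumed data).
pool = [random.expovariate(1 / 8) for _ in range(400)]

# Each simulated team is a random draw of hitters from the pool.
ROSTER_SIZE = 14  # assumed roster size, for illustration only
team_totals = [sum(random.sample(pool, ROSTER_SIZE)) for _ in range(5_000)]

player_skew = skewness(pool)
team_skew = skewness(team_totals)
print(f"player skew: {player_skew:.2f}, team-total skew: {team_skew:.2f}")
```

The team totals come out much less skewed than the individual players, which is the CLT at work on the thing we actually get scored on.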

Z-scores are good (note in the discussion the importance of using actual data, rather than projections, to get the s.d. and the importance of replacement value) and the SGP method is good, imo. I tend to use the second, but z-scores are essentially the same thing in unitless measures.

Ok, it's been a long time since I reviewed CLT and sampling, so I'll defer to you on those topics. However, as a neuropsychologist (yes, background in stats but less than GTWMA), I interpret standard scores from tests every day. Although I guess I could be wrong, I do not think you can treat z-scores the same from normal and non-normal distributions, which is essentially what you are doing in the method described. While you are not making interpretations using standard normal tables (e.g., z of 1 = 84th percentile), you are making inferences about player X's standing in a particular category. I do not see how you can make the same assumptions when data in certain categories are not spread evenly.

I think we are saying the same thing, really. A z-score is just a method of standardizing data based on mean and standard deviation. Whether or not you can treat them the same depends on what you intend to do with them. You cannot use z-scores from non-normal distributions to draw inferences based upon the normal table. But here we are drawing inferences within the categories, not across them. The assumption we are making is simply that within each category, a player's relative contribution is measured by how many s.d. above or below the mean they are. And there's no problem with that. Whether the data are spread evenly or not, we'll be dividing through by the s.d., and that adjusts the valuation within that category for the spread in the data for that category.

I'm trying to think of reasons why the skewness or kurtosis would bias the approach as you add up across categories, but don't see that--but you could be right.
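
For anyone who wants to see the mechanics, the within-category standardizing and the adding up across categories can be sketched like this. The players and stat lines are hypothetical, only two categories are shown, and the sketch leaves out the replacement-value adjustment mentioned earlier, so treat it as an illustration rather than a full valuation method.

```python
from statistics import mean, pstdev

# Hypothetical stat lines for illustration only (not real players or data).
players = {
    "Player A": {"HR": 35, "SB": 2},
    "Player B": {"HR": 12, "SB": 40},
    "Player C": {"HR": 22, "SB": 10},
    "Player D": {"HR": 8, "SB": 4},
}

def z_scores_by_category(players):
    """Return {player: {category: z}} using each category's own mean and s.d."""
    categories = list(next(iter(players.values())).keys())
    result = {name: {} for name in players}
    for cat in categories:
        values = [stats[cat] for stats in players.values()]
        mu, sigma = mean(values), pstdev(values)
        for name, stats in players.items():
            # Dividing by the category's s.d. adjusts for its spread.
            result[name][cat] = (stats[cat] - mu) / sigma
    return result

z = z_scores_by_category(players)
# Total value is the sum of a player's within-category standard scores.
totals = {name: sum(cats.values()) for name, cats in z.items()}
for name, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {total:+.2f}")
```

In practice you would compute the mean and s.d. from actual results rather than projections, as noted above.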

"I don't want to play golf. When I hit a ball, I want someone else to chase it."

I have been going much more in depth with my projections for this year's upcoming draft season (well, ongoing too, with my slow Cafe draft going on). One thing I have been checking out is tied to the category goals needed to finish top 3 in each of the 5x5 stats. It's pretty easy math: I pretty much just average out each category for the players I have selected. I am trying to figure out where my average for each stat should be sitting after each round. Does anyone else do things this way? If so, does this make any sense to do? Is there a better way to do it? If not, are there any better but still easy calculations you could explain to me that do not require a t4000 calculator and an accountant?

Sounds like you do something similar to what I do. I track by percentages. For example, if you have ten position players, then to make 100% of your target, you have to get roughly 10 percent from each player. By the 3rd or 4th pick, I can see if I am falling more than a few percent behind in any category. If I've made 5 of my player picks (50% of my position players) and I'm only at 40% of my steals target, I see I'm falling short. Generally, I try not to get more than 5 percent short in a category before reacting.
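
If you'd rather let a script do the tracking, here is a minimal sketch of that percentage check. The function name `pacing_report`, the category targets, and the running totals are all made up for illustration; the 5-point tolerance is the rule of thumb described above.

```python
def pacing_report(targets, drafted, picks_made, total_picks, tolerance=5.0):
    """Compare percent of each category target reached with percent of picks used.

    Returns {category: (percent_of_target, shortfall_in_points, needs_attention)}.
    """
    expected_pct = 100.0 * picks_made / total_picks
    report = {}
    for cat, target in targets.items():
        achieved_pct = 100.0 * drafted.get(cat, 0) / target
        shortfall = expected_pct - achieved_pct  # positive means behind pace
        report[cat] = (achieved_pct, shortfall, shortfall > tolerance)
    return report

# Hypothetical targets and running totals, mirroring the steals example above:
# 5 of 10 position players drafted, but only 40% of the steals target reached.
targets = {"SB": 150, "HR": 250}
drafted = {"SB": 60, "HR": 130}
report = pacing_report(targets, drafted, picks_made=5, total_picks=10)
for cat, (pct, shortfall, flag) in report.items():
    print(f"{cat}: {pct:.0f}% of target, {shortfall:+.0f} pts vs pace, react={flag}")
```

With these numbers, steals come out 10 points behind pace and get flagged, while home runs are slightly ahead and do not.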